13. December 2007 04:03
I know SQL 2005 has been out for about 2 years now, and SSIS was a big overhaul that was done with it. I never had a need to really use SSIS since it has been available, then again moving to SQL 2005 right away also doesn't always work for companies. However, I just got done with a project that required the following (from a 30,000 foot level):
- Pulling data from 2 flat files and 2 SQL Servers
- Manipulating the data from all sources and ending up with 1 pipe delimited flat file
- Archiving the staging tables I used in the SSIS package
- Copying and moving files into a central directory and archiving those on a daily basis.
Although I have done similiar things with flat files in a C# executable I decided to try and utilize SSIS this time because I wanted to see how it was with dealing with flat files, and since I had to pull and manipulate the data, a C# executable was just not a valid option.
The positives I noticed from SSIS for my situation was actually reading the data in from the flat files. They were fixed width files and I loved the fact I could configure the Flat File connector the one time and then I could treat the data as if it were a DataSet in C#. I basically ran a Conditional Split on the flat file and did a Merge Join on the split results and outputed it to a SQL staging table. This was very convenient. In addition the File Copy task is a great tool as well. This allows me to keep all the functionality I need into one SSIS package, and lets me avoid having a SQL Job and then running a batch job for the file system tasks I needed to complete.
However, the downfall I did find was that when it seemed I needed to do something somewhat complex such as utilizing temp tables and manipulating data columns I needed to write my own SQL script. I didn't have a problem with this, I was just hoping that SSIS could have been a bit more friendly or capabale of using these concepts. Sure you can use the Derived Column, Sort and Multicast trasnformations, but instead of dealing with that, it was just easier to write the SQL code myself.
Another thing I found to be a positive was the ability to write a VB.NET script using the Script Task in the control flow. I didn't like the functionality of the File System task to copy and renaming files. you have to have too many File System Tasks on there if you are copying multiple files or whatever it is you need to do with them. The VB.NET script basically copied the file from one location to another, copied them again to an archive directory, and then renamed them in the fileName_MMDDYYYY.txt format. However, I don't think I would use the Script Task for anything robust, but at least its available to use and write simple VB.NET code.
Overall I'm know SSIS is capable of doing more than what I saw, however I always seem to find it easier to just write the code myself in a SQL script and go from there.
I just wanted to give my 2 cents on SSIS right now since I actually had a need to use it for something worth using it for.
7. December 2007 23:59
I was recently working on an SSIS package in SQL 2005 and I ran into a problem of wanting to remove duplicate rows based on 2 keys. In SSIS you can definitely accomplish this by using the Sort transformation, but you don't get a choice of where those duplicate rows go. Below is the code that I used to help remove the duplicate rows in a SQL table.
SET ROWCOUNT 1
DELETE FROM #tmpTable
WHERE PrimaryKey = @variable
AND OtherField = @otherVariable
SET ROWCOUNT 0
It actually ends up being pretty simple as long as you have the keys you know you need to delete.
Now your DELETE statement could be a little more complicated depending on your requirements for the duplicate rows. For my example I needed to delete the row if it had duplicate values of the primary key and other field.
Basically what makes it all possible is the ROWCOUNT keyword. This allows us to tell the engine to stop processing the command once it has reached the row result that ROWCOUNT is set to. In this example, ROWCOUNT is set to 1, so once 1 row has been deleted it will stop the deletion and move on. We need to set the ROWCOUNT back to 0 in case this is in a WHILE loop or whatever other code you may use.
Note: I did an ORDER BY on the #tmpTable by the primary key and other field so I knew I would delete the correct record.
The reason I liked this so much was because I could do what I wanted with the duplicate row before I deleted it. In my case I moved the duplicate row off to a secondary temp table. This was something I could not find a way to accomplish in SSIS.