I was recently working on an SSIS package in SQL 2005 and I ran into a problem of wanting to remove duplicate rows based on 2 keys. In SSIS you can definitely accomplish this by using the Sort transformation, but you don't get a choice of where those duplicate rows go. Below is the code that I used to help remove the duplicate rows in a SQL table.
SET ROWCOUNT 1
DELETE FROM #tmpTable
WHERE PrimaryKey = @variable
AND OtherField = @otherVariable
SET ROWCOUNT 0
It actually ends up being pretty simple as long as you have the keys you know you need to delete.
Now your DELETE statement could be a little more complicated depending on your requirements for the duplicate rows. For my example I needed to delete the row if it had duplicate values of the primary key and other field.
Basically what makes it all possible is the ROWCOUNT keyword. This allows us to tell the engine to stop processing the command once it has reached the row result that ROWCOUNT is set to. In this example, ROWCOUNT is set to 1, so once 1 row has been deleted it will stop the deletion and move on. We need to set the ROWCOUNT back to 0 in case this is in a WHILE loop or whatever other code you may use.
Note: I did an ORDER BY on the #tmpTable by the primary key and other field so I knew I would delete the correct record.
The reason I liked this so much was because I could do what I wanted with the duplicate row before I deleted it. In my case I moved the duplicate row off to a secondary temp table. This was something I could not find a way to accomplish in SSIS.
Be the first to rate this post
- Currently 0/5 Stars.
- 1
- 2
- 3
- 4
- 5