[quoted text, click to view] On 11 Nov 2004 20:48:16 -0800, Karen Middleton wrote:
>Hugo & Roji
>
>Many thanks for your tips.
>
>Hugo,
>
>The solution you suggested is perfect but is it possible I can set
>some kind of a commit interval for the update and Insert since my
>staging table is 2 mill+ I do not want the temp db to bloat with that
>size is it possible I can handle this in smaller chunks to optimally
>use the tempdb space.
Hi Karen,
The usual pattern is something like this:
SET ROWCOUNT 10000 -- Adapt this to your needs
WHILE 1 = 1
BEGIN
UPDATE (or DELETE or INSERT)
SET ... = ...
WHERE ..... -- this must somehow exclude the rows
AND ..... -- that were handled in previous iterations
IF @@ROWCOUNT = 0 BREAK
END
SET ROWCOUNT 0
In your situation, you'd have to have two of these loops. The first for
updating existing rows, the second for inserting new rows. The second loop
is easy - if you insert 10,000 rows that don't already exist, the next
iteration will automatically skip these (as they now do exist) and go on
to the next 10,000. The update loop is harder - the only way to prevent
each iteration from simply updating the same set of rows is to compare not
only the primary key columns, but ALL columns - if all are equal, that row
should be excluded. That will result in a bloated query (depending on the
number of columns), with special handling for columns that may be NULL. It
will also be slow, as comparing all columns requires a table scan for the
correlated subquery (and if your staging table is 2 mill+ rows, your dest
table will probably be even bigger). Not nice.
Instead, I'd recommend you to use the following pattern:
CREATE TABLE #TempTable (-- same structure as staging table (Source),
-- constraints not required,
-- but add indexes to optimize for speed)
SET ROWCOUNT 10000 -- Adapt this to your needs
WHILE 1 = 1
BEGIN
BEGIN TRANSACTION
INSERT INTO #TempTable (..., ...)
SELECT ..., ...
FROM Source
ORDER BY ..., ... -- Must be a combination that has no duplicates
IF @@ROWCOUNT = 0 BREAK
DELETE FROM Source
ORDER BY ..., ... -- Same column combination as before
-- The order by ensures that the same (max 10000) rows get deleted that
-- were just copied into the #TempTable.
-- An alternative to using order by is to INSERT INTO 10,000 rows "at
-- random" (without ORDER BY) and use EXISTS to delete the same rows
-- from Source.
UPDATE Dest
SET Sales = #TempTable.Sales
FROM Dest
INNER JOIN #TempTable
ON #TempTable.Material = Dest.Material
AND #TempTable.Customer = Dest.Customer
AND #TempTable.Year = Dest.Year
IF @@ERROR <> 0
BEGIN
ROLLBACK TRANSACTION
-- Other error handling (generate message, log error, etc.)
GOTO Done
END
INSERT Dest (Material, Customer, Year, Sales)
SELECT Material, Customer, Year, Sales
FROM #TempTable
WHERE NOT EXISTS
(SELECT *
FROM Dest
WHERE #TempTable.Material = Dest.Material
AND #TempTable.Customer = Dest.Customer
AND #TempTable.Year = Dest.Year)
IF @@ERROR <> 0
BEGIN
ROLLBACK TRANSACTION
-- Other error handling (generate message, log error, etc.)
GOTO Done
END
COMMIT TRANSACTION
END
-- After WHILE is exited through BREAK, a transaction is still open!
COMMIT TRANSACTION
Done:
PRINT 'Done'
[quoted text, click to view] >Also, is it possible I can make it generic so that I can pass the
>source and target table names and can the stored procedure discover
>all the key and non-key columns and do the upsert.
It is possible using dynamic SQL, but I recommend strongly against it. It
does save you some copy, change, paste work when making your application,
but it will probably cost you lots more work in the long run. If I have to
do something like this, I'd prefer to have seperate "upsert" procedures
for each table that needs this functionality. Once you got one working
procedure, it's easy to make more: use copy and paste, change the names
(using find and replace), proofread and test and you're set.
To read more about dynamic SQL:
http://www.sommarskog.se/dynamic_sql.html. Best, Hugo
--