Query for only specific rows in a table? Need Help with Query


Query for only specific rows in a table? Need Help with Query Gordon
10/9/2007 8:39:00 AM
We have a stored procedure that removes old data from a very large (millions
of rows) table and puts it into an archive table. It grabs the top 5000 rows
and moves them.

Because the database is in production and is heavily used, the job is run every
5 minutes to avoid performance issues. When first started, the query ran quickly
because it would find 5000 rows fast. Now that it has been running for a week, it
takes longer for the query to execute and is locking the tables for too long.

Is there a way to query only a specific set of rows in the query? We could
store the last known row that was deleted in a temp table and start the query
over from that last known row. The problem is we don't know how to do this
with code in the stored procedure. Our code is below. Can anyone help us
with this or point us in the right direction to get this accomplished?


INSERT INTO gbdb_arch..tests_to_archive
SELECT TOP 5000 p.test_id
FROM gbdb..tests p (NOLOCK)
LEFT OUTER JOIN gbdb_arch..tests_to_archive a (NOLOCK)
    ON p.test_id = a.test_id
WHERE var_id IN (SELECT var_id FROM gbdb..variables (NOLOCK) WHERE pu_id <> 0)
    AND Result_On < DATEADD(year, -2, GETDATE())
    AND a.test_id IS NULL
GO

Re: Query for only specific rows in a table? Need Help with Query Russell Fields
10/9/2007 4:41:40 PM
Gordon,

A few questions that could affect your performance.
1 - Do you have indexes to support the TOP 5000 select? If not, create an
index to support the select and delete.
2 - Are your statistics up-to-date? If not, you should update them
(sp_updatestats).
3 - Is your data becoming fragmented as you delete rows? If so, you should
defragment the table
(SS2000 & 2005: DBCC DBREINDEX; SS2005: ALTER INDEX).
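
As a rough sketch of the three points above: the index name IX_tests_Result_On is made up for illustration, and the script assumes that Result_On and var_id are columns of gbdb..tests, which the posted query suggests but does not confirm. Adjust the key columns to whatever the real WHERE clause filters on.

```sql
USE gbdb
GO

-- 1: index to support the date filter of the TOP 5000 select
--    (hypothetical name; assumes Result_On and var_id live in tests)
CREATE INDEX IX_tests_Result_On ON tests (Result_On, var_id)
GO

-- 2: refresh statistics for all tables in the current database
EXEC sp_updatestats
GO

-- 3a: SQL Server 2000 or 2005, rebuild every index on the table
--     (an offline operation, so it holds locks while it runs)
DBCC DBREINDEX ('tests')
GO

-- 3b: SQL Server 2005 only, reorganize instead of rebuilding
--     (runs online, so it is usually gentler on a busy table)
ALTER INDEX ALL ON tests REORGANIZE
GO
```

Only one of 3a and 3b is needed. On a heavily used production table the reorganize is probably the safer first choice, since a full rebuild blocks access while it runs.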

Another observation is that your TOP 5000 (in the sample) does not have an
ORDER BY. Therefore, the 5000 rows being deleted are an undefined selection
from the qualifying rows. TOP makes much more sense with an ORDER BY.
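
For illustration, the posted select with an ORDER BY added might look like the sketch below. It assumes Result_On and var_id are columns of gbdb..tests, so they are qualified with the p alias; the NOLOCK hints are left exactly as in the original post. Ordering by Result_On (with test_id as a tie-breaker) makes each batch take the oldest qualifying rows first.

```sql
INSERT INTO gbdb_arch..tests_to_archive
SELECT TOP 5000 p.test_id
FROM gbdb..tests p (NOLOCK)
LEFT OUTER JOIN gbdb_arch..tests_to_archive a (NOLOCK)
    ON p.test_id = a.test_id
WHERE p.var_id IN (SELECT var_id FROM gbdb..variables (NOLOCK) WHERE pu_id <> 0)
    AND p.Result_On < DATEADD(year, -2, GETDATE())
    AND a.test_id IS NULL
-- Oldest rows first; test_id breaks ties so the batch is well defined
ORDER BY p.Result_On, p.test_id
GO
```

An ORDER BY like this is only cheap if an index leads on the ordering column; otherwise the server has to sort every qualifying row before it can pick the first 5000.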

How are you deleting the rows? If you are deleting by joining
gbdb_arch..tests_to_archive to gbdb..tests, your join set is getting bigger
and bigger as you build up the archive table. For example:
    DELETE ts
    FROM gbdb..tests ts JOIN gbdb_arch..tests_to_archive ar
        ON ts.test_id = ar.test_id
If this is the problem, then you might need an index on tests_to_archive to
support the join.
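
A minimal sketch of such an index, assuming tests_to_archive has no index on test_id yet. The index name is hypothetical.

```sql
USE gbdb_arch
GO

-- Hypothetical index name; lets both the LEFT OUTER JOIN in the insert
-- and the join in the delete seek on test_id instead of scanning
CREATE INDEX IX_tests_to_archive_test_id ON tests_to_archive (test_id)
GO
```

If test_id is known to be unique in that table, declaring the index UNIQUE would give the optimizer a little more to work with, but that is an assumption about the data that the thread does not confirm.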

RLF



Re: Query for only specific rows in a table? Need Help with Query Hugo Kornelis
10/9/2007 10:58:15 PM
Hi Gordon,

In addition to Russell's reply, some more points.

1. What version of SQL Server are you using? SQL 2005 has a new option
(the OUTPUT option) that you can leverage for a tremendous performance
boost.

2. Why are you using (NOLOCK)? Are you aware of the risks of reading
dirty data, missing rows, or reading rows twice? Will you really risk
archiving dirty data for a performance gain?

3. I assume that the stored proc bow_ArchiveHistoricalTestData does the
actual delete. That means that copying to archive and purging the
original are not only in separate transactions; they are even in
separate batches. You run the risk that the insert succeeds, but the
delete fails - and you even run the risk that the insert fails and the
delete succeeds, causing you to lose data permanently!!

4. I agree with Russell that the real problem is probably in the
bow_ArchiveHistoricalTestData procedure. Can you please post that code?
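
To illustrate points 1 and 3, here is a sketch of how the OUTPUT option could move a batch on SQL Server 2005. It is not the code from bow_ArchiveHistoricalTestData, which has not been posted. The archive table gbdb_arch..tests_archive and its column list are hypothetical, and the sketch assumes Result_On and var_id are columns of gbdb..tests. The rows are deleted and copied by a single statement, so either both happen or neither does.

```sql
-- SQL Server 2005 or later only
WITH batch AS (
    -- Oldest 5000 qualifying rows; TOP with ORDER BY makes the batch well defined
    SELECT TOP (5000) test_id, var_id, Result_On
    FROM gbdb..tests
    WHERE var_id IN (SELECT var_id FROM gbdb..variables WHERE pu_id <> 0)
        AND Result_On < DATEADD(year, -2, GETDATE())
    ORDER BY Result_On, test_id
)
DELETE FROM batch
-- Copy each deleted row into the (hypothetical) archive table
OUTPUT deleted.test_id, deleted.var_id, deleted.Result_On
INTO gbdb_arch..tests_archive (test_id, var_id, Result_On)
GO
```

The target of OUTPUT ... INTO may not have enabled triggers, CHECK constraints, or foreign keys, so the archive table would need to be a plain table. On SQL Server 2000 the closest equivalent is to wrap the insert and the delete in one explicit transaction.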

--
Hugo Kornelis, SQL Server MVP