all groups > sql server replication > june 2007 >
You're in the

sql server replication

group:

Replication Cleanup in high volume database


Replication Cleanup in high volume database RL
6/11/2007 9:41:15 AM
sql server replication: We're having intermittent performance problems on a SQL 2005 SP1 system that
we suspect is related to transactional replication. There may be overlap
with Jack Griffith's thread "Replication system disk performance severly
degrades after 1 month", but I'm not sure......

While troubleshooting the problem, we attempted to make things better by
clearing out any transactions that had already been replicated to the
subscriber (EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0,
@max_distretention = 0). BAD, BAD, BAD!!!!! Setting @max_distretention = 0
marked our subscription as inactive and we had to rebuild the subscriber
database to get it back in sync with the publisher.

I estimate add/update 2-4 million rows a day in our publication database.
The distribution database has grown to 9GB in < a week.
I have not altered any of the database cleanup jobs as of yet. The current
Distribution Cleanup job runs every 10 min and is taking 6-15 minutes to run.
We observe a tremendous number of disk reads on the Distribution database
disks. When we experience performance issues, we also see a HUGE number of
memory pages lasting ~ 15 seconds and occuring exactly every 2 minutes.
Restarting the SQLServer service seems to clear up the performance issues for
several hours.


* msrepl_commands rowcount approaches 35 million rows and
msrepl_transactions approaches 6 million rows. With a high volume in
transactional replication, should there be any indexes on the tables?
* While tracing through the MS SPs, I found that sp_MSdelete_dodelete only
deletes the top 5000 rows from MSrepl_transactions. In a high volume
transactional state, this might not keep up. Can this be changed safely?
* What other parameters can be changed to keep the transactions only as long
as they are needed but yet not mark the subscription as inactive?

Re: Replication Cleanup in high volume database RL
6/11/2007 10:57:11 AM
Thanks Hilary. We have one subscriber and are using RAID 10 as recommended.
The max_distretention is set to 72 hours already. The distribution database
was just rebuilt a week ago, so it only has 4 days of deletes in it, so i'm
not sure an index defrag will help.

Why do we need to keep 3 days worth of commands that have already been
replicated? Any suggestions on how to make the distribution cleanup job run
more efficiently? It's running at least 50% of the time now. Is it possible
we're just outpacing what replication was designed for? In my stress testing
before moving to SQL 2005 x64, replication was the weak performance link.

Thanks,
Rob

[quoted text, click to view]
Re: Replication Cleanup in high volume database Hilary Cotter
6/11/2007 1:34:11 PM
If you have a few subscribers use raid 10, if you have a large number use
raid 5.

Set max_distretention to 3 days. This will only pool undistributed commands
for up to 3 days.

the setting for only deleting 5000 rows at a time is by design. If you
choose to modify it to something higher make sure you make the modification
to the end of
sp_MSdelete_publisherdb_trans as well.

The appropriate indexes are in place on msrepl_commands and
msrepl_transactions. You might want to defrag these indexes to see if that
helps or update statistics.



--
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
[quoted text, click to view]

Re: Replication Cleanup in high volume database Hilary Cotter
6/11/2007 2:19:29 PM
I like to keep 3 days around in case I leave on a Friday night of a long
weekend when I come back on Tuesday I can still restart failed distribution
agents without them going expired on me.

The problem is that if you delete too many rows at a time the locks will
escalate to table level locks and your log reader agents will fail.

I have a friend who was in a similar situation as yours (Wachovia bank IIRC
in NYC) and he manually deleted rows in large batches. Reverse engineer to
the procs to see what to delete and stop your log reader agents while doing
this.

I also worked on a similar problem in Philly where the backlog was something
similar and we were able to get through everything in 2 hours.

--
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com
[quoted text, click to view]

AddThis Social Bookmark Button