
sql server replication : Replication Stability, Recovering From Errors


Tappy Tibbons
11/13/2003 1:20:45 PM
We are getting ready to roll out our first replication project to a small
test group.

I think we have everything set up, and it is working pretty well.

I am a bit nervous though, as we occasionally observe errors in the
Replication Monitor, and the entire process stops. Depending on the error,
we are usually able to fumble around and get it restarted, but most of the
errors are somewhat cryptic, and we seem to get different errors depending
upon what broke.

My question is: in general, is SQL Server replication pretty stable?

What kind of handholding/monitoring is usually required?

Are there good resources for troubleshooting errors as they come up?

And worst case, if replication were to go down for a few days, or the tables
on either end were to get trashed/corrupted/inconsistent, are there
tools/procedures available to either automatically resync the data, or will
we be facing tons of manual cleanup?

We are doing some pretty simple stuff, mainly updatable transactional
replication, and don't anticipate a lot of conflicts, but are wanting to get
a feel for how much support time we may need to allocate to this project as
it rolls out.

Any advice or insight is appreciated.

Thanks.

Hilary Cotter
11/14/2003 1:01:45 PM
Replication is designed with a high degree of
disconnectedness in mind; in other words, it is designed
to be fault tolerant. More than 90% of the errors you will
ever see can be solved by restarting the particular jobs.

So your experience is typical.
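The "just restart it" approach can be sketched as a small retry loop. This is only an illustration of the idea, not how SQL Server Agent actually schedules replication agents: `run_with_restarts` and the `job` callable are hypothetical stand-ins for whatever starts an agent, assuming a failed run raises an exception.

```python
import time
from typing import Callable


def run_with_restarts(job: Callable[[], None], max_attempts: int = 3,
                      delay_seconds: float = 0.0) -> int:
    """Run `job`, restarting it after each failure (hypothetical helper).

    Returns the attempt number that succeeded; re-raises the last
    error if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            job()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise  # not transient after all: surface it for a human
            time.sleep(delay_seconds)  # give the transient condition time to clear
    raise AssertionError("unreachable")
```

A job that fails twice with a transient error and then works would return 3; a job that keeps failing raises its last error, which is the case that needs real troubleshooting.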

The handholding/monitoring can be extensive depending on
your application. However, what you can do is have the
agents run continuously, but also restart every 10 minutes
or so. This way, when they encounter a transient error,
they will succeed the next time they restart. There are
replication alerts that can be used for monitoring. It is
also possible to read the history tables in the
distribution database to roll your own alerting functions,
with thresholds associated with them to capture
constantly failing jobs.
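A roll-your-own threshold check might look like the sketch below. It works on rows already read out of a history table rather than querying SQL Server directly. The status codes are assumed to mirror the `runstatus` column of `MSdistribution_history` (2 = succeed, 6 = fail); `HistoryRow`, `consecutive_failures` and `agents_over_threshold` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed to mirror runstatus values in the distribution history tables.
SUCCEED = 2
FAIL = 6


@dataclass
class HistoryRow:
    agent_id: int
    logged_at: datetime
    runstatus: int


def consecutive_failures(rows: list[HistoryRow], agent_id: int) -> int:
    """Count an agent's most recent failures, stopping at its last success."""
    count = 0
    newest_first = sorted((r for r in rows if r.agent_id == agent_id),
                          key=lambda r: r.logged_at, reverse=True)
    for row in newest_first:
        if row.runstatus == FAIL:
            count += 1
        elif row.runstatus == SUCCEED:
            break  # the agent recovered; older failures no longer matter
        # other statuses (start, in progress, idle, retry) are ignored
    return count


def agents_over_threshold(rows: list[HistoryRow], threshold: int) -> list[int]:
    """Agents whose run of consecutive failures has reached `threshold`."""
    agent_ids = {r.agent_id for r in rows}
    return sorted(a for a in agent_ids
                  if consecutive_failures(rows, a) >= threshold)
```

Counting consecutive failures rather than total failures keeps an agent that hits the occasional transient error (and recovers on restart) from raising an alert, while an agent that keeps failing after each restart crosses the threshold.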

If you are down for an extended period of time (longer
than the retention period), replication will prompt you to
reinitialize (resync) the subscription, which will resend
all the data from your publisher to your subscribers again.

Normally this is a pretty safe procedure.
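The reasoning behind the forced resync can be sketched as a simple check: once a subscriber has been out of contact longer than the retention window, the changes it missed may already have been cleaned up, so only a full resend can bring it back in line. The function name and the 72-hour value in the usage note are assumptions for illustration; check the actual retention settings on your distributor.

```python
from datetime import datetime, timedelta


def needs_reinitialization(last_sync: datetime, now: datetime,
                           retention: timedelta) -> bool:
    """True if the subscriber has been out of contact longer than the
    retention window (hypothetical helper), so missed changes may be gone."""
    return now - last_sync > retention
```

With an assumed 72-hour retention, a subscriber last synchronized six days ago would need reinitializing, while one synchronized yesterday could still catch up from the retained changes.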

lt
11/20/2003 8:45:58 AM
Are there any tools, as in software packages, for monitoring
Microsoft replication?