Replication is designed with a high degree of
disconnectedness in mind; in other words, it is meant to be
fault tolerant. More than 90% of the errors you will ever
see can be solved by restarting the particular job.
So your experience is typical.
The handholding/monitoring can be extensive, depending on
your application. However, what you can do is have the
agents run continuously, but also restart every 10 minutes
or so. This way, when they encounter a transient error,
they will succeed the next time they restart. There are
replication alerts that can be used for monitoring, and it
is possible to read the distribution history tables in the
distribution database to roll your own alerting functions,
with thresholds associated with them to catch
constantly failing jobs.
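
As a rough illustration of the roll-your-own alerting idea,
here is a minimal Python sketch. It assumes you have already
pulled rows out of a history table (for example
MSdistribution_history in the distribution database) into
simple (agent, finished_at, status) tuples. The tuple layout,
the "failed" label and the function name are all made up for
this example; the real tables use their own columns and
numeric status codes, so treat this as a starting point only.

```python
from collections import defaultdict

# Hypothetical status label; the real history tables store a
# numeric run status that you would map to something like this.
FAILED = "failed"


def agents_over_threshold(history, threshold=3):
    """Return {agent: streak} for agents whose most recent runs
    are at least `threshold` consecutive failures."""
    runs = defaultdict(list)
    for agent, finished_at, status in history:
        runs[agent].append((finished_at, status))

    flagged = {}
    for agent, entries in runs.items():
        # Newest run first, so the streak is counted backwards in time.
        entries.sort(key=lambda entry: entry[0], reverse=True)
        streak = 0
        for _, status in entries:
            if status != FAILED:
                break  # a non-failed run ends the failure streak
            streak += 1
        if streak >= threshold:
            flagged[agent] = streak
    return flagged


# Made-up sample rows: (agent, finished_at, status).
sample_history = [
    ("agent_a", 1, "succeeded"),
    ("agent_a", 2, "failed"),
    ("agent_a", 3, "failed"),
    ("agent_a", 4, "failed"),
    ("agent_b", 1, "failed"),
    ("agent_b", 2, "succeeded"),
]
flagged_agents = agents_over_threshold(sample_history, threshold=3)
```

Counting only the most recent unbroken run of failures means
an agent that hit one transient error and then recovered on
its next restart is not flagged; only jobs that keep failing
cross the threshold.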
If you are down for an extended period of time,
replication will prompt you to do a resync, which will
resend all the data from your publisher to your
subscribers again.
Normally this is a pretty safe procedure.
>-----Original Message-----
>We are getting ready to roll out our first replication project to a small
>test group.
>
>I think we have everything set up, and it is working pretty well.
>
>I am a bit nervous though, as we occasionally observe errors in the
>Replication monitor, and the entire process stops. Depending on the error,
>we are usually able to fumble around and get it restarted, but most of the
>errors are somewhat cryptic, and we seem to get different errors depending
>upon what broke.
>
>My question is that in general, is SQL replication pretty stable?
>
>What kind of handholding/monitoring is usually required?
>
>Are there good resources for troubleshooting errors as they come up?
>
>And worse case, if replication were to go down for a few days, or the tables
>on either end were to get trashed/corrupted/inconsistent, are there
>tools/procedures available to either automatically resync the data, or will
>we be facing tons of manual cleanup?
>
>We are doing some pretty simple stuff, mainly updateable transactional
>replication, and don't anticipate a lot of conflicts, but are wanting to get
>a feel for how much support time we may need to allocate to this project as
>it rolls out.
>
>Any advice or insight is appreciated....
>
>Thanks.