Hi Randy,
I suspect that certain data that was introduced recently into the
problematic table at the publisher was causing bcp to choke up. This is
particular likely if your problematic table contains blob (i.e. text\image)
columns. But first, you need to find out whether my suspicion was indeed
valid and hopefully to isolate the problematic rows if it is. To this end, I
have the following suggestions:
1) Rule out the obvious by making sure that the distribution agent is not
blocked by another process at the subsciber by calling sp_who2 or using SQL
Profiler (this is admittedly extremely unlikely since all your subscribers
failed the same way)
2) Specify a smaller -BcpBatchSize value (say, 1000) for the distribution
agent by either changing the replication agent profile or the agent command
line in msdb..sysjobsteps at the subscriber. This should give you a better
chance to see if the distribution agent can make any progress at all and
approximately where the problematic rows are (note: bcp should be done in
clustered index order)
3) If the problematic table is not too big, you may be able experiment by
manually bcping out of the problematic table at the publisher and bcp back
into a temporary table. Here is how you can do it: Create an empty temporary
table with the same schema as the problematic using "select *into <temptable>
from <problematic table> where 1 = 2", bcp data out from problematic table to
a file, and bcp file data back into temporary table. If you know
approximately where the problematic rows are, you can bound your bcp
operations with a where clause.
4) There was a known issue with bcping in tables through replication with
(n*65536) - 1 rows so you may want to watch out for that. This is supposed to
be fixed in MDAC2.8 SP1 so it would be great if you can let us know what
version of SQL2000 are you running.
5) There was also a known issue with related to null text\image data in the
last column of a table so you may want to watch out for that too.
-Raymond
[quoted text, click to view] "Randy Shore" wrote:
> I'm using SQL Server 2000, and have set up snapshot replication with many
> anonymous pull-subscribers. This has been running without problem for about
> 30 months.
>
> The replication agent runs every night and creates a new snapshot, but
> recently, when my subscribers synchronize, the distribution agent carries
> out the first 615 actions and then it seems to freeze. For every
> subscriber, the process stops on the action "bulk copying data into table
> 'TableName'". The Replication Monitor stops showing updates of the process.
> (I have it set to poll and refresh every 10 seconds.) Eventually the job
> fails with the message "The agent is suspect. No response within the last
> 120 minutes."
>
> This has been running with no problems for 2.5 years, and now every
> subscription machine fails at the same point.
>
> Any ideas what might be the problem?
>
> Thanks in advance for any help.
>
>