We are standing at 3 level merge replication, SQL2000 SP3.
One topmost server, couple of mid-server republishers with 1-4 subscribers
(3-rd level) each. The replication strategy is working smooth for about 4
years, but the strange behaviour we noticed starting month ago.
One of republishers, named LASNDK begun generatind abnormal amount of
generation history records (msmerge_genhistory rows). His subscriber WSNDK
spends now 4:30 minutes avg. for one synchronization. Synchronization time
for other subscribers (with nearly the same data load) is about 20-30 sec and
this was the timing 2 onths ago with WSNAD too. Trying to resolve the
problem, looking for orphaned generations, local generations and so on, we
found some interesting (but unexpected) behaviour.
First of all, there are 1,199,427 rows in msmerge_genhistory table.
Rublication on republisher LASNDK has 30day retention period and this matches
all rows in msmerge_genhistory table. These 1,199,427 rows are all from last
30day period. Comparison between 2 republishers LASNDK and LASZONA follows:
--------------------------------------
SET NOCOUNT ON
SELECT CAST( SERVERPROPERTY( 'servername' ) AS VARCHAR(20) ) republisher
SELECT COUNT(*) gh
FROM msmerge_genhistory
SELECT COUNT(*) c
FROM msmerge_contents
SELECT COUNT(*) ts
FROM msmerge_tombstone
[quoted text, click to view] >>>>>>>>>>> result on LASNDK >>>>>>>>>
republisher
--------------------
LASNDK
gh
-----------
1199415
c
-----------
41287
ts
-----------
8141
[quoted text, click to view] >>>>>>>>>>> result on LASZONA >>>>>>>>>
republisher
--------------------
LASZONA
gh
-----------
101784
c
-----------
43721
ts
-----------
9834
--------------------------------------
Second, we look at generation count distribution by days in two servers
--------------------------------------
SET NOCOUNT ON
SELECT CAST( SERVERPROPERTY( 'servername' ) AS VARCHAR(20) ) republisher
SELECT DATEPART( dayofyear, coldate ) day,
COUNT(*) gencount
FROM msmerge_genhistory
GROUP BY
DATEPART( dayofyear, coldate )
ORDER BY
DATEPART( dayofyear, coldate )
[quoted text, click to view] >>>>>>>>>>> result on LASNDK >>>>>>>>>
republisher
--------------------
LASNDK
day gencount
----------- -----------
111 1501
112 4052
113 10234
114 3056
115 3473
116 3752
117 3423
118 4196
119 4118
120 2817
121 2514
122 849
123 5074
124 3660
125 3707
126 2863
127 2682
128 2809
129 78963
130 119707
131 179610
132 178117
133 179382
134 171136
135 160108
136 50829
137 3334
138 3341
139 3591
140 3282
141 2839
142 404
[quoted text, click to view] >>>>>>>>>>> result on LASZONA >>>>>>>>>
republisher
--------------------
LASZONA
day gencount
----------- -----------
111 1869
112 4941
113 4030
114 3014
115 3485
116 3702
117 3631
118 4014
119 4108
120 2818
121 2523
122 1765
123 4155
124 3670
125 3706
126 2857
127 2677
128 2807
129 3700
130 3440
131 3310
132 3171
133 3014
134 2840
135 2536
136 3196
137 3362
138 3294
139 3592
140 3285
141 2852
142 402
----------------------------------------
And next we look at day 134 and day 137 in LASNDK. Day 134 has 171,136
generations, day 137 has 3334 (near to avg generations per day for LASZONA
and LASNDK except some peaks). This is 20-times difference. Looking deeper on
these two days give us:
----------------------------------------
SET NOCOUNT OFF
SELECT CAST( SERVERPROPERTY( 'servername' ) AS VARCHAR(20) ) republisher
SELECT coldate,
count(*) gencount_137
FROM msmerge_genhistory
WHERE coldate >= '2005-05-17'
AND coldate < '2005-05-18'
GROUP BY
coldate
ORDER BY
coldate
[quoted text, click to view] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
republisher
--------------------
LASNDK
(1 row(s) affected)
coldate gencount_137
------------------------------------------------------ ------------
2005-05-17 00:00:21.810 1
2005-05-17 00:00:22.123 1
2005-05-17 00:00:22.153 1
2005-05-17 00:00:22.170 1
2005-05-17 00:00:22.187 1
2005-05-17 00:00:22.217 1
2005-05-17 00:00:22.233 1
2005-05-17 00:00:22.250 1
2005-05-17 00:00:22.263 2
2005-05-17 00:00:22.280 1
.... Skipping the middle, all rows with gencount=1 or gencount=2
2005-05-17 23:56:08.200 1
2005-05-17 23:56:08.217 1
2005-05-17 23:56:08.250 1
(2929 row(s) affected)
-----------------------------------------
and look now at most interesting part (day 134)
-----------------------------------------
SET NOCOUNT OFF
SELECT CAST( SERVERPROPERTY( 'servername' ) AS VARCHAR(20) ) republisher
SELECT coldate,
count(*) gencount_134
FROM msmerge_genhistory
WHERE coldate >= '2005-05-14'
AND coldate < '2005-05-15'
GROUP BY
coldate
ORDER BY
coldate
[quoted text, click to view] >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
republisher
--------------------
LASNDK
(1 row(s) affected)
coldate gencount_134
------------------------------------------------------ ------------
2005-05-14 00:05:06.847 1366
2005-05-14 00:21:05.783 1
2005-05-14 00:21:06.033 1
2005-05-14 00:21:06.063 2
2005-05-14 00:21:09.737 1366
2005-05-14 00:21:42.300 1
2005-05-14 00:21:42.360 1
2005-05-14 00:32:59.690 1
2005-05-14 00:32:59.940 1
2005-05-14 00:32:59.970 1
2005-05-14 00:33:03.377 1366
2005-05-14 00:44:29.520 1366
2005-05-14 00:45:32.957 1
2005-05-14 00:45:33.190 1
2005-05-14 00:45:33.220 1