Groups | Blog | Home
all groups > sql server replication > april 2006 >

sql server replication : Synchronising In-House and Web Data


Steve Barker
4/6/2006 9:42:02 AM
Hi,

This is the situation:

We have a large in-house database. It has quite a few tables, some of which
contain in the order of 8 million records. This is our production database;
the result of all our internal systems.

Now, we want to make some of this data available to our clients via our
web-site. Hence, we have a web-server (running SQL Server), which exposes
data to a set of web-pages. The data on the web-server is being read only;
our web users do not update this data. Problem: Our web-server is hosted by a
third party at an external location. Bringing this in-house is not an option.

At the moment, we log what changes in our in-house production database, and
produce CSV files of the changes. These are then FTP-ed to the web-server
overnight, and a DTS package inserts the changes into the web-server version
of our database. The CSV files we send to the web-server are in the region of
20MB, so quite a lot of information changes on a day-to-day basis.

The CSV method seems like quite an antiquated way of doing things, and I
wondered if we could do something slicker using replication (standard
transactional with readonly subscribers). Would replication be able to cope
with the web-database sitting there for hours without a change, and then all
of a sudden having to import 20MB worth of changes? Is there another method
we could use to keep the web-server up-to-date?

I'm pretty sure that this is a problem that other organisations face. How do
they solve the problem?

Thanks in advance for your help!

Steve.
Hilary Cotter
4/6/2006 7:06:04 PM
As long as you accept inbound tcp/ip on the firewall to the web site you can
use push replication for this. If not, try a pull.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Steve Barker
4/7/2006 3:50:02 AM
Hilary,

Thanks for your reply! I realise that we could set replication up over the
net, but I'm worried that the replication wouldn't be able to cope with the
volumes of data we'd need to send.

I understand that transactional replication is good for a constant stream of
changes, but that's not exactly what I want to do. My changes are sent in
large, isolated batches. The analogy is that I'm filling a swimming pool with
a bucket instead of a hose pipe.

If the replication engine suddenly gets a request to update a few hundred
thousand records, how would it cope, and what would happen to the data during
the update? I can imagine scenarios where parent records have been updated by
the replication, but associated child records have not yet been updated, so
the user could theoretically see a mix of old and new data on screen, which
could be contradictory.

My question is really this: I'm pretty sure that what we're trying to
achieve is quite a common problem. Many other organisations must have large
web-based SQL Servers which they update periodically. How do they solve the
synchronisation problem, while maintaining a coherent user experience and SQL
Server responsivity?

Thanks again,

Steve.



[quoted text, click to view]
petery NO[at]SPAM online.microsoft.com (
4/10/2006 12:00:00 AM
Hello Steve,

Since transactional replication is transaction based, all related changes
of tables are committed at the same time and you shall not worry about
inconsistent of data on subscriber.

The only issue you may want to consider is performance of large update
batch on publisher. This may cause performance peak for log reader
agent/distrution agent and will certainly bring overhead on publisher.
Also, it may bring network block if bandwidth of WAN is limited.

Anyway, build a test enviorment to test/evaluate is the best option for
now.

Best Regards,

Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support

When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.

=====================================================


This posting is provided "AS IS" with no warranties, and confers no rights.


--------------------
[quoted text, click to view]
Hilary Cotter
4/10/2006 8:54:52 PM
You might want to look at replicating the execution of stored procedures.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Hilary Cotter
4/13/2006 12:00:00 AM
Consider a batch operation which is kicked of by a proc called test. This
proc updates 10,000 rows. If you replicate the execution of the proc test,
all that will be fired on the subscriber is the proc test. The 10,000 rows
will not be replicated, but the proc's execution will be.

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Steve Barker
4/13/2006 2:10:02 AM
Thanks for the help. You say that the replication is transactional, but what
if the update is set to take about an hour? Let's say there are three tables
being replicated, called A, B and C. Let's say that each has about 20 minutes
of changes to apply. Let's also say that C is a child of B, which is a child
of A. What happens to access of tables A and B while C is being updated? What
if children of B are being queried by the user while table C is being
updated? Due to the transactional nature of replication, will the queries
lock until the replication transaction is complete? I don't understand what
happens to a database if queries are made mid-way through a transaction...

Thanks again!

Steve.

[quoted text, click to view]
Steve Barker
4/13/2006 2:11:02 AM
What does replicating the execution of stored procedures achieve?

[quoted text, click to view]
Steve Barker
4/13/2006 5:56:01 AM
Ah! I'm with you now. That sounds like a really useful feature! Thanks!

[quoted text, click to view]
petery NO[at]SPAM online.microsoft.com (
4/14/2006 12:00:00 AM
Hello Steve,

The logread agent will not get update when a transaction unless it is
committed.

There is a one-to-one correspondence between transactions on the Publisher
and replication transactions in the distribution database. One transaction
stored in MSrepl_transactions can consist of one or more commands and each
command can be broken up along a 500-Unicode-character boundary in the
MSrepl_commands table. After the entire batch of transactions has been
written successfully to the distribution database, it is committed.
Following the commit of each batch of commands to the Distributor, the Log
Reader Agent calls sp_repldone to mark where replication was last
completed. Finally, the agent marks the rows in the transaction log that
are ready to be truncated. Rows still waiting to be replicated are not
truncated. The transaction log on the Publisher can be dumped without
interfering with replication, because only transactions not marked for
replication are purged.

Transaction commands are stored in the distribution database until the
Distribution Agent propagates them to all Subscribers or a Distribution
Agent at the Subscriber pulls the changes.

However, when tranaction is replicated to the subscriber, it will lock the
resource for update until it is committed. Therefore, client read/query
might wait for resource lock to release under this situation. This is the
same as that you run the transaction locally on subscriber.

You could use nolock or readpast hint at this time but it may casuse dirty
read or partial read. Anyway, it is not good design to have 1 hour
transaction. It is suggested that you committed small change unless you
think it is absolutely necessary to do this.

Regards,

Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support

When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.

=====================================================


This posting is provided "AS IS" with no warranties, and confers no rights.

Steve Barker
4/26/2006 3:06:02 AM
Ah, OK, I see now. Thanks very much for your detailed reply!

[quoted text, click to view]
petery NO[at]SPAM online.microsoft.com (
4/27/2006 1:31:22 AM
Welcome!

Regards,

Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support

When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.

=====================================================


This posting is provided "AS IS" with no warranties, and confers no rights.

AddThis Social Bookmark Button