
sql server programming : Data aggregation, synchronization, and search


crbd98 NO[at]SPAM yahoo.com
4/8/2007 9:49:24 PM
Hello All,

I know that this problem is not specific to SQL Server, but I decided to
post it here because I am certain many of you have experienced it (or at
least thought about it).

In my system, I have a central SQL Server database and a number of
remote "data providers". Some of these data providers are other SQL
Servers; others are different databases or data provider services
that expose a database-like interface.

We provide two levels of search capability in the system. The user can
search either the central database, or search one specific data
provider. One of the new requirements is to provide a global search,
which would search the central database and all the data providers at
once.

Performing a distributed search is prohibitive in our scenario because we
have many data providers and they are remotely located. To minimize the
search time, my first reaction was to create a copy of the data from
the remote providers in the central database and keep it in synch with
the data in the remote data providers. By doing this, I would expect
to reduce the global search to a single search in the central
database.
However, a number of new questions concerning data synchronization
arise:
1. How can I keep the data in the central db in synch with the remote
data providers? (If they were all SQL Server databases, I would use
merge replication, but they aren't.)
2. How do I handle disconnects? If a remote provider disconnects and
reconnects, some of the data might be stale. In principle, I could
take a new snapshot and keep the data in synch from that point.
However, this is very expensive! Do you know of any other techniques
for data synchronization that would minimize network traffic?
3. Is this a good approach? What do you think?
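
As a rough illustration of question 2, one common alternative to a full
snapshot is a high-water-mark resync: the central copy remembers the last
change version it applied for each provider and, after a reconnect, asks
only for changes newer than that. The Python sketch below is a toy
in-memory model, not a real integration. It assumes each provider can
stamp every change with an increasing version number and keep tombstones
for deletes, which not every provider will support. The Provider and
CentralCopy classes and their methods are hypothetical names made up for
this example.

```python
from dataclasses import dataclass


@dataclass
class Row:
    key: str
    payload: str
    version: int          # per-provider change counter (assumed to exist)
    deleted: bool = False  # tombstone marker so deletes can be replayed


class Provider:
    """Hypothetical remote data provider that versions every change."""

    def __init__(self):
        self.rows = {}
        self.clock = 0

    def upsert(self, key, payload):
        self.clock += 1
        self.rows[key] = Row(key, payload, self.clock)

    def delete(self, key):
        self.clock += 1
        self.rows[key] = Row(key, "", self.clock, deleted=True)

    def changes_since(self, version):
        # Only rows changed after `version` cross the network.
        changed = [r for r in self.rows.values() if r.version > version]
        return sorted(changed, key=lambda r: r.version)


class CentralCopy:
    """Central replica of one provider plus its high-water mark."""

    def __init__(self):
        self.rows = {}
        self.high_water = 0

    def sync(self, provider):
        changes = provider.changes_since(self.high_water)
        for r in changes:
            if r.deleted:
                self.rows.pop(r.key, None)
            else:
                self.rows[r.key] = r.payload
            self.high_water = r.version
        return len(changes)  # number of rows transferred


provider, central = Provider(), CentralCopy()
provider.upsert("a", "1")
provider.upsert("b", "2")
first = central.sync(provider)   # initial load: both rows

# Provider goes offline, changes locally, then reconnects.
provider.upsert("a", "3")
provider.delete("b")
second = central.sync(provider)  # only the two changes, no snapshot
```

With this scheme a full snapshot is needed only when the provider cannot
supply the change history since the stored version, for example if it
purges tombstones.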

Your opinion is greatly appreciated.

Kind regards
CD
John Bell
4/9/2007 1:10:09 AM
Hi

You don't say how you currently search the remote data providers. It may be
more acceptable to the users if the global search first searched the
central database and returned those results, and then searched the remote
databases. That would get something back to the user sooner and reduce the
need to speed up the remote searches.
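
A minimal Python sketch of that idea follows. It returns the central
results first and then each remote provider's results as they arrive. As
one possible variation, it starts the remote searches in the background
before the central search runs, rather than strictly afterwards. The
function name progressive_search and the shape of the search callables
are assumptions for illustration only; how you actually query each
provider will differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def progressive_search(term, search_central, remote_searches):
    """Yield (source, rows, error) batches: central first, then remotes.

    `search_central` is a callable taking the search term.
    `remote_searches` maps a provider name to such a callable.
    """
    workers = max(1, len(remote_searches))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Kick off the slow remote searches in the background.
        futures = {
            pool.submit(fn, term): name
            for name, fn in remote_searches.items()
        }
        # The user sees central results without waiting for any remote.
        yield "central", search_central(term), None
        # Hand over each remote batch as soon as it completes.
        for fut in as_completed(futures):
            name = futures[fut]
            try:
                yield name, fut.result(), None
            except Exception as exc:
                # An unreachable provider should not fail the whole search.
                yield name, [], exc
```

The caller can render each batch as it is yielded, so a slow or
disconnected provider delays only its own section of the results.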

If you held the data centrally, you would also need to know how much latency
is acceptable for the data. If you could get away with uploading once a
day, out of hours, this could be an easier solution to implement.
