all groups > sql server data warehouse > june 2005 >
You're in the

sql server data warehouse

group:

Can I pick your brains for a second? transferring a large database problem


Can I pick your brains for a second? transferring a large database problem Scott M
6/6/2005 5:33:03 PM
sql server data warehouse:
-=-=- Sorry for the dup post, I'm posting it under my MSDN universal acct
now :) -=-=-

Okay, I have a problem that I need some suggestions with.

I have a 30 Gig database that I need to sync across a T1 every night. This
database is a data dump of our billing system data, the tables are dropped
ad re-created every night.

I can do a very long and laborious process on the server of creating tables
with the primary keys from each table and a binary checksum of a row, then
complaining it to a copy of the database from the day before. Then I
generate SQL statements for the updates (Actually deletes and inserts). All
of this on the source server.

I'd like to compact this down using a DTS package on the destination side,
however I can't use a dynamic query in the "openRowSet" command to get only
the row that I want.

Here is what I'm doing (Just for one table, I'll have data driven steps for
each table in my DB)

insert into @tempChecker
select BSum, chgNo from openquery(EMBAN2, 'Select BINARY_CHECKSUM(*) BSum,
chgno from charge_t')

Declare Charge_t_Cursor Cursor
FOR
select
a.chgno
from
(select
BSum,
chgno
from @tempChecker
) a
where
not exists
(
select 'x'
from charge_t b
where a.chgno = b.chgno and a.BSum <> BINARY_CHECKSUM(*)
)

Once I get all the chgno's (The key in the table) I want to only pull back
the data that has changes. So I have this (Which I know I can't do put you
get the idea)

Open Charge_t_Cursor

Fetch next from Charge_t_cursor into @ChgNo

While @@FETCH_STATUS = 0
BEGIN
insert into @ChargeT select *
from OPENROWSET(EMBAN2, 'SELECT * FROM charge_t WHERE chgno = ' +
cast(isnull(@ChgNo, 0) as varchar)))

Fetch next from Charge_t_cursor into @ChgNo
END

Close Charge_t_cursor
deallocate Charge_t_cursor

select * from @ChargeT

this way I'm only pulling over the wire the data that is updated and it will
be in a nice, compact binary format. ANY help would be VERY appreciated!!!
Thanks much!

Scott
Re: Can I pick your brains for a second? transferring a large database problem Scott M
6/6/2005 11:35:04 PM
Would replication work if the source tables are dropped every time it's
built?

[quoted text, click to view]
RE: Can I pick your brains for a second? transferring a large database problem petery NO[at]SPAM online.microsoft.com
6/7/2005 12:00:00 AM
Hello Scott,

It is not easy to use a DTS package to do this job. I think database
replication is more appropriate for this situation. You may consider
transaction or merge replication according to your requirements.

If you still want to consider DTS, lookup query feature might be helpful

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dtssql/dts_
addf_misc_2dix.asp

Regards,

Peter Yang
MCSE2000/2003, MCSA, MCDBA
Microsoft Online Partner Support

When responding to posts, please "Reply to Group" via your newsreader so
that others may learn and benefit from your issue.

=====================================================


This posting is provided "AS IS" with no warranties, and confers no rights.




--------------------
| From: "Scott M" <scott_M@nospam.nospam>
| Subject: Can I pick your brains for a second? transferring a large
database problem
| Date: Mon, 6 Jun 2005 17:33:03 -0500
| Lines: 72
| MIME-Version: 1.0
| Content-Type: text/plain;
| format=flowed;
| charset="iso-8859-1";
| reply-type=original
| Content-Transfer-Encoding: 7bit
| X-Priority: 3
| X-MSMail-Priority: Normal
| X-Newsreader: Microsoft Outlook Express 6.00.2900.2527
| X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2527
| Message-ID: <ui2eBfuaFHA.584@TK2MSFTNGP15.phx.gbl>
| Newsgroups: microsoft.public.sqlserver.datawarehouse
| NNTP-Posting-Host: adsl-068-209-157-050.sip.lft.bellsouth.net
68.209.157.50
| Path: TK2MSFTNGXA01.phx.gbl!TK2MSFTNGP08.phx.gbl!TK2MSFTNGP15.phx.gbl
| Xref: TK2MSFTNGXA01.phx.gbl microsoft.public.sqlserver.datawarehouse:1784
| X-Tomcat-NG: microsoft.public.sqlserver.datawarehouse
|
| -=-=- Sorry for the dup post, I'm posting it under my MSDN universal acct
| now :) -=-=-
|
| Okay, I have a problem that I need some suggestions with.
|
| I have a 30 Gig database that I need to sync across a T1 every night.
This
| database is a data dump of our billing system data, the tables are dropped
| ad re-created every night.
|
| I can do a very long and laborious process on the server of creating
tables
| with the primary keys from each table and a binary checksum of a row, then
| complaining it to a copy of the database from the day before. Then I
| generate SQL statements for the updates (Actually deletes and inserts).
All
| of this on the source server.
|
| I'd like to compact this down using a DTS package on the destination side,
| however I can't use a dynamic query in the "openRowSet" command to get
only
| the row that I want.
|
| Here is what I'm doing (Just for one table, I'll have data driven steps
for
| each table in my DB)
|
| insert into @tempChecker
| select BSum, chgNo from openquery(EMBAN2, 'Select BINARY_CHECKSUM(*) BSum,
| chgno from charge_t')
|
| Declare Charge_t_Cursor Cursor
| FOR
| select
| a.chgno
| from
| (select
| BSum,
| chgno
| from @tempChecker
| ) a
| where
| not exists
| (
| select 'x'
| from charge_t b
| where a.chgno = b.chgno and a.BSum <> BINARY_CHECKSUM(*)
| )
|
| Once I get all the chgno's (The key in the table) I want to only pull back
| the data that has changes. So I have this (Which I know I can't do put
you
| get the idea)
|
| Open Charge_t_Cursor
|
| Fetch next from Charge_t_cursor into @ChgNo
|
| While @@FETCH_STATUS = 0
| BEGIN
| insert into @ChargeT select *
| from OPENROWSET(EMBAN2, 'SELECT * FROM charge_t WHERE chgno = ' +
| cast(isnull(@ChgNo, 0) as varchar)))
|
| Fetch next from Charge_t_cursor into @ChgNo
| END
|
| Close Charge_t_cursor
| deallocate Charge_t_cursor
|
| select * from @ChargeT
|
| this way I'm only pulling over the wire the data that is updated and it
will
| be in a nice, compact binary format. ANY help would be VERY
appreciated!!!
| Thanks much!
|
| Scott
|
|
Re: Can I pick your brains for a second? transferring a large database problem naka55n NO[at]SPAM hotmail.com
6/7/2005 7:43:30 AM
How many tables are you talking about?

Why not just use triggers and create "shadow tables" that contain the
changes to your data?
Re: Can I pick your brains for a second? transferring a large database problem Myles.Matheson NO[at]SPAM gmail.com
6/7/2005 2:33:38 PM
Hello Scott,

I think you have the best idea using the check sum. For performance I
would look at creating a stored proc which DTS can call from with in
the Data pump task in your source database.

Have you considered deletes as well? This is easy to do against your
audit table just do a left outer join against your source tables to see
if the keys exist.

I would move away from the cursor as you don't need it. You can do
all you comparisons in a Select statement.

I would also look at creating three DTS data pump tasks.

1. New Records
2. Updated Records
3. Deleted Records.

I use Binary checksum all the time. It works well. There is a good
article on using this as a practice see:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/sql_busintbpwithdts.asp

Look under the section titled 'Data Transformation and Cleansing
Approach'


For everyone else I have noted some points have a look at the
following.

1. Triggers would be to slow against large inserts into the source
tables. Great for small amounts of data, but remember a trigger will
create an over head on you OLTP system.
2. Replication, although on paper this looks like a great idea again
you have an over head and new Services have to be enabled if you are
not doing replication already. Also I believe (not 100%) SQL server
2000 Replication does not support DDL changes. Still Replication would
be a great solution if the transactions are of a reasonable size.
3. One other option to look at if the tables where not being drop is
log shipping. Shipping the transaction log and processing the changes
against a database on another server is a great way of mirroring the
database.


Hope this helps

Myles Matheson
Data Warehouse Architect
Re: Can I pick your brains for a second? transferring a large database problem Myles.Matheson NO[at]SPAM gmail.com
6/8/2005 4:41:04 PM
Hello Scott,

Are you doing any transformations in ActiveX? (Formatting dates,
strings, etc) If not use the copy column transformation in the data
pump task. If you are doing transformations format data using TSQL and
use the data pump to load the destination tables.

ActiveX has a reputation for being slow with large amounts of data.

Check out: http://www.sql-server-performance.com/ for tuning tips when
using DTS.

Also check out www.SQLDTS.com great site for DTS programming.


One more thing, with Data pump task you can get great performance if
you do the following:

1. Format data using a stored proc
2. Call the stored proc using data pump task
3. pump the data using one transformation
4. use fast load option with table lock

Hope this helps

Myles
Re: Can I pick your brains for a second? transferring a large database problem Scott M
6/8/2005 5:20:35 PM
VERY awesome. I'm learning a lot from the article. But now I have this
issue...

It's taking .5 (almost) seconds to insert each record and each record is in
it's own transaction (Wouldn't that increase overhead?). The table is 243
columns wide and I'm loading it all through an ActiveX script. Any other
suggestions?

Hey, At least I can insert data!!!!! The Update doesn't seem to want to
run in tandem yet :(

[quoted text, click to view]
Re: Can I pick your brains for a second? transferring a large database problem Scott M
6/8/2005 5:22:04 PM
Sorry, inserts are on average of .1 seconds....

[quoted text, click to view]
Re: Can I pick your brains for a second? transferring a large database problem Myles.Matheson NO[at]SPAM gmail.com
6/8/2005 5:56:56 PM
Good question, I have read something similar around joining tables.

To be honest I try and avoid a pull process from any source system
(although is may not be possible in all cases). I prefer the source to
push the required files (data extracts) to the data warehouse.

This way if the source system goes down the data warehouse is not
affected and like wise if the DW server goes down the source server is
not affected.

In your case can you create the audit table on the source system? This
way you will process everything on the source server and only pull back
the new, updated records. To make this work DTS would call the Store
procedures from a SQL Execute Task or in the data pump task. This would
stop you from using Linked Servers.

Myles
Re: Can I pick your brains for a second? transferring a large database problem Scott M
6/8/2005 7:01:42 PM
Well, I'm using a linked server to retrieve the data. I was just pulling
back the keys that I need to insert and then performing a lookup.

It was my understanding that if I perform a query on a linked server it will
pull the entire dataset over to the working server. (Think I read it on
MSDN article somewhere)

If that is the case then I would just do a select from the remote server to
pull all the data.

Is my thinking correct?

[quoted text, click to view]
Re: Can I pick your brains for a second? transferring a large database problem Scott M
6/8/2005 8:40:11 PM
No. Unfortunately the data that I would like to compare it to is in one
location.

I MIGHT be able to execute the query there then only pump back the
updates........ HUmmm.... I COULD execute it on the server (source) and
not the server(destination) :)

THANKS MUCH!!

You have been a lifesaver!!! I'll try that tomorrow during the day and see
how this plays out.

[quoted text, click to view]
Re: Can I pick your brains for a second? transferring a large database problem Jéjé
6/19/2005 10:05:34 AM
have you try to create a local copy, then zip it, send the file by FTP,
unzip and load the data?

For example, you export to CSV files the new data from the source server.
A CSV file can be loaded in a database very quickly using bulk insert
commands.
Or you could detach a database, zip it, copy, reattach on the target server.
Or do the same with a backup file.

I have a demo database which is a 5gb of data, when I compress it, the size
is only 150mb and the time to transfert is excellent. (10min to compress,
10min to transfert, 5 to expand; instead-of 2h of transferts)


[quoted text, click to view]

Re: Can I pick your brains for a second? transferring a large database problem Scott M
7/7/2005 6:17:23 PM
Yes, the files are too large to zip in a decent amount of time (The database
files)

however we are zipping and transferring the flat files that the UNIX box is
creating.

[quoted text, click to view]
AddThis Social Bookmark Button