all groups > sql server dts > september 2007 >
You're in the

sql server dts

group:

Easy SSIS question ??


Easy SSIS question ?? ChiWhiteSox
9/8/2007 9:42:01 PM
sql server dts:
Hi All !

im kinda beginner in SSIS and any help will be really appreciated !

we have 2 separate DAtabases and each DB has a common table , lets say
Tracking Table. In DB1, Tracking has columns that has varchar datatypes, and
at DB2 it has same columns but of INT datatype.

Basically, each night , I want to extract only new records from Tracking
table at DB1 to Tracking Tbl at DB2. Then convert the other columns from
varchar to INT.

Also there are some columns at DB1 that need to be looked up at some
reference tables before it can be saved at the Tracking at DB2. But that can
come in later, what i need now is just to convert those varchars into INTs
on the DB2 server. Im talking about 10-15 millions of rows from DB1 per day.

Any ideas which tool I should use in SSIS? any links I can read to help with
my problem ? I know I should use 2 diffrent connection managers for DB1 and
DB2 on the SSIS project, and i also wanna avoid creating temp tables at
either Databases. Maybe just store the new records in the memory (after
comparing the 2 tables from 2 databases ) and then iterate through that
memory recordset then save new records to the DB2 tracking Table. And then I
should also use the For loop component? Can anybody show me the flow of how
Re: Easy SSIS question ?? jhofmeyr NO[at]SPAM googlemail.com
9/12/2007 4:12:01 AM
On 9 Sep, 05:42, ChiWhiteSox <ChiWhite...@discussions.microsoft.com>
[quoted text, click to view]

Hi ChiWhiteSox,

There are a number of ways to do what you are asking.

First off, you will definitely need 2 connections - 1 for each DB.
SSIS requires Metadata to be created at design time, not at runtime -
this means that if you want to change a connection at runtime (which
is possible) all of the objects referenced by the package need to
exist in the new connection with exactly the same metadata. So
basically you can't use 1 connection to point to different tables in
different databases at the same time.

Once you've got your connections configured, you'll need to add a Data
Flow task to your package. I can't see any reason for For Loops or
any other components in the case you've described. Inside the data
flow task you have a few options:

1) Filter and convert data in your Data Source task. It is possible
to simply convert and filter your data in the data source task using
SQL. To do this the databases would need to be able to see eachother
and the user specified in your connection to the source database would
also need access to the destination database to satisfy the filter
section of the statement.
2) Filter and convert the data in the Data Flow task (i.e. in SSIS).
3) Combination of the two above methods.

Option 1 you'll basically be making the DB engine do all of the work.
This could be good when dealing with large volumes of data (I would
consider 10 - 15 million rows per day a large volume of data) -
however, you lose the row-at-a-time error handling provided by SSIS,
and a lot of transformations will be slower as SSIS does them in
memory where possible.

Option 2 you you need a slightly different approach. First off,
you'll simply select ALL the data out of DB1. This needs to be
converted using a Data Conversion task to match the types in DB2
(INT), and then either a Lookup or Merge Join task added to filter the
data before the final Insert to DB2. The advantage of this is that
you get a much greater level of control over your data as it moves
through the process. Error handling allows you to reject individual
rows that fail the data conversion for example (in option 1 the source
task would fail if one of the column values failed to be converted in
SQL). You could also choose to do something with the rows that
already exist (check for updates, log that they existed, etc) On the
negative side, because SSIS tries to load as much as possible into
data during execution (particularly when using a Lookup task), with
the volumes you mentioned the package could run out of memory which
would dramatically lower the performance resulting in slower load
times to conducting some of the work in the Source task.

Option 3 obviously allows you to mix-and-match techniques to determine
which is best in your case. You might want to filter the data in SQL
(i.e. Source task) but convert the values using the Data conversion
task for example.

I hope this helps!
Good Luck
J
Re: Easy SSIS question ?? ChiWhiteSox
9/12/2007 7:58:01 AM
Thanks a lot J, this kinda gave me an idea now how to best approach it!
thanks so much! God Bless you !!

[quoted text, click to view]
AddThis Social Bookmark Button