On 9 Sep, 05:42, ChiWhiteSox <ChiWhite...@discussions.microsoft.com>
[quoted text, click to view] wrote:
> Hi All !
>
> im kinda beginner in SSIS and any help will be really appreciated !
>
> we have 2 separate DAtabases and each DB has a common table , lets say
> Tracking Table. In DB1, Tracking has columns that has varchar datatypes, and
> at DB2 it has same columns but of INT datatype.
>
> Basically, each night , I want to extract only new records from Tracking
> table at DB1 to Tracking Tbl at DB2. Then convert the other columns from
> varchar to INT.
>
> Also there are some columns at DB1 that need to be looked up at some
> reference tables before it can be saved at the Tracking at DB2. But that can
> come in later, what i need now is just to convert those varchars into INTs
> on the DB2 server. Im talking about 10-15 millions of rows from DB1 per day.
>
> Any ideas which tool I should use in SSIS? any links I can read to help with
> my problem ? I know I should use 2 diffrent connection managers for DB1 and
> DB2 on the SSIS project, and i also wanna avoid creating temp tables at
> either Databases. Maybe just store the new records in the memory (after
> comparing the 2 tables from 2 databases ) and then iterate through that
> memory recordset then save new records to the DB2 tracking Table. And then I
> should also use the For loop component? Can anybody show me the flow of how
> my SSIS should be??? please? someone ??
Hi ChiWhiteSox,
There are a number of ways to do what you are asking.
First off, you will definitely need 2 connections - 1 for each DB.
SSIS requires Metadata to be created at design time, not at runtime -
this means that if you want to change a connection at runtime (which
is possible) all of the objects referenced by the package need to
exist in the new connection with exactly the same metadata. So
basically you can't use 1 connection to point to different tables in
different databases at the same time.
Once you've got your connections configured, you'll need to add a Data
Flow task to your package. I can't see any reason for For Loops or
any other components in the case you've described. Inside the data
flow task you have a few options:
1) Filter and convert data in your Data Source task. It is possible
to simply convert and filter your data in the data source task using
SQL. To do this the databases would need to be able to see eachother
and the user specified in your connection to the source database would
also need access to the destination database to satisfy the filter
section of the statement.
2) Filter and convert the data in the Data Flow task (i.e. in SSIS).
3) Combination of the two above methods.
Option 1 you'll basically be making the DB engine do all of the work.
This could be good when dealing with large volumes of data (I would
consider 10 - 15 million rows per day a large volume of data) -
however, you lose the row-at-a-time error handling provided by SSIS,
and a lot of transformations will be slower as SSIS does them in
memory where possible.
Option 2 you you need a slightly different approach. First off,
you'll simply select ALL the data out of DB1. This needs to be
converted using a Data Conversion task to match the types in DB2
(INT), and then either a Lookup or Merge Join task added to filter the
data before the final Insert to DB2. The advantage of this is that
you get a much greater level of control over your data as it moves
through the process. Error handling allows you to reject individual
rows that fail the data conversion for example (in option 1 the source
task would fail if one of the column values failed to be converted in
SQL). You could also choose to do something with the rows that
already exist (check for updates, log that they existed, etc) On the
negative side, because SSIS tries to load as much as possible into
data during execution (particularly when using a Lookup task), with
the volumes you mentioned the package could run out of memory which
would dramatically lower the performance resulting in slower load
times to conducting some of the work in the Source task.
Option 3 obviously allows you to mix-and-match techniques to determine
which is best in your case. You might want to filter the data in SQL
(i.e. Source task) but convert the values using the Data conversion
task for example.
I hope this helps!
Good Luck
J