all groups > sql server dts > january 2006 >
You're in the

sql server dts

group:

Optimize transformation


Optimize transformation Dave
1/25/2006 9:08:53 AM
sql server dts:
I have the below table that I need to load about 1 billion records
into. I was wondering if anyone could offer some tips on how to
optimize the process.

I load data_table via a staging table that could have 1000 records -
50 million records. I use a left outer join to ensure that no
duplicates are inserted.

I have read that using one copy column transformation for all columns
can increase performance, and I do have "Use Fast Load" selected.
Also, both tables are on the same sever.

Is there a way to speed up this load? Would it be a bad idea to use
the transformation to load a csv text file and then use the bulk insert
command to load the data?

CREATE TABLE [dbo].[data_table] (
[project] [int] NULL ,
[responseid] [int] NULL ,
[attrib1] [varchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[data] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[data_type] [varchar] (155) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]

CREATE CLUSTERED INDEX [icx__cluster] ON
[dbo].[data_table]([responseid], [project]) WITH FILLFACTOR = 70,
PAD_INDEX ON [PRIMARY]
CREATE INDEX [ix__data_table__attrib1] ON
[dbo].[data_table]([attrib1]) WITH FILLFACTOR = 70, PAD_INDEX ON
[INDEXES]
CREATE INDEX [ix__data_table__data] ON [dbo].[data_table]([data])
WITH FILLFACTOR = 70, PAD_INDEX ON [INDEXES]

Other Notes:
project: value is too large for a smallint, but there are only about
1200 project values
responseid: unique to project. The largest project has 3.5 million
responseid values
attrib: ~100,000 unique values
data: ~ 500,000 unique values
Re: Optimize transformation Allan Mitchell
1/25/2006 1:23:51 PM
Hello Dave,

A number of things for you to try

1. get rid of indexes

These will hurt you during loading

2. Recovery model set to SIMPLE

3. How about not using DTS. What about pure TSQL?

4. Make sure the log has lots of room to expand likewise with the data files.

5. Make sure you have no triggers on the destination tables.


Have a look in BOL for any other requirements for FAST LOAD

Allan


[quoted text, click to view]

Re: Optimize transformation Allan Mitchell
1/25/2006 1:26:50 PM
Hello Allan,

And I forgot to mention that if you do use DTS and the Transform Data Task
then use 1 Transform that has all your source columns and destination columns
within. Do not use 1 per pair. Also use a COPY COLUMN transform


Allan

[quoted text, click to view]

Re: Optimize transformation Dave
1/26/2006 11:33:11 AM
Thanks Allan for the tips.

I am using one copy column transformation (COM object) for each column.
I will be sure and change that.

I can't drop indexes because they are required for the LEFT OUTER JOIN
that calculates the set that will be inserted.

Recovery is already in SIMPLE; however I don't think recovery mode will
make any difference because the operation must be written to log before
it is written to data files.

Very good tip on using large enough log and data files! That should
minimize the overhead of auto-growing the files.

T-SQL: I have considered this. Does anyone have any insight on the
difference between DTS and plain T-SQL? I understand that DTS uses a
COM object for its transformation. For some reason I have always
thought that DTS is optimized for data transfers/loads. Is this not
the case? Will T-SQL be faster? If so, will committing the records in
small batches (50-100K) using DTS overcompensate for the hit that DTS
takes?

Thanks for the advice!
AddThis Social Bookmark Button