sql server dts:
I have the below table that I need to load about 1 billion records
into. I was wondering if anyone could offer some tips on how to
optimize the process.
I load data_table via a staging table that could have 1000 records -
50 million records. I use a left outer join to ensure that no
duplicates are inserted.
I have read that using one copy column transformation for all columns
can increase performance, and I do have "Use Fast Load" selected.
Also, both tables are on the same sever.
Is there a way to speed up this load? Would it be a bad idea to use
the transformation to load a csv text file and then use the bulk insert
command to load the data?
CREATE TABLE [dbo].[data_table] (
[project] [int] NULL ,
[responseid] [int] NULL ,
[attrib1] [varchar] (55) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[data] [varchar] (255) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[data_type] [varchar] (155) COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY]
CREATE CLUSTERED INDEX [icx__cluster] ON
[dbo].[data_table]([responseid], [project]) WITH FILLFACTOR = 70,
PAD_INDEX ON [PRIMARY]
CREATE INDEX [ix__data_table__attrib1] ON
[dbo].[data_table]([attrib1]) WITH FILLFACTOR = 70, PAD_INDEX ON
[INDEXES]
CREATE INDEX [ix__data_table__data] ON [dbo].[data_table]([data])
WITH FILLFACTOR = 70, PAD_INDEX ON [INDEXES]
Other Notes:
project: value is too large for a smallint, but there are only about
1200 project values
responseid: unique to project. The largest project has 3.5 million
responseid values
attrib: ~100,000 unique values
data: ~ 500,000 unique values