Groups | Blog | Home
all groups > sql server (microsoft) > july 2005 >

sql server (microsoft) : appropriate design question



jason
7/22/2005 11:52:11 AM
i could write ddl for this one, but it would probably be pointless. but
just in case, imagine a table:

create table t1 (
id int not null,
name varchar(50) not null,
active bit,
owner int,
column001 varchar(50),
...
column150 varchar(50),
constraint pk_t1 primary key nonclustered (id),
constraint ak_name unique nonclustered (name)
)

the data set that will populate this table grows at a rate of ~x^n over
time, and is already pretty large.

now, there are a lot of queries against this table with a where clause
of "active = 1". in fact that probably constitutes 80-90% of the
queries hitting this table. and those queries are complicated, usually
comparing values to dozens of the many columns in the table.

the data set that populates the table with an active value of 1 is much
smaller than the data set of the entire table (currently about 12%),
and it grows more slowly (at a rate of ~nx).

also, the table is already clustered by the "owner" column, in order to
make the 10-20% of queries that filter by owner as fast as possible,
because while those queries are a small fraction of the total query
volume, they are run by the people who pay for the database :)

my question is, what is reasonable, moving forward, as the datasets
grow very large? ideas include:

1) a query-only table. essentially duplicating all the "active = 1"
records into a new child table which would be the target of all
queries. all updates would happen to the master table, and would get
sent to the child table through triggers or transactional replication
or something similar.

this would sort of act like a second "cluster" based on active flag,
making those queries much quicker, but would add a level of complexity
that might go against most design principles for all i know.

2) switch the cluster index. make the table clustered by active column
instead of owner, and hope that the owners don't get pissed off that
their queries slow down.

is option 1 completely insane, or is it a reasonable solution? if it is
insane, are any other solutions, or is option 2 it? is there something
better than both of these?

thanks in advance,

jason
byronbayer
7/25/2005 2:55:12 AM
Wow, 150 columns.... of varchar

Have you normalised the table to its maximum?

Could you not have an Active table and an inactive table for what you
are trying to do?

I don't think duplicating the data is a good idea and it will give
you big headaches down the line when you are trying to compare both
tables.

How about moving all the active records to another table and making
sure all the referential links are kept the same as the original table.
This way your inactive data will sort of be like archived data table.
Have a trigger that moves them across whenever they get changed from
active to inactive.
You can then cluster the inactive table by owner and the active table
by active column if you wish.

Just out of curiosity, why do you have 150 varchar colums?

Jay Freeman
jason
7/25/2005 6:55:07 AM
thanks for the response, Jay

i did consider two separate tables without duplicate records (e.g.
moving the active records, not copying them), but that raised problems.
for example, records move from being active to inactive with some
frequency, making the inserts back into the "archive" table, as you
called it, much more complicated, in terms of the business rules of
whether it is a "new" inactive, or "existing" inactive record. doable,
but tricky.

also, any queries by owner would have to scan both tables, only one of
which is actually clustered by owner. this may or may not slow the
queries down, of course, but it seems like it would be extra work.

i guess it boils down to try to assess the cost and maintenance of
managing a copy of information, versus the actual movement of
information. just narrowing it down to that comparison is helpful, so
thank you.

to your last question, they aren't all varchar, actually. that was a
flaw in my simplified representation. in fact they are MOSTLY bit
fields. probably about 100 bit fields, 40 varchar fields, 10 int
fields. each row is simply describing something with that many
characteristics. most of them have default values, but none of them are
actually optional, so i can't even separate a category of columns that
are usually empty. in fact, i have already separated out 3 columns
which used to be text fields into their own tables. i turned them into
varchar(8000) fields so that they could be accessed through stored
procedures (which were desperately needed), and that of course required
being put into a child table to not violate the max record size.

hope this is somewhat responsive to your question

jason
AddThis Social Bookmark Button