i could write ddl for this one, but it would probably be pointless. but just in case, imagine a table: create table t1 ( id int not null, name varchar(50) not null, active bit, owner int, column001 varchar(50), ... column150 varchar(50), constraint pk_t1 primary key nonclustered (id), constraint ak_name unique nonclustered (name) ) the data set that will populate this table grows at a rate of ~x^n over time, and is already pretty large. now, there are a lot of queries against this table with a where clause of "active = 1". in fact that probably constitutes 80-90% of the queries hitting this table. and those queries are complicated, usually comparing values to dozens of the many columns in the table. the data set that populates the table with an active value of 1 is much smaller than the data set of the entire table (currently about 12%), and it grows more slowly (at a rate of ~nx). also, the table is already clustered by the "owner" column, in order to make the 10-20% of queries that filter by owner as fast as possible, because while those queries are a small fraction of the total query volume, they are run by the people who pay for the database :) my question is, what is reasonable, moving forward, as the datasets grow very large? ideas include: 1) a query-only table. essentially duplicating all the "active = 1" records into a new child table which would be the target of all queries. all updates would happen to the master table, and would get sent to the child table through triggers or transactional replication or something similar. this would sort of act like a second "cluster" based on active flag, making those queries much quicker, but would add a level of complexity that might go against most design principles for all i know. 2) switch the cluster index. make the table clustered by active column instead of owner, and hope that the owners don't get pissed off that their queries slow down. is option 1 completely insane, or is it a reasonable solution? if it is insane, are any other solutions, or is option 2 it? is there something better than both of these? thanks in advance, jason
Wow, 150 columns.... of varchar Have you normalised the table to its maximum? Could you not have an Active table and an inactive table for what you are trying to do? I don't think duplicating the data is a good idea and it will give you big headaches down the line when you are trying to compare both tables. How about moving all the active records to another table and making sure all the referential links are kept the same as the original table. This way your inactive data will sort of be like archived data table. Have a trigger that moves them across whenever they get changed from active to inactive. You can then cluster the inactive table by owner and the active table by active column if you wish. Just out of curiosity, why do you have 150 varchar colums? Jay Freeman
thanks for the response, Jay i did consider two separate tables without duplicate records (e.g. moving the active records, not copying them), but that raised problems. for example, records move from being active to inactive with some frequency, making the inserts back into the "archive" table, as you called it, much more complicated, in terms of the business rules of whether it is a "new" inactive, or "existing" inactive record. doable, but tricky. also, any queries by owner would have to scan both tables, only one of which is actually clustered by owner. this may or may not slow the queries down, of course, but it seems like it would be extra work. i guess it boils down to try to assess the cost and maintenance of managing a copy of information, versus the actual movement of information. just narrowing it down to that comparison is helpful, so thank you. to your last question, they aren't all varchar, actually. that was a flaw in my simplified representation. in fact they are MOSTLY bit fields. probably about 100 bit fields, 40 varchar fields, 10 int fields. each row is simply describing something with that many characteristics. most of them have default values, but none of them are actually optional, so i can't even separate a category of columns that are usually empty. in fact, i have already separated out 3 columns which used to be text fields into their own tables. i turned them into varchar(8000) fields so that they could be accessed through stored procedures (which were desperately needed), and that of course required being put into a child table to not violate the max record size. hope this is somewhat responsive to your question jason
Don't see what you're looking for? Try a search.
|