Groups | Blog | Home
all groups > sql server (alternate) > february 2004 >

sql server (alternate) : many DISTINCT queries


louisducnguyen NO[at]SPAM hotmail.com
2/6/2004 12:16:51 PM
[quoted text, click to view]
Try experimenting with group by instead of distinct. Also in query
analyzer, enable view execution plan, to get an idea what mssql is
Florian
2/6/2004 3:16:10 PM
Hi,

I have a table that contains log data, usually around a million records. The
table has about 10 columns with various attributes of the logged data,
nothing special. We're using SQL Server 2000.

Some of the columns (for example "category") have duplicate values
throughout the records. We have a web page that queries the table to show
all the unique columns, for example:

select distinct CATEGORY from table TEST

Obviously the server has to scan all rows in order to get all unique columns
which takes quite a while, especially since that web page contains several
of these types of queries. We also have a MAX(DATE) and MIN(DATE) query that
also add to the load.

I already created indexes on the CATEGORY (actually on all categories)
column which might help a little but I'm pretty sure that there has got to
be a better way.

I also create a view (select distinct CATEGORY from table TEST) and tried to
index it, but it won't let me index a query that contains a DISTINCT
statement.

Isn't there a way to create an index that contains only the distinct values?
Is there another way to speed this up?


Thanks for any hints!

Simon Hayes
2/6/2004 7:08:55 PM

[quoted text, click to view]

If only you load/update the data relatively infrequently, then you could
create a 'lookup' table for each attribute, and populate them from the main
table after loading it (rather like the dimensions in a star schema):

insert into dbo.Categories (Category)
select distinct Category
from dbo.Test

Your client code could then query the lookup tables instead of the log
table. If you want to use indexed views, then you could create a view like
this:

create view dbo.Categories
with schemabinding
as
select Category, count_big(*) as 'Occurrences'
from dbo.Test
group by Category

That should be indexable, although there are quite a few other restrictions,
so you would need to check them out. But having multiple indexed views on
the table would make data modifications much slower, so if the data changes
frequently you might have to use some sort of lookup table approach anyway.

Simon

Florian
2/6/2004 9:43:40 PM
[quoted text, click to view]
anyway.

Thanks, I tried the indexed view and that seems to work OK now, pretty
fast - can't complain. Data shouldn't be updated that often so that should
be OK. Otherwise I might have to go with a lookup table - but it's not a
real good solution for our scenario for reasons I'm not going to bore
anybody with :)

Thanks!

Florian
2/6/2004 9:44:26 PM
[quoted text, click to view]

Yes, the group thing worked great - I also analyzed the execution plan and
now it's only returning the actual number of rows I'm getting - not the
whole table anymore.

Thanks.

AddThis Social Bookmark Button