Groups | Blog | Home
all groups > sql server (alternate) > may 2004 >

sql server (alternate) : SELECT DISTINCT with JOIN


Erland Sommarskog
5/13/2004 9:59:37 PM
Craig Hoskin (nospam@infobahn.co.nz) writes:
[quoted text, click to view]

You want to display a distinct list, but you only want one TopCategory.
So which one do you want?

Well, assuming you only want the highest, you could do:

SELECT a.idProduct, a.description,a.descriptionLong,
a.listPrice,a.price,a.smallImageUrl,a.stock, a.fileName,a.noShipCharge,
MAX(c.topcategory)
FROM products a, categories_products b, categories c
WHERE active = -1 AND homePage = -1
AND a.idProduct = b.idProduct
AND c.idcategory=b.idcategory
AND prodType = 1 ORDER BY a.idProduct DESC
GROUP BY a.idProduct, a.description,a.descriptionLong,
a.listPrice,a.price,a.smallImageUrl,a.stock,
a.fileName,a.noShipCharge



--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
Hugo Kornelis
5/13/2004 11:27:47 PM
[quoted text, click to view]

Hi Craig,

The first idea will have to come from you. SQL Server won't make your
choices for you. So you must decide *which* TopCategory to display for
products that are in more than one TopCategory.

If you can describe how to choose, I (or someone else in this NG) will
probably be able to help writing up the query.

Best, Hugo
--

Hugo Kornelis
5/13/2004 11:45:23 PM
[quoted text, click to view]

Hi Craig,

Hmmmm. Since SQL Server still won't choose for you, allow me to do it for
you. The following two queries will select the "first" topcategory (I
don't know the datatype, so it will either be the lowest numbered, the one
that comes first in the alphabet or the oldest).

1: Using GROUP BY

SELECT a.idProduct, a.description, a.descriptionLong,
a.listPrice, a.price, a.smallImageUrl, a.stock,
a.fileName, a.noShipCharge, MIN(c.topcategory)
FROM products a, categories_products b, categories c
WHERE a.active = -1
AND a.homePage = -1
AND a.idProduct = b.idProduct
AND c.idcategory = b.idcategory
AND a.prodType = 1
GROUP BY a.idProduct, a.description, a.descriptionLong,
a.listPrice, a.price, a.smallImageUrl, a.stock,
a.fileName, a.noShipCharge
ORDER BY a.idProduct DESC


2: Using a subquery.

SELECT a.idProduct, a.description, a.descriptionLong,
a.listPrice, a.price, a.smallImageUrl, a.stock,
a.fileName, a.noShipCharge,
(SELECT MIN(c.topcategory)
FROM categories_products b, categories c
WHERE a.idProduct = b.idProduct
AND c.idcategory = b.idcategory)
FROM products a
WHERE a.active = -1
AND a.homePage = -1
AND a.prodType = 1
ORDER BY a.idProduct DESC

Both queries are untested. Test them both and check the results. Then
compare the execution plan to see which is most efficient with your data.
If you see little difference, pick the one you like best :-)

Best, Hugo
--

Hugo Kornelis
5/13/2004 11:50:07 PM
[quoted text, click to view]

Hi Craig,

So I noted. But it still was a useful post, as it showed that the active,
homepage and product columns are all in the Products table (something that
was not clear from your first post). Without this knowledge, I could not
have presented my second option (see my other post).

Best, Hugo
--

Hugo Kornelis
5/14/2004 12:48:02 AM
[quoted text, click to view]

(snip)
[quoted text, click to view]

Hi Craig,

Your query seems to contain some redundant joins. This may not hurt you if
the optimizer "sees" that it doesn't need to perform the join twice, but
I'm not sure if the optimizer is intelligent enough for that.

[quoted text, click to view]

Without testing, I'd say mine is quicker (probably) or they have equal
performance (slight chance). But you'd have to test to be sure. Make sure
you have an adequate (ie large) set of test data with as typical
distribution as possible. Run both queries with SET STATISTICS IO. Also,
compare the execution plans.

Best, Hugo
--

Hugo Kornelis
5/14/2004 1:00:27 AM
[quoted text, click to view]

Forgot to mention - my first suggestion (identical to Erland's) has a high
probability of being the fastest of them all (though it does depend on
what indexes are available for the optimizer to toy with).

I suspect the long group by list is what makes this one "look" less good
to you. However, this will probably look better to the optimizer than the
correlated subqueries in my second query and in your post. Also, do keep
in mind that you can mentally replace the entire group by list by "all
columns in the select clause except topcategory".

Best, Hugo
--

Craig Hoskin
5/14/2004 9:05:08 AM
Hi everyone

Have a problem I would areally appreciate help with.
I have 3 tables in a standard format for a Bookshop, eg

Products
Categories
Categories_Products

the latter allowing me to have products in multiple categories.

Everthing works well except for one annoying little thing.

When an individual product (which is in more than one topcategory) is added
to the Shopping Cart it displays twice, because in my select statement I
have the Category listed. I realise I could remove the TopCategory from the
statement and that makes my DISTINCT work as I wanted, but Id prefer to have
the TopCategory as it saves me later having to another SQL query (Im already
doing one to allow me not to list category in the Statement .... but If I
can overcome this one ... then I can remove this as well).

Here is my table structure (the necessary bits)
products
idProduct int
....

categories
idcategory int
idParentCategory int
topcategory int
...

categories_products
idCatProd int
idProduct int
idCategory

When I run a query such as

SELECT DISTINCT a.idProduct, a.description,a.descriptionLong,
a.listPrice,a.price,a.smallImageUrl,a.stock, a.fileName,a.noShipCharge,
c.topcategory
FROM products a, categories_products b, categories c
WHERE active = -1 AND homePage = -1
AND a.idProduct = b.idProduct
AND c.idcategory=b.idcategory
AND prodType = 1 ORDER BY a.idProduct DESC

This will return all products as expected, as well as any products which are
in more than one TopCategory.

Any ideas how to overcome this would be greatly appreciated.

Cheers

Craig

Craig Hoskin
5/14/2004 9:28:05 AM
Doooh

Always the way, you post a question and find a possible solution :-)

Would this work?

select a.idProduct, a.description,a.descriptionLong,
a.listPrice,a.price,a.smallImageUrl,a.stock, a.fileName,a.noShipCharge
from Products a
where exists (select * from categories c, categories_products b where
a.idProduct = b.idProduct AND c.idcategory=b.idcategory)
AND a.active = -1 AND a.homePage = -1 AND a.prodType = 1
ORDER BY a.idProduct DESC

Cheers

Craig

[quoted text, click to view]

Craig Hoskin
5/14/2004 9:30:00 AM

[quoted text, click to view]

Sorry, yes I should have calrified that aspect. Answer: I dont care :-)
Any topcategory will do as its still the same product, and the produut
display page will actually list all the categories its in.

Thanks for taking the time to help me.

Cheers

Craig

[quoted text, click to view]

Craig Hoskin
5/14/2004 9:33:15 AM
No cancel that ... I forgot my c.topcategory in the qeruy.

Back to seeking help again :-)

Craig Hoskin
5/14/2004 9:45:50 AM
[quoted text, click to view]

Eg max(topcategory) or something / somehow?

Craig Hoskin
5/14/2004 10:03:55 AM
Hi Hugo

Ive tried this which seems to work?

SELECT DISTINCT a.idProduct, a.description,a.descriptionLong,
a.listPrice,a.price,a.smallImageUrl,a.stock, a.fileName,a.noShipCharge,
c.topcategory
FROM products a, categories_products b, categories c
WHERE c.topcategory = (SELECT MAX(c.topcategory)
FROM categories_products b, categories c
WHERE a.idProduct = b.idProduct
AND c.idcategory = b.idcategory)
AND active = -1 AND homePage = -1 A
ND a.idProduct = b.idProduct AND c.idcategory=b.idcategory
AND prodType = 1
ORDER BY a.idProduct DESC


[quoted text, click to view]

Craig Hoskin
5/14/2004 10:08:16 AM
Hi Hugo
[quoted text, click to view]

Thanks for that ... it was your previous posting that gave me an idea about
the MAX. Good plan :-)

[quoted text, click to view]

This last one looked the best.

From a cursory glance, which would be better, yours where the SELECT is a
subquery (forgove my terminiology) or mine where it is in the WHERE clause?

Both provide the same results (it appears) and appear OK performace wise ...
just wondering which is better for a larger number of records?

Thanks again for all your help ... truly very much appreciated.

Cheers

Craig

Craig Hoskin
5/14/2004 10:13:06 AM
Hi Erland

Thanks for the reply.

I note you and Hugo both placed the "select Max()" in the SELECT part, as
opposed to one I made where I placed it in the WHERE clause.

Am I correct in assuming that, given you guys and your experience, that
yours will be more efficient ... since you both seem to have choosen the
same approach?

Cheers

Craig

[quoted text, click to view]

Erland Sommarskog
5/14/2004 9:44:05 PM
Craig Hoskin (nospam@infobahn.co.nz) writes:
[quoted text, click to view]

For this case it is probably more effecient. It was also less typing...

--
Erland Sommarskog, SQL Server MVP, sommar@algonet.se

Books Online for SQL Server SP3 at
AddThis Social Bookmark Button