all groups > sql server programming > september 2004 >
You're in the

sql server programming

group:

GROUP BY HAVING


GROUP BY HAVING Bogus0
9/13/2004 6:49:01 PM
sql server programming:
I'm having trouble understaning a code snippet from Ken Henderson's book The
Guru's Guide to Transact-SQL. The exmple computes a statistical median (see
below). I've used GROUP BY and HAVING lots of times. I usually use a simple
HAVING clause like HAVING COUNT(*) = 1 or something like that. Here I'm a
little confused. I understand that the GROUP by is done first leaving a
result set 2 3 1 5 8. I understand that the HAVING clause happens after the
GROUP BY. I understand how COUNT works with NULL (NULLS don't get added in
ad they don't make the COUNT result in a NULL, they just get ignored). My
question (finally, the question!) is how this works WHEN i.c1 <= d.c1 THEN 1
on the set 2 3 1 5 8. does i.c1 <= d.c1 happen for every value in the set?
Could someone give a detailed explanation of what SQL Server does when it
gets to the HAVING COUNT CASE? Thanks in advance!

SET NOCOUNT ON
DROP TABLE #dist
CREATE TABLE #dist (c1 int)
INSERT #dist VALUES (2)
INSERT #dist VALUES (3)
INSERT #dist VALUES (1)
INSERT #dist VALUES (5)
INSERT #dist VALUES (8)

SELECT d.c1
FROM #dist d CROSS JOIN #dist i
GROUP BY d.c1
HAVING COUNT(CASE WHEN i.c1 <= d.c1 THEN 1 ELSE NULL END) = (COUNT(*)+1)/2


Re: GROUP BY HAVING David Portas
9/14/2004 7:06:46 AM
[quoted text, click to view]

Yes. This is a self-cross join so the c1 <= c1 comparison happens for every
possible pair of C1 values in the Dist table. As you've already said,
COUNT(<expression>) ignores any NULL values of the expression. That
expression can be anything, it doesn't have to be just a column name. In
this instance it's a CASE construct. To see how it works, put the COUNT
statement in the SELECT list and remove the HAVING clause:

SELECT d.c1,COUNT(CASE WHEN i.c1 <= d.c1 THEN 1 ELSE NULL END)
FROM #dist d CROSS JOIN #dist i
GROUP BY d.c1

Result:

c1
----------- -----------
1 1
2 2
3 3
5 4
8 5

You can see that the extra column effectively computes a sequential number
because there is one c1 value less than or equal to 1, two c1 values less
than or equal to 2,... etc. The COUNT(<expression>) function is counting the
NON-NULL results of the CASE expression for each d.c1 value. Lookup CASE in
Books Online if you aren't familiar with CASE expressions. To disect this
query even further you can try this query, which shows what the result of
the JOIN is before GROUP BY:

SELECT i.c1, d.c1, CASE WHEN i.c1 <= d.c1 THEN 1 ELSE NULL END
FROM #dist d CROSS JOIN #dist i

It's difficult to describe but once you've got it you'll get it for all
time!

If you try inserting a duplicate value into Dist you'll notice that the
Median calculation breaks down because of the way the COUNT(CASE...)
expression works with the duplicate values. If you Google in this group for
Median you'll find other suggested solutions for the median function.

Hope this helps.

--
David Portas
SQL Server MVP
--

Re: GROUP BY HAVING Daryl
9/14/2004 4:16:27 PM
I know you would like to know how this works but will it work in all cases?
It apears it can't because the answer for a set of even numbers the answer
is not in the list.

Computation of Median
When there is an odd number of numbers, the median is simply the middle
number. For example, the median of 2, 4, and 7 is 4.

When there is an even number of numbers, the median is the mean of the two
middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 =
5.5.


[quoted text, click to view]

Re: GROUP BY HAVING Daryl
9/14/2004 5:29:05 PM
Have a look at this site
http://www.oreilly.com/catalog/transqlcook/chapter/ch08.html will answer yor
question and goes a little further...


[quoted text, click to view]
The

AddThis Social Bookmark Button