all groups > sql server new users > june 2006 >
You're in the

sql server new users

group:

calculate percentile using t-sql


calculate percentile using t-sql Loane Sharp
6/15/2006 6:49:10 PM
sql server new users:
Hi there
I am using the following statement to calculate percentile ranks of a set of
scores in quite a large table (400k rows). The tempdb file grows
dramatically (to 40GB+) and the query never reaches a conclusion. Surely I'm
doing something wrong?! (This is actually part of a broader calculation ...
This query calculates the number of scores below a particular score, which I
will then divide by the total number of scores in the table to get the
percentile rank.)
Any comments would be gratefully appreciated.
Best regards
Loane

update table
set numbelow =
t.numbelow
from
(
select (select count(*) from table b where b.normscore > 0 and
b.normscore<=a.normscore) 'numbelow'
from table a
) as t

Re: calculate percentile using t-sql Lawrence Garvin (MVP)
6/19/2006 2:26:12 PM
[quoted text, click to view]
select count(*) from table b
where b.normscore > 0 and b.normscore<=a.normscore
) 'numbelow'
[quoted text, click to view]

What version of SQL Server are you using?

With SQL 2000 this is a complex type scenario. We have a nightmare of a
procedure we call to compute 20th and 80th percentiles on a collection of
numbers. I'm still trying to understand the logic of the procedure so that I
can evaluate whether it can be simplified, rewritten, or extracted out as a
function call.


With SQL 2005, using expressions in the TOP clause, this is trival, and
given our pending migration to SQL 2005, I'm not likely to pursue the issues
affecting our SQL 2000 implementation.

Normally, the pth percentile is obtained by first calculating the rank l =
p(n+1)/100, rounded to the nearest integer and then taking the value that
corresponds to that rank. In case of lower and upper quartiles, the ranks
are 0.25*(10+1) = 2.75 Þ 3 and 0.75*(10+1) = 8.25 Þ 8 which corresponds to
125 and 170 resp.

If these ten values were stored in a SQL Server 2005 table, we can now use a
simple TOP clause with expression to select the desired percentiles.

First calculate the rank for the 25th percentile: .25 * (count(*) + 1)

and then round it to the nearest integer: round(.25 * (count(*) + 1)

and store it in @rank25

Then calculate the rank for the 75th percentile: .75 * (count(*) + 1)

and then round it to the nearest integer: round(.75 * (count(*) + 1)

and store it in @rank75

Then, to take the value that correspondents to that rank, we use the TOP
clause with the computed rank of the 25th and 75th percentile in a nested
query:



DECLARE @blood TABLE(

systolic int

)



INSERT INTO @blood

SELECT 120

UNION ALL SELECT 125

UNION ALL SELECT 125

UNION ALL SELECT 145

UNION ALL SELECT 145

UNION ALL SELECT 150

UNION ALL SELECT 150

UNION ALL SELECT 160

UNION ALL SELECT 170

UNION ALL SELECT 175



DECLARE @rowcount int

DECLARE @rank25 tinyint

DECLARE @rank75 tinyint



SET @rowcount = (SELECT COUNT(*) from #blood)

SET @rank25 = ROUND(.25*(@rowcount+1),0)

SET @rank75 = ROUND(.75*(@rowcount+1),0)



SELECT TOP 1 systolic FROM (

SELECT TOP (@rank25) systolic FROM @blood ORDER BY systolic

) b

ORDER BY b.systolic DESC



SELECT TOP 1 systolic FROM (

SELECT TOP (@rank75) systolic FROM @blood ORDER BY systolic

) b

ORDER BY b.systolic DESC



The result from the first query is 125; the result from the second query is
170.




--
Lawrence Garvin, M.S., MVP-Software Distribution
Everything you need for WSUS is at
http://technet2.microsoft.com/windowsserver/en/technologies/featured/wsus/default.mspx
And, eveything else is at
http://wsusinfo.onsitechsolutions.com
....

AddThis Social Bookmark Button