all groups > sql server misc > april 2004 >
You're in the

sql server misc

group:

How do I speed up this query ? (1.7M table entries)



How do I speed up this query ? (1.7M table entries) Alan Silver
4/14/2004 6:32:17 PM
sql server misc: Hello,

I have a table containing position data for all of the postcodes in the
UK. There are around 1.7 million of these. I would like to be able to
query the table and pull out all postcodes within a specified distance
of a given one. The code I have at the moment takes about 20 seconds to
do this, which is not really acceptable for the client application. I
would appreciate any tips on how to improve performance here.

The code used to create the table was ...

create table Postcodes (
Postcode varchar(8) not null default '' unique,
Northing int not null default 0,
Easting int not null default 0
)
go

where Northing and Easting are distances north and east of an origin
(somewhere off the Scilly Islands I think). Sample data for this looks
like ...

Postcode Easting Northing
-------- ----------- -----------
AB101AF 39410 80640
AB101AG 39420 80630
AB101AH 39430 80630
AB101AJ 39410 80640
AB101AL 39430 80650
AB101AN 39430 80650
AB101AP 39430 80640
AB101AQ 39430 80630
AB101AR 39430 80630
AB101AS 39410 80630

I wrote a stored procedure to do the calculation as follows ...

create procedure NearPostcodes
@postcode varchar(8),
@distance int
as
begin
declare @n decimal
declare @e decimal
select @n=Northing, @e=Easting from Postcodes where
Postcode=upper(replace(@postcode,' ',''))
select Postcode, sqrt(((Northing-@n)*(Northing-@n)) +
((Easting-@e)*(Easting-@e))) as Distance
from Postcodes
where sqrt(((Northing-@n)*(Northing-@n)) +
((Easting-@e)*(Easting-@e))) < @distance
order by Distance, Postcode
end
go

but this took ages to compute. I made the postcode field in table into a
primary key, and changed the above sp so that instead of computing a
square root, it did ...

where ((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e)) <
@distance*@distance

This helped a little, but still only got it down to about 19 seconds per
query.

Any ideas on how to improve things further (if possible) ? I am using
SQL Server 7 (no I can't upgrade at the moment) on Windows NT4 Server,
SP6.

I considered setting up a table containing an entry for every pair of
postcodes with a precalculated distance. This should speed things up
somewhat, but given that it would end up with about 1,500,000,000,000
entries, it might be a little hard to handle !! I suspect that a simple
select query on a table that size could be quite slow.

Any suggestions gratefully received. TIA.

--
Alan Silver



-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Re: How do I speed up this query ? (1.7M table entries) Wangkhar NO[at]SPAM yahoo.com
4/15/2004 2:20:33 AM
Have you considered trying to (dont ask me how!) rewrite the where
clause algebra
so that
[quoted text, click to view]

the northing and the easting become sargs?

If not Maybe try narrowing the search data by taking a subset of the
data where both the northing and the easting are less than @distance,
and then running the intensive query against that?

Hope you find something tho!


[quoted text, click to view]
Re: How do I speed up this query ? (1.7M table entries) Alan Silver
4/15/2004 12:13:24 PM
In article <bb269444.0404150120.15f4a89a@posting.google.com>, WangKhar
<Wangkhar@yahoo.com> writes
[quoted text, click to view]

Sorry, don't understand what you mean. What are sargs ?

[quoted text, click to view]

That's a good idea. I might try that. Thanx

[quoted text, click to view]

--
Alan Silver



-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Re: How do I speed up this query ? (1.7M table entries) Wangkhar NO[at]SPAM yahoo.com
4/16/2004 2:47:51 AM
[quoted text, click to view]

By Sargs I mean using the columns so you (sqlserver) can search on
them properly. Using the functions on the column part of the where
clause means (as I understand it) the indexes are not used, basically
shafting the performance. At worst case I understand some functions
have the nasty habit of turning the apparently set based solution into
what is effectively a row based solution (maybe someone can comment to
Re: How do I speed up this query ? (1.7M table entries) Alan Silver
4/19/2004 3:34:25 PM
In article <bb269444.0404160147.13e4aeaf@posting.google.com>, WangKhar
<Wangkhar@yahoo.com> writes
[quoted text, click to view]

Hmm, I'm still not clear. What do you mean by "using the columns" ?

Sorry, but I'm just not sure what you meant. Maybe if you could provide
an example of what you mean it would help.

Thanx for the reply.

--
Alan Silver



-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Re: How do I speed up this query ? (1.7M table entries) Wangkhar NO[at]SPAM yahoo.com
4/20/2004 4:18:24 AM
Sure, sorry not very clear...

If you have an index on a column, then if you select where column = x
then it stands a good chance of using the index.

My understanding is if you select where myfunction(column) = x then it
wont use the index.

Additionally if you select where column * 2 = x' then it will also
disable the index.

So what you do is rearrange the equation so select where column = x/2


If that makes more sense

[quoted text, click to view]
Re: How do I speed up this query ? (1.7M table entries) Wangkhar NO[at]SPAM yahoo.com
4/20/2004 10:07:13 AM
From what I understand you are doing it may make sense to cluster the
northing, easting, and just slap a unique(?) index on postcode. Then
make sure the range of northing,easting is searchable for the index on
it - for at least a major part of the query. The index on the
northing/easting doesnt need to be unique.



[quoted text, click to view]
Re: How do I speed up this query ? (1.7M table entries) Alan Silver
4/20/2004 12:50:41 PM
In article <bb269444.0404200318.3bc21d63@posting.google.com>, WangKhar
<Wangkhar@yahoo.com> writes
[quoted text, click to view]

Wonderful sense !! Thank you very much for explaining it. The problem is
that the index is on the postcode field, not on the northing or easting
field. I'm not sure what kind of index I would put on those fields as
they are non-unique numbers.

Ta ra

[quoted text, click to view]

--
Alan Silver
(anything added below this line is nothing to do with me)


-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Re: How do I speed up this query ? (1.7M table entries) ambradnum NO[at]SPAM hotmail.com
4/20/2004 11:48:50 PM
I notice @n and @e are decimal while Northing and Easting are int. I think
that doing math with different datatypes may be slower than doing math with
the same datatypes - try changing @n and @e to int.

Also, have you timed the two select statements - is the first select slow -
if so, you may be able to speed it up by doing the replace outside of the
select:

select @postcode = upper(replace(@postcode,' ',''))

select @n=Northing, @e=Easting from Postcodes where Postcode=@postcode

My reasoning here is that doing the replace in the where clause may cause
SQL to ignore the postcode index.

Regards
Alan

[quoted text, click to view]
Re: How do I speed up this query ? (1.7M table entries) Wangkhar NO[at]SPAM yahoo.com
4/21/2004 7:34:01 AM
Maybe try

create table Postcodes (
Postcode varchar(8) not null primary key nonclustered,
Northing int not null default 0,
Easting int not null default 0
)
go
create clustered index ix_postcodes on Postcodes(Northing,Easting)


create procedure NearPostcodes
@postcode varchar(8),
@distance int
as
begin
declare @n int
declare @e int
select @n=Northing, @e=Easting
from Postcodes
where Postcode=upper(replace(@postcode,' ',''))

select PostCode, Distance
from
( select Postcode, sqrt(((Northing-@n)*(Northing-@n)) +
((Easting-@e)*(Easting-@e))) as Distance
from Postcodes
where Northing between (@n - @distance) and (@n + @distance)
and Easting between (@e - @distance) and (@e + @distance)
)x
where Distance < @distance
order by Distance, Postcode

end
go

Might be worth a go?



[quoted text, click to view]
Re: How do I speed up this query ? (1.7M table entries) Alan Silver
4/21/2004 11:48:06 AM
In article <3f51d1de.0404202248.32dffad4@posting.google.com>, Alan
<ambradnum@hotmail.com> writes
[quoted text, click to view]

See below. I tried this after the other suggestion, so will comment
second.

[quoted text, click to view]

It occurred to me when reading your post that the replace and upper are
redundant as I converted all the data when I created the table. I did
what you suggested and just converted the parameter once (obvious
really) and that knocked one second off the execution time, from 7 down
to 6.

I then changed the decimals to ints and this brought it down to three
seconds !! Not bad from the 60 seconds of my original version and 20
seconds for my optimised version.

Thanx very much for the suggestions. Please feel free to make any more
that might bring it down even further. At the moment I just have a
unique constraint on the postcode field and no constraints, clusters,
etc on the others. Not sure if anything would help.

Thanx again

[quoted text, click to view]

--
Alan Silver
(anything added below this line is nothing to do with me)


-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Re: How do I speed up this query ? (1.7M table entries) Alan Silver
4/21/2004 1:10:39 PM
In article <bb269444.0404200907.6b2885d4@posting.google.com>, WangKhar
<Wangkhar@yahoo.com> writes
[quoted text, click to view]

Thanx for the suggestion. I tried it, but it actually slowed the query
down. Thanx to all the suggestions I've had here, the query is down to
three seconds, which is acceptable. Obviously I would be happy to reduce
it even further, but I can live with it for the moment.

Thanx again.

[quoted text, click to view]

--
Alan Silver
(anything added below this line is nothing to do with me)


-----= Posted via Newsfeeds.Com, Uncensored Usenet News =-----
http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Re: How do I speed up this query ? (1.7M table entries) Wangkhar NO[at]SPAM yahoo.com
4/22/2004 3:01:14 AM
Something else that may eke out a few moments. Consider altering the
postcode to a CHAR(8) instead of varchar. The 1 or 2 characters you
save with the varchar are counterbalanced by the overhead per row -
iirc around 3 or 4 bytes per row in this case. Given as a percentage
of row width its relevant. Just remember the white space is
preserved... so maybe manipulate your @postcode to @postcode+' ' or
(slower - rtrim).

[quoted text, click to view]
Re: How do I speed up this query ? (1.7M table entries) Alan Silver
4/22/2004 4:11:53 PM
In article <bb269444.0404220201.4ce10e87@posting.google.com>, WangKhar
<Wangkhar@yahoo.com> writes
[quoted text, click to view]

OK, thanx. I haven't had time to test the previous suggestion properly
yet, but will give them all a go.

Ta ra

--
Alan Silver
Re: How do I speed up this query ? (1.7M table entries) Aaron W. West
6/18/2004 11:55:00 PM
I was surprised no one mentioned rtree/spatial indexes (in a post).
PostgreSQL, Oracle, and to some extent recent versions of MySQL (4.x) each
have spatial index support using r-tree indexes.

I found this library, but fear it may be hard to integrate with MS SQL:

Spatial Index Library 0.4b
http://freshmeat.net/projects/spatialindexlib/?branch_id=35217&release_id=105467

I thought MS FoxPro was supposed to have some support for distance queries
too, but not sure.

My only other recommendation was to reduce the "resolution" of your search,
if you don't need such accuracy. For example, create a table of 10,000 or
100,000 zip code areas (like US 5-digit zips), and a cross-referencing table
to your 1.7 million row one, and run your query against the zip code area
table instead. I would probably use rectangular areas, and assign all zips
in that area the same area_id. Find the areas in the desired distance, then
find all zips in those areas with a join.

An article I found:
Microsoft SQL Server: Future Plans for Supporting Spatial Data
http://www.directionsmag.com/article.php?article_id=430

I suppose you could install Cygwin and PostgreSQL and write an extended
stored procedure in C to query a Postgresql table.... Ugh! It'd be easier to
migrate to PostgreSQL, probably.

Re: How do I speed up this query ? (1.7M table entries) seeker NO[at]SPAM mailinator.com
6/19/2004 4:13:27 PM
Alan,

A way to improve queries of this type is to consider the fact that if
an x and y are not in the bounding square (x-r, y-r)-(x+r, y+r), then
neither will they be in the circle bounded by radius r (the distance).
Filtering on the bounding square first allows us to use sargable
arguments, and limits the heavy calculation later. See amended proc
below.

I hope this helps you,
Keith, EBSCO
------------------------------------
create procedure NearPostcodes
@postcode varchar(8),
@radius int -- distance
as
begin
-- This method assumes a sufficiently small radius
-- is chosen, so that cost of populating the local
-- table var, is offset by the performance gain
-- of using sargable arguments.
declare @tblWithinSquare TABLE
(
PostCode varchar(8),
Northing int,
Easting int
)
declare @n int
declare @e int
declare @left int -- bounded square extents
declare @right int
declare @top int
declare @bottom int

-- center of square/circle
select
@n=Northing,
@e=Easting
from
Postcodes with (nolock)
where
Postcode = @postcode
-- consider adding a indexed CHECKSUM on the postcode column
-- it improves exact match string lookups
-- since the CBO can filter on a smaller integer index
-- prior to the expensive string compare
-- AND PostcodeCheckSum = CHECKSUM(@postcode)

-- bounding square extents
select
@left = @e - @radius
@right = @e + @radius
@bottom = @n - radius
@top = @n + radius

-- pre-filter post codes within bounding square
insert into @tblWithinSquare
(
Postcode,
Northing,
Easting
)
select
Postcode,
Northing,
Easting
from
Postcodes with (nolock)
where
-- recommend unique index on (Northing, Easting)
-- since two postcodes should always have different coordinates
Northing between @top and @bottom
and Easting between @left and @right

-- Now it is safe to do the heavy calculation on the
-- hopefully much smaller set of postcodes. (keep radius small)
-- Or even better stop now, :) and consider anything in the
-- bounding square as worthy of reporting.
select
Postcode,
(Northing - @n) * (Northing - @n)
+ (Easting - @e) * (Easting - @e)
as Distance
from
@tblWithinSquare
where
(Northing - @n) * (Northing - @n)
+ (Easting - @e) * (Easting - @e)
< (@radius * @radius)
order by
Distance,
Postcode
end
go


First apply the following efficient filter.

[quoted text, click to view]
AddThis Social Bookmark Button