sql server misc:
Hello, I have a table containing position data for all of the postcodes in the UK. There are around 1.7 million of these. I would like to be able to query the table and pull out all postcodes within a specified distance of a given one. The code I have at the moment takes about 20 seconds to do this, which is not really acceptable for the client application. I would appreciate any tips on how to improve performance here. The code used to create the table was ... create table Postcodes ( Postcode varchar(8) not null default '' unique, Northing int not null default 0, Easting int not null default 0 ) go where Northing and Easting are distances north and east of an origin (somewhere off the Scilly Islands I think). Sample data for this looks like ... Postcode Easting Northing -------- ----------- ----------- AB101AF 39410 80640 AB101AG 39420 80630 AB101AH 39430 80630 AB101AJ 39410 80640 AB101AL 39430 80650 AB101AN 39430 80650 AB101AP 39430 80640 AB101AQ 39430 80630 AB101AR 39430 80630 AB101AS 39410 80630 I wrote a stored procedure to do the calculation as follows ... create procedure NearPostcodes @postcode varchar(8), @distance int as begin declare @n decimal declare @e decimal select @n=Northing, @e=Easting from Postcodes where Postcode=upper(replace(@postcode,' ','')) select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e))) as Distance from Postcodes where sqrt(((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e))) < @distance order by Distance, Postcode end go but this took ages to compute. I made the postcode field in table into a primary key, and changed the above sp so that instead of computing a square root, it did ... where ((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e)) < @distance*@distance This helped a little, but still only got it down to about 19 seconds per query. Any ideas on how to improve things further (if possible) ? I am using SQL Server 7 (no I can't upgrade at the moment) on Windows NT4 Server, SP6. I considered setting up a table containing an entry for every pair of postcodes with a precalculated distance. This should speed things up somewhat, but given that it would end up with about 1,500,000,000,000 entries, it might be a little hard to handle !! I suspect that a simple select query on a table that size could be quite slow. Any suggestions gratefully received. TIA. -- Alan Silver -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Have you considered trying to (dont ask me how!) rewrite the where clause algebra so that [quoted text, click to view] > where sqrt(((Northing-@n)*(Northing-@n)) + > ((Easting-@e)*(Easting-@e))) < @distance > order by Distance, Postcode
the northing and the easting become sargs? If not Maybe try narrowing the search data by taking a subset of the data where both the northing and the easting are less than @distance, and then running the intensive query against that? Hope you find something tho! [quoted text, click to view] Alan Silver <alan-silver@nospam.thanx> wrote in message news:<1qnWv6AhWXfAFw38@nospamthankyou.spam>... > Hello, > > I have a table containing position data for all of the postcodes in the > UK. There are around 1.7 million of these. I would like to be able to > query the table and pull out all postcodes within a specified distance > of a given one. The code I have at the moment takes about 20 seconds to > do this, which is not really acceptable for the client application. I > would appreciate any tips on how to improve performance here.
In article <bb269444.0404150120.15f4a89a@posting.google.com>, WangKhar <Wangkhar@yahoo.com> writes [quoted text, click to view] >Have you considered trying to (dont ask me how!) rewrite the where >clause algebra >so that >> where sqrt(((Northing-@n)*(Northing-@n)) + >> ((Easting-@e)*(Easting-@e))) < @distance >> order by Distance, Postcode > >the northing and the easting become sargs?
Sorry, don't understand what you mean. What are sargs ? [quoted text, click to view] >If not Maybe try narrowing the search data by taking a subset of the >data where both the northing and the easting are less than @distance, >and then running the intensive query against that?
That's a good idea. I might try that. Thanx [quoted text, click to view] >Hope you find something tho! > > >Alan Silver <alan-silver@nospam.thanx> wrote in message >news:<1qnWv6AhWXfAFw38@nospamthankyou.spam>... >> Hello, >> >> I have a table containing position data for all of the postcodes in the >> UK. There are around 1.7 million of these. I would like to be able to >> query the table and pull out all postcodes within a specified distance >> of a given one. The code I have at the moment takes about 20 seconds to >> do this, which is not really acceptable for the client application. I >> would appreciate any tips on how to improve performance here. >>
-- Alan Silver -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
[quoted text, click to view] Alan Silver <alan-silver@nospam.thanx> wrote in message news:<aLaN35AU5mfAFwBO@nospamthankyou.spam>... > In article <bb269444.0404150120.15f4a89a@posting.google.com>, WangKhar > > > >the northing and the easting become sargs? > > Sorry, don't understand what you mean. What are sargs ?
By Sargs I mean using the columns so you (sqlserver) can search on them properly. Using the functions on the column part of the where clause means (as I understand it) the indexes are not used, basically shafting the performance. At worst case I understand some functions have the nasty habit of turning the apparently set based solution into what is effectively a row based solution (maybe someone can comment to
In article <bb269444.0404160147.13e4aeaf@posting.google.com>, WangKhar <Wangkhar@yahoo.com> writes [quoted text, click to view] >Alan Silver <alan-silver@nospam.thanx> wrote in message >news:<aLaN35AU5mfAFwBO@nospamthankyou.spam>... >> In article <bb269444.0404150120.15f4a89a@posting.google.com>, WangKhar >> > >> >the northing and the easting become sargs? >> >> Sorry, don't understand what you mean. What are sargs ? > >By Sargs I mean using the columns so you (sqlserver) can search on >them properly. Using the functions on the column part of the where >clause means (as I understand it) the indexes are not used, basically >shafting the performance. At worst case I understand some functions >have the nasty habit of turning the apparently set based solution into >what is effectively a row based solution (maybe someone can comment to >clarify that).
Hmm, I'm still not clear. What do you mean by "using the columns" ? Sorry, but I'm just not sure what you meant. Maybe if you could provide an example of what you mean it would help. Thanx for the reply. -- Alan Silver -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Sure, sorry not very clear... If you have an index on a column, then if you select where column = x then it stands a good chance of using the index. My understanding is if you select where myfunction(column) = x then it wont use the index. Additionally if you select where column * 2 = x' then it will also disable the index. So what you do is rearrange the equation so select where column = x/2 If that makes more sense [quoted text, click to view] Alan Silver <alan-silver@nospam.thanx> wrote in message news:<vCMQV8ExN+gAFw$s@nospamthankyou.spam>... > In article <bb269444.0404160147.13e4aeaf@posting.google.com>, WangKhar > <Wangkhar@yahoo.com> writes > >Alan Silver <alan-silver@nospam.thanx> wrote in message > >news:<aLaN35AU5mfAFwBO@nospamthankyou.spam>... > >> In article <bb269444.0404150120.15f4a89a@posting.google.com>, WangKhar > >> > > >> >the northing and the easting become sargs? > >> > >> Sorry, don't understand what you mean. What are sargs ? > > > >By Sargs I mean using the columns so you (sqlserver) can search on > >them properly. Using the functions on the column part of the where > >clause means (as I understand it) the indexes are not used, basically > >shafting the performance. At worst case I understand some functions > >have the nasty habit of turning the apparently set based solution into > >what is effectively a row based solution (maybe someone can comment to > >clarify that). > > Hmm, I'm still not clear. What do you mean by "using the columns" ? > > Sorry, but I'm just not sure what you meant. Maybe if you could provide > an example of what you mean it would help. >
From what I understand you are doing it may make sense to cluster the northing, easting, and just slap a unique(?) index on postcode. Then make sure the range of northing,easting is searchable for the index on it - for at least a major part of the query. The index on the northing/easting doesnt need to be unique. [quoted text, click to view] Alan Silver <alan-silver@nospam.thanx> wrote in message news:<ks1yOOHR6QhAFw1K@nospamthankyou.spam>... > In article <bb269444.0404200318.3bc21d63@posting.google.com>, WangKhar > <Wangkhar@yahoo.com> writes > >Sure, sorry not very clear... > > > >If you have an index on a column, then if you select where column = x > >then it stands a good chance of using the index. > > > >My understanding is if you select where myfunction(column) = x then it > >wont use the index. > > > >Additionally if you select where column * 2 = x' then it will also > >disable the index. > > > >So what you do is rearrange the equation so select where column = x/2 > > > >If that makes more sense > > Wonderful sense !! Thank you very much for explaining it. The problem is > that the index is on the postcode field, not on the northing or easting > field. I'm not sure what kind of index I would put on those fields as > they are non-unique numbers. > > Ta ra
In article <bb269444.0404200318.3bc21d63@posting.google.com>, WangKhar <Wangkhar@yahoo.com> writes [quoted text, click to view] >Sure, sorry not very clear... > >If you have an index on a column, then if you select where column = x >then it stands a good chance of using the index. > >My understanding is if you select where myfunction(column) = x then it >wont use the index. > >Additionally if you select where column * 2 = x' then it will also >disable the index. > >So what you do is rearrange the equation so select where column = x/2 > >If that makes more sense
Wonderful sense !! Thank you very much for explaining it. The problem is that the index is on the postcode field, not on the northing or easting field. I'm not sure what kind of index I would put on those fields as they are non-unique numbers. Ta ra [quoted text, click to view] >Alan Silver <alan-silver@nospam.thanx> wrote in message >news:<vCMQV8ExN+gAFw$s@nospamthankyou.spam>... >> In article <bb269444.0404160147.13e4aeaf@posting.google.com>, WangKhar >> <Wangkhar@yahoo.com> writes >> >Alan Silver <alan-silver@nospam.thanx> wrote in message >> >news:<aLaN35AU5mfAFwBO@nospamthankyou.spam>... >> >> In article <bb269444.0404150120.15f4a89a@posting.google.com>, WangKhar >> >> > >> >> >the northing and the easting become sargs? >> >> >> >> Sorry, don't understand what you mean. What are sargs ? >> > >> >By Sargs I mean using the columns so you (sqlserver) can search on >> >them properly. Using the functions on the column part of the where >> >clause means (as I understand it) the indexes are not used, basically >> >shafting the performance. At worst case I understand some functions >> >have the nasty habit of turning the apparently set based solution into >> >what is effectively a row based solution (maybe someone can comment to >> >clarify that). >> >> Hmm, I'm still not clear. What do you mean by "using the columns" ? >> >> Sorry, but I'm just not sure what you meant. Maybe if you could provide >> an example of what you mean it would help. >> >> Thanx for the reply.
-- Alan Silver (anything added below this line is nothing to do with me) -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
I notice @n and @e are decimal while Northing and Easting are int. I think that doing math with different datatypes may be slower than doing math with the same datatypes - try changing @n and @e to int. Also, have you timed the two select statements - is the first select slow - if so, you may be able to speed it up by doing the replace outside of the select: select @postcode = upper(replace(@postcode,' ','')) select @n=Northing, @e=Easting from Postcodes where Postcode=@postcode My reasoning here is that doing the replace in the where clause may cause SQL to ignore the postcode index. Regards Alan [quoted text, click to view] Alan Silver <alan-silver@nospam.thanx> wrote in message news:<1qnWv6AhWXfAFw38@nospamthankyou.spam>... > Hello, > > I have a table containing position data for all of the postcodes in the > UK. There are around 1.7 million of these. I would like to be able to > query the table and pull out all postcodes within a specified distance > of a given one. The code I have at the moment takes about 20 seconds to > do this, which is not really acceptable for the client application. I > would appreciate any tips on how to improve performance here. > > The code used to create the table was ... > > create table Postcodes ( > Postcode varchar(8) not null default '' unique, > Northing int not null default 0, > Easting int not null default 0 > ) > go > > where Northing and Easting are distances north and east of an origin > (somewhere off the Scilly Islands I think). Sample data for this looks > like ... > > Postcode Easting Northing > -------- ----------- ----------- > AB101AF 39410 80640 > AB101AG 39420 80630 > AB101AH 39430 80630 > AB101AJ 39410 80640 > AB101AL 39430 80650 > AB101AN 39430 80650 > AB101AP 39430 80640 > AB101AQ 39430 80630 > AB101AR 39430 80630 > AB101AS 39410 80630 > > I wrote a stored procedure to do the calculation as follows ... > > create procedure NearPostcodes > @postcode varchar(8), > @distance int > as > begin > declare @n decimal > declare @e decimal > select @n=Northing, @e=Easting from Postcodes where > Postcode=upper(replace(@postcode,' ','')) > select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + > ((Easting-@e)*(Easting-@e))) as Distance > from Postcodes > where sqrt(((Northing-@n)*(Northing-@n)) + > ((Easting-@e)*(Easting-@e))) < @distance > order by Distance, Postcode > end > go > > but this took ages to compute. I made the postcode field in table into a > primary key, and changed the above sp so that instead of computing a > square root, it did ... > > where ((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e)) < > @distance*@distance > > This helped a little, but still only got it down to about 19 seconds per > query. > > Any ideas on how to improve things further (if possible) ? I am using > SQL Server 7 (no I can't upgrade at the moment) on Windows NT4 Server, > SP6. > > I considered setting up a table containing an entry for every pair of > postcodes with a precalculated distance. This should speed things up > somewhat, but given that it would end up with about 1,500,000,000,000 > entries, it might be a little hard to handle !! I suspect that a simple > select query on a table that size could be quite slow. >
Maybe try create table Postcodes ( Postcode varchar(8) not null primary key nonclustered, Northing int not null default 0, Easting int not null default 0 ) go create clustered index ix_postcodes on Postcodes(Northing,Easting) create procedure NearPostcodes @postcode varchar(8), @distance int as begin declare @n int declare @e int select @n=Northing, @e=Easting from Postcodes where Postcode=upper(replace(@postcode,' ','')) select PostCode, Distance from ( select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e))) as Distance from Postcodes where Northing between (@n - @distance) and (@n + @distance) and Easting between (@e - @distance) and (@e + @distance) )x where Distance < @distance order by Distance, Postcode end go Might be worth a go? [quoted text, click to view] Alan Silver <alan-silver@nospam.thanx> wrote in message news:<Io45fBCmFlhAFwAB@nospamthankyou.spam>... > In article <3f51d1de.0404202248.32dffad4@posting.google.com>, Alan > Thanx very much for the suggestions. Please feel free to make any more > that might bring it down even further. At the moment I just have a > unique constraint on the postcode field and no constraints, clusters, > etc on the others. Not sure if anything would help. > > >> The code used to create the table was ... > >> > >> create table Postcodes ( > >> Postcode varchar(8) not null default '' unique, > >> Northing int not null default 0, > >> Easting int not null default 0 > >> ) > >> go > >>> >> create procedure NearPostcodes > >> @postcode varchar(8), > >> @distance int > >> as > >> begin > >> declare @n decimal > >> declare @e decimal > >> select @n=Northing, @e=Easting from Postcodes where > >> Postcode=upper(replace(@postcode,' ','')) > >> select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + > >> ((Easting-@e)*(Easting-@e))) as Distance > >> from Postcodes > >> where sqrt(((Northing-@n)*(Northing-@n)) + > >> ((Easting-@e)*(Easting-@e))) < @distance > >> order by Distance, Postcode > >> end > >> go > >> > >> but this took ages to compute. I made the postcode field in table into a > >> primary key, and changed the above sp so that instead of computing a > >> square root, it did ... > >> > >> where ((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e)) < > >> @distance*@distance > >> > >> This helped a little, but still only got it down to about 19 seconds per
In article <3f51d1de.0404202248.32dffad4@posting.google.com>, Alan <ambradnum@hotmail.com> writes [quoted text, click to view] >I notice @n and @e are decimal while Northing and Easting are int. I think >that doing math with different datatypes may be slower than doing math with >the same datatypes - try changing @n and @e to int.
See below. I tried this after the other suggestion, so will comment second. [quoted text, click to view] >Also, have you timed the two select statements - is the first select slow - >if so, you may be able to speed it up by doing the replace outside of the >select: > >select @postcode = upper(replace(@postcode,' ','')) > >select @n=Northing, @e=Easting from Postcodes where Postcode=@postcode > >My reasoning here is that doing the replace in the where clause may cause >SQL to ignore the postcode index.
It occurred to me when reading your post that the replace and upper are redundant as I converted all the data when I created the table. I did what you suggested and just converted the parameter once (obvious really) and that knocked one second off the execution time, from 7 down to 6. I then changed the decimals to ints and this brought it down to three seconds !! Not bad from the 60 seconds of my original version and 20 seconds for my optimised version. Thanx very much for the suggestions. Please feel free to make any more that might bring it down even further. At the moment I just have a unique constraint on the postcode field and no constraints, clusters, etc on the others. Not sure if anything would help. Thanx again [quoted text, click to view] >Regards >Alan > >Alan Silver <alan-silver@nospam.thanx> wrote in message >news:<1qnWv6AhWXfAFw38@nospamthankyou.spam>... >> Hello, >> >> I have a table containing position data for all of the postcodes in the >> UK. There are around 1.7 million of these. I would like to be able to >> query the table and pull out all postcodes within a specified distance >> of a given one. The code I have at the moment takes about 20 seconds to >> do this, which is not really acceptable for the client application. I >> would appreciate any tips on how to improve performance here. >> >> The code used to create the table was ... >> >> create table Postcodes ( >> Postcode varchar(8) not null default '' unique, >> Northing int not null default 0, >> Easting int not null default 0 >> ) >> go >> >> where Northing and Easting are distances north and east of an origin >> (somewhere off the Scilly Islands I think). Sample data for this looks >> like ... >> >> Postcode Easting Northing >> -------- ----------- ----------- >> AB101AF 39410 80640 >> AB101AG 39420 80630 >> AB101AH 39430 80630 >> AB101AJ 39410 80640 >> AB101AL 39430 80650 >> AB101AN 39430 80650 >> AB101AP 39430 80640 >> AB101AQ 39430 80630 >> AB101AR 39430 80630 >> AB101AS 39410 80630 >> >> I wrote a stored procedure to do the calculation as follows ... >> >> create procedure NearPostcodes >> @postcode varchar(8), >> @distance int >> as >> begin >> declare @n decimal >> declare @e decimal >> select @n=Northing, @e=Easting from Postcodes where >> Postcode=upper(replace(@postcode,' ','')) >> select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + >> ((Easting-@e)*(Easting-@e))) as Distance >> from Postcodes >> where sqrt(((Northing-@n)*(Northing-@n)) + >> ((Easting-@e)*(Easting-@e))) < @distance >> order by Distance, Postcode >> end >> go >> >> but this took ages to compute. I made the postcode field in table into a >> primary key, and changed the above sp so that instead of computing a >> square root, it did ... >> >> where ((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e)) < >> @distance*@distance >> >> This helped a little, but still only got it down to about 19 seconds per >> query. >> >> Any ideas on how to improve things further (if possible) ? I am using >> SQL Server 7 (no I can't upgrade at the moment) on Windows NT4 Server, >> SP6. >> >> I considered setting up a table containing an entry for every pair of >> postcodes with a precalculated distance. This should speed things up >> somewhat, but given that it would end up with about 1,500,000,000,000 >> entries, it might be a little hard to handle !! I suspect that a simple >> select query on a table that size could be quite slow. >> >> Any suggestions gratefully received. TIA.
-- Alan Silver (anything added below this line is nothing to do with me) -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
In article <bb269444.0404200907.6b2885d4@posting.google.com>, WangKhar <Wangkhar@yahoo.com> writes [quoted text, click to view] >From what I understand you are doing it may make sense to cluster the >northing, easting, and just slap a unique(?) index on postcode. Then >make sure the range of northing,easting is searchable for the index on >it - for at least a major part of the query. The index on the >northing/easting doesnt need to be unique.
Thanx for the suggestion. I tried it, but it actually slowed the query down. Thanx to all the suggestions I've had here, the query is down to three seconds, which is acceptable. Obviously I would be happy to reduce it even further, but I can live with it for the moment. Thanx again. [quoted text, click to view] >Alan Silver <alan-silver@nospam.thanx> wrote in message >news:<ks1yOOHR6QhAFw1K@nospamthankyou.spam>... >> In article <bb269444.0404200318.3bc21d63@posting.google.com>, WangKhar >> <Wangkhar@yahoo.com> writes >> >Sure, sorry not very clear... >> > >> >If you have an index on a column, then if you select where column = x >> >then it stands a good chance of using the index. >> > >> >My understanding is if you select where myfunction(column) = x then it >> >wont use the index. >> > >> >Additionally if you select where column * 2 = x' then it will also >> >disable the index. >> > >> >So what you do is rearrange the equation so select where column = x/2 >> > >> >If that makes more sense >> >> Wonderful sense !! Thank you very much for explaining it. The problem is >> that the index is on the postcode field, not on the northing or easting >> field. I'm not sure what kind of index I would put on those fields as >> they are non-unique numbers. >> >> Ta ra >>
-- Alan Silver (anything added below this line is nothing to do with me) -----= Posted via Newsfeeds.Com, Uncensored Usenet News =----- http://www.newsfeeds.com - The #1 Newsgroup Service in the World!
Something else that may eke out a few moments. Consider altering the postcode to a CHAR(8) instead of varchar. The 1 or 2 characters you save with the varchar are counterbalanced by the overhead per row - iirc around 3 or 4 bytes per row in this case. Given as a percentage of row width its relevant. Just remember the white space is preserved... so maybe manipulate your @postcode to @postcode+' ' or (slower - rtrim). [quoted text, click to view] Wangkhar@yahoo.com (WangKhar) wrote in message news:<bb269444.0404210634.6b9c1ca3@posting.google.com>... > Maybe try > > create table Postcodes ( > Postcode varchar(8) not null primary key nonclustered, > Northing int not null default 0, > Easting int not null default 0 > ) > go > create clustered index ix_postcodes on Postcodes(Northing,Easting) > > > create procedure NearPostcodes > @postcode varchar(8), > @distance int > as > begin > declare @n int > declare @e int > select @n=Northing, @e=Easting > from Postcodes > where Postcode=upper(replace(@postcode,' ','')) > > select PostCode, Distance > from > ( select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + > ((Easting-@e)*(Easting-@e))) as Distance > from Postcodes > where Northing between (@n - @distance) and (@n + @distance) > and Easting between (@e - @distance) and (@e + @distance) > )x > where Distance < @distance > order by Distance, Postcode > > end > go > > Might be worth a go? > > > > Alan Silver <alan-silver@nospam.thanx> wrote in message news:<Io45fBCmFlhAFwAB@nospamthankyou.spam>... > > In article <3f51d1de.0404202248.32dffad4@posting.google.com>, Alan > > Thanx very much for the suggestions. Please feel free to make any more > > that might bring it down even further. At the moment I just have a > > unique constraint on the postcode field and no constraints, clusters, > > etc on the others. Not sure if anything would help. > > > >> The code used to create the table was ... > > >> > > >> create table Postcodes ( > > >> Postcode varchar(8) not null default '' unique, > > >> Northing int not null default 0, > > >> Easting int not null default 0 > > >> ) > > >> go > > >>> >> create procedure NearPostcodes > > >> @postcode varchar(8), > > >> @distance int > > >> as > > >> begin > > >> declare @n decimal > > >> declare @e decimal > > >> select @n=Northing, @e=Easting from Postcodes where > > >> Postcode=upper(replace(@postcode,' ','')) > > >> select Postcode, sqrt(((Northing-@n)*(Northing-@n)) + > > >> ((Easting-@e)*(Easting-@e))) as Distance > > >> from Postcodes > > >> where sqrt(((Northing-@n)*(Northing-@n)) + > > >> ((Easting-@e)*(Easting-@e))) < @distance > > >> order by Distance, Postcode > > >> end > > >> go > > >> > > >> but this took ages to compute. I made the postcode field in table into a > > >> primary key, and changed the above sp so that instead of computing a > > >> square root, it did ... > > >> > > >> where ((Northing-@n)*(Northing-@n)) + ((Easting-@e)*(Easting-@e)) < > > >> @distance*@distance > > >> > > >> This helped a little, but still only got it down to about 19 seconds per
In article <bb269444.0404220201.4ce10e87@posting.google.com>, WangKhar <Wangkhar@yahoo.com> writes [quoted text, click to view] >Something else that may eke out a few moments. Consider altering the >postcode to a CHAR(8) instead of varchar. The 1 or 2 characters you >save with the varchar are counterbalanced by the overhead per row - >iirc around 3 or 4 bytes per row in this case. Given as a percentage >of row width its relevant. Just remember the white space is >preserved... so maybe manipulate your @postcode to @postcode+' ' or >(slower - rtrim).
OK, thanx. I haven't had time to test the previous suggestion properly yet, but will give them all a go. Ta ra -- Alan Silver
I was surprised no one mentioned rtree/spatial indexes (in a post). PostgreSQL, Oracle, and to some extent recent versions of MySQL (4.x) each have spatial index support using r-tree indexes. I found this library, but fear it may be hard to integrate with MS SQL: Spatial Index Library 0.4b http://freshmeat.net/projects/spatialindexlib/?branch_id=35217&release_id=105467 I thought MS FoxPro was supposed to have some support for distance queries too, but not sure. My only other recommendation was to reduce the "resolution" of your search, if you don't need such accuracy. For example, create a table of 10,000 or 100,000 zip code areas (like US 5-digit zips), and a cross-referencing table to your 1.7 million row one, and run your query against the zip code area table instead. I would probably use rectangular areas, and assign all zips in that area the same area_id. Find the areas in the desired distance, then find all zips in those areas with a join. An article I found: Microsoft SQL Server: Future Plans for Supporting Spatial Data http://www.directionsmag.com/article.php?article_id=430 I suppose you could install Cygwin and PostgreSQL and write an extended stored procedure in C to query a Postgresql table.... Ugh! It'd be easier to migrate to PostgreSQL, probably.
Alan, A way to improve queries of this type is to consider the fact that if an x and y are not in the bounding square (x-r, y-r)-(x+r, y+r), then neither will they be in the circle bounded by radius r (the distance). Filtering on the bounding square first allows us to use sargable arguments, and limits the heavy calculation later. See amended proc below. I hope this helps you, Keith, EBSCO ------------------------------------ create procedure NearPostcodes @postcode varchar(8), @radius int -- distance as begin -- This method assumes a sufficiently small radius -- is chosen, so that cost of populating the local -- table var, is offset by the performance gain -- of using sargable arguments. declare @tblWithinSquare TABLE ( PostCode varchar(8), Northing int, Easting int ) declare @n int declare @e int declare @left int -- bounded square extents declare @right int declare @top int declare @bottom int -- center of square/circle select @n=Northing, @e=Easting from Postcodes with (nolock) where Postcode = @postcode -- consider adding a indexed CHECKSUM on the postcode column -- it improves exact match string lookups -- since the CBO can filter on a smaller integer index -- prior to the expensive string compare -- AND PostcodeCheckSum = CHECKSUM(@postcode) -- bounding square extents select @left = @e - @radius @right = @e + @radius @bottom = @n - radius @top = @n + radius -- pre-filter post codes within bounding square insert into @tblWithinSquare ( Postcode, Northing, Easting ) select Postcode, Northing, Easting from Postcodes with (nolock) where -- recommend unique index on (Northing, Easting) -- since two postcodes should always have different coordinates Northing between @top and @bottom and Easting between @left and @right -- Now it is safe to do the heavy calculation on the -- hopefully much smaller set of postcodes. (keep radius small) -- Or even better stop now, :) and consider anything in the -- bounding square as worthy of reporting. select Postcode, (Northing - @n) * (Northing - @n) + (Easting - @e) * (Easting - @e) as Distance from @tblWithinSquare where (Northing - @n) * (Northing - @n) + (Easting - @e) * (Easting - @e) < (@radius * @radius) order by Distance, Postcode end go First apply the following efficient filter. [quoted text, click to view] "Aaron W. West" <tallpeak@hotmail.NO.SPAM> wrote in message news:<ccucnVokeYzVfk7d38DK-g@speakeasy.net>... > I was surprised no one mentioned rtree/spatial indexes (in a post). > PostgreSQL, Oracle, and to some extent recent versions of MySQL (4.x) each > have spatial index support using r-tree indexes. > > I found this library, but fear it may be hard to integrate with MS SQL: > > Spatial Index Library 0.4b > http://freshmeat.net/projects/spatialindexlib/?branch_id=35217&release_id=105467 > > I thought MS FoxPro was supposed to have some support for distance queries > too, but not sure. > > My only other recommendation was to reduce the "resolution" of your search, > if you don't need such accuracy. For example, create a table of 10,000 or > 100,000 zip code areas (like US 5-digit zips), and a cross-referencing table > to your 1.7 million row one, and run your query against the zip code area > table instead. I would probably use rectangular areas, and assign all zips > in that area the same area_id. Find the areas in the desired distance, then > find all zips in those areas with a join. > > An article I found: > Microsoft SQL Server: Future Plans for Supporting Spatial Data > http://www.directionsmag.com/article.php?article_id=430 > > I suppose you could install Cygwin and PostgreSQL and write an extended > stored procedure in C to query a Postgresql table.... Ugh! It'd be easier to
Don't see what you're looking for? Try a search.
|