all groups > sql server full text search > august 2004 >
You're in the

sql server full text search

group:

Newbie rank order confusion


Newbie rank order confusion dotnw NO[at]SPAM hotmail.com
8/27/2004 4:38:20 AM
sql server full text search: I'm concerned about the ranked order that search results are returned
to me from a stored procedure using full-text searching.

These are the values inside my table:

Apple
Apple 1
Apple 2
Apple 3
Apple 4
Apple 5
Apple 10

If I do a full-text search for "Apple", and if my stored procedure
uses "top 10" as in "select top 10 *", then I get these results:

Apple 1
Apple 2
Apple 3
Apple 4
Apple 10
Apple 5
Apple

My preferred search result is actually last! (My stored procedure
uses order by rank desc) Also, every result returned has the same
rank value of 4.

If I remove the "top 10" and just use "select *", I get this returned:

Apple
Apple 5
Apple 10
Apple 4
Apple 3
Apple 2
Apple 1

This result seems better, because it has put the "Apple" result at the
top, which exactly matches my search. Again, every result has the
same rank value of 4.

My stored procedure is:

CREATE PROCEDURE spStore_SearchItems
(
@sSearchText varchar(255)
)
AS

select *
from Store_BasicSearchableItems,
freetexttable(Store_BasicSearchableItems, ItemName, @sSearchText)
tblSearchResults
where [key] = Store_BasicSearchableItems.ItemId
order by rank desc


Can anyone offer an explanation for these results please? Is there
anything that I can do to improve this situation?

Re: Newbie rank order confusion Hilary Cotter
8/27/2004 8:56:53 AM
The ordering for identically ranked columns is normally based on when the
column was indexed.

I have observered similar behavior to what you are describing.

You might want to try something like this

select * from containstable(TableName,*,'apple',10) order by rank, [key]


--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html


[quoted text, click to view]

Re: Newbie rank order confusion Dot Net Work
8/27/2004 9:32:33 AM
Thanks a lot.
-dnw.


*** Sent via Developersdex http://www.developersdex.com ***
Re: Newbie rank order confusion John Kane
8/30/2004 2:27:56 PM
Dot Net Work,
While Hilary's reply that he's seen "... similar behavior..." to what you've
observed, I don't believe that the all of the fact are in as to what is the
cause for your results... Could you post the output of the following SQL
code?

use <your_database_name_here>
go
SELECT @@version
SELECT count(*) from Store_BasicSearchableItems
go

I'm assuming that you're using US_English as the "Language for Word
Breaker", but did you also remove the single digits 1 thru 5 (or all single
digits) from the noise word file: noise.enu? If so, did you run a Full
Population afterwards? If not and you are passing in double quotes around
your search word - "apple" - then the single digit numbers (1 thru 5) are
ignored and that could be why you are seeing the same RANK value (4) for all
rows. Also, the number of rows has to be statistically large enough to get a
significantly RANK value returned and if you have less that 10,000 rows in
your table Store_BasicSearchableItems this could also explain the low RANK
values.

Finally, Hilary recommend that you use the following sql query as a
workaround to what he thought to be the problem:

select * from containstable(TableName,*,'apple',10) order by rank, [key]

This query in fact uses a limit parameter - Top_N_Rank (the value 10 in the
above query) - that may not be appropriate for you depending upon the above
information requested. Specifically, you should review KB article 240833
(Q240833) "FIX: Full-Text Search Performance Improved via Support for TOP"
at http://support.microsoft.com//default.aspx?scid=kb;EN-US;240833

Regards,
John





[quoted text, click to view]

Re: Newbie rank order confusion dotnw NO[at]SPAM hotmail.com
8/31/2004 10:29:17 AM
Hi John,

[quoted text, click to view]

Sure...

Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Developer
Edition on Windows NT 5.0 (Build 2195: Service Pack 4)

16

[quoted text, click to view]
Breaker"

I believe it was slightly different - UK_English

[quoted text, click to view]
digits) from the noise word file: noise.enu?

I have now done this - thanks.

[quoted text, click to view]

I have now run this on my full-text tables.

[quoted text, click to view]
Store_BasicSearchableItems this could also explain the low RANK
values.

I see! Actually, I only have 16 rows as I am just learning about
full-text searching. My final table will only have around 500 rows,
so this information worries me a little.

[quoted text, click to view]

This seems very interesting, but unfortunately I can't get their SQL
example to work. Query analyzer doesn't like this line -

WHERE T.key=CT.key AND T.a > 5

If I could use their example code, I code modify my stored procedure
and get better results.

Using my tweaked original SQL:

select *
from Store_BasicSearchableItems,
containstable(Store_BasicSearchableItems, ItemName, '"Apple 5"', 10)
tblSearchResults
where [key] = Store_BasicSearchableItems.ItemId
order by rank desc

for the search of "Apple 5", I get these results in this order:

Apple, Apple 10, Apple 5, Apple 4, etc.

(Again, all results have the same rank value, this time 32)

TIA,
-dnw.


[quoted text, click to view]
Re: Newbie rank order confusion John Kane
8/31/2004 12:43:10 PM
You're welcome, DNW,
You're using SQL Sever 2000 SP3 Developer Edition on Win2K SP4 using
UK_English, and since you're using UK_English, you should edit the noise.eng
(and NOT the US_English noise word file noise.enu) and then re-run a Full
Population.

16 rows is certainly not large enough of a table to get variations in the
query specific RANK values, but you might be ok with 500 rows, but even so,
I'd recommend that you artificially scale up to 500 rows and re-test your
FTS queries with a larger table...

Below is a short & simple explanation of how RANK is calculated:
1. The number of times a word appears in a document.
2. The closer the searched for words are to each other, the higher the rank,
until the point that they are adjacent becoming a phrase and raising the
rank even higher.
3. The ranking mechanism is weighted so that the more highly inflected the
word is from the version asked for originally, the lower its rank in the
result set. For example, "swim" would be closer to "swims" and further from
"swimmer" because "swim" and "swimmer" are less related grammatically. In
other words, the plural noun form is more related grammatically than the
past-tense verb form of the same word. When resolving queries, the
linguistic engine and ranking algorithm take these linguistic features into
account.

As your CONTAINSTABLE query table is very small (even with 500 rows), I'd
recommend that you remove the Top_N_Rank parameter, i.e., the value 10 in
your query as this is mainly implemented for performance reasons against
very large FT-enable tables.

Additionally, you're searching on "Apple 5" (within double quotes) so this
qualifies as a "phrase search", but the single digit 5 is ignored as it is
still in the noise.eng (UK_English noise word file), so Apple without any
single digits will list first, even though all rows returned have the same
query specific rank value of 32.

I've recommend that you do the following: 1) remove all single digits from
noise.eng, 2) run a Full Population, 3) remove the Top_N_Rank value of 10
from your containstable query, and if possible 4) re-test with a larger
table.

Regards,
John




[quoted text, click to view]

Re: Newbie rank order confusion Dot Net Work
8/31/2004 2:32:20 PM
Hi John,

I have completed all of your suggestions, and I have some interesting
results.

As a summary, my table now looks like this:

Apple 1
Apple 2
etc, up to
Apple 1000
Apple <-- I added this record at the end.

I ran 4 variations of my stored procedure.

Variation 1.

select *
from Store_BasicSearchableItems,
containstable(Store_BasicSearchableItems, ItemName, 'Apple')
tblSearchResults
where [key] = Store_BasicSearchableItems.ItemId
order by rank desc

Produced these results inside Query Analyzer:

Apple 1
Apple 2
etc, up to
Apple 1000
Apple

Apple appears at the end, and all rows have a rank of 96.

Variation 2:

select top 10 *
<the remainder of the sp is the same>

Variation 2 results:

Apple 904
Apple 95
Apple 641
etc -- 10 rows listed

There is no "Apple" record! All rows are ranked 96.

Variation 3:

select *
from Store_BasicSearchableItems,
containstable(Store_BasicSearchableItems, ItemName, @sSearchText, 10)
tblSearchResults
where [key] = Store_BasicSearchableItems.ItemId
order by rank desc

(I removed the top 10, but added in 10 for the top_n_rank)

Variation 3 results:

Similar to variation 2 results. Again no "Apple" record. Rank values
are all 96.

Variation 4:

select top 10 *
<the remainder is the same as the sp for variation 3 right above>

Variation 4 results:

Similar to variation 3 results. No "Apple" record, and rank values all
96.

In all 4 tests, a search for "Apple" only yields "Apple" in 1 test, and
that appears last in the results list.

Does this behaviour strike you as being odd? I must be doing something
wrong?!

TIA,
-dnw.



*** Sent via Developersdex http://www.developersdex.com ***
AddThis Social Bookmark Button