all groups > sql server full text search > august 2004 >
You're in the

sql server full text search

group:

WEIGHT confusion (SQL 2k, running on Win2k3)


Re: WEIGHT confusion (SQL 2k, running on Win2k3) John Kane
8/30/2004 2:14:03 PM
sql server full text search: Dunc,
Could you post the full output of the following SQL script as the info would
be helpful in understanding this...

use <your_database_name_here>
go
SELECT @@language
SELECT @@version
-- Note, may need to set 'show advanced options'
EXEC sp_configure 'default full-text language'
EXEC sp_help_fulltext_catalogs
EXEC sp_help_fulltext_tables
EXEC sp_help_fulltext_columns
EXEC sp_help VenueList
go
SELECT count(*) from VenueList
go

Could you also run profiler and re-execute your below query with the
function BreakSearchStringToIsAbout and post the exact SQL code that it is
generating? You might also want to review SQL Server 2000 BOL title
"Full-text Search Recommendations" and it's last paragraph on RANK.

Thanks,
John




[quoted text, click to view]

Re: WEIGHT confusion (SQL 2k, running on Win2k3) Hilary Cotter
8/30/2004 9:32:31 PM
I'm a little confused. When I try to build use this function I get output
that looks like this:

"the" WEIGHT(10), "*the*" WEIGHT(5), "rain" WEIGHT(10), "*rain*" WEIGHT(5),
"in"
WEIGHT(10), "*in*" WEIGHT(5), "spain" WEIGHT(10), "*spain*" WEIGHT(5),
"stays"
WEIGHT(10), "*stays*" WEIGHT(5), "mainly" WEIGHT(10), "*mainly*" WEIGHT(5),
"in"
WEIGHT(10), "*in*" WEIGHT(5), "the" WEIGHT(10), "*the*" WEIGHT(5), "plain"
WEIGHT(10), "*plain*" WEIGHT(5)

A couple of points:

The beginning * is not treated as a wildcard operator for prefixes, its
ignored, so you can remove this altogether. It is valid for the suffix, so
you might get the same results if you do a FreeText.

Secondly, you are doing a left join which says give me all results on the
left side, whether there is a match on the right side or not, and there is
always a match on the right side, because it looks like you are joining
against the base table.

It might be helpful to try to understand exactly what you are trying to
accomplish, include complete schemas, etc.

--
Hilary Cotter
Looking for a book on SQL Server replication?
http://www.nwsu.com/0974973602.html


[quoted text, click to view]

WEIGHT confusion (SQL 2k, running on Win2k3) Dunc
8/30/2004 9:55:23 PM
I'm dynamically creating a SQL statement that takes each word from a
sentence, and using the WEIGHT command, searches a number of fields twice
for each word - once with a given weight for the exact word, and once with
half the weight for a partial match (by putting asterisks at the start and
end of each word).

Going against all the documentation I've got, items with a lower weight seem
to be coming up higher. After a bit more playing around (I removed the
partial match functionality), the lower the number, the higher the rank
seems to be. I can't just multiply the partial match value, as it'll
potentially go over one.

Either I'm doing something dumb, or I've missed a pretty valuable piece of
documentation. Can someone point me in the right direction here? I've put
a few key snippets of the code below.

Thanks in advance,

Dunc

"FROM VenueList " & _
"LEFT JOIN CONTAINSTABLE(VenueList, VenueName, 'ISABOUT (" &
BreakSearchStringToIsAbout(arrSearchString, 0.5) & ")') AS B " & _
" ON A.VenueID = B.[Key] " & _

---/ snip /---

Function BreakSearchStringToIsAbout(arrSearchItem, iWeight)
Dim iLoop
Dim strOutput

strOutput = ""
For iLoop = 0 to uBound(arrSearchItem)
strOutput = strOutput & """" & arrSearchItem(iLoop) & """" & " WEIGHT(" &
iWeight & "), "
strOutput = strOutput & """*" & arrSearchItem(iLoop) & "*""" & " WEIGHT("
& iWeight / 2 & "), "
Next

strOutput = Left(strOutput, Len(strOutput) - 2)

BreakSearchStringToIsAbout = strOutput
End Function

Re: WEIGHT confusion (SQL 2k, running on Win2k3) John Kane
8/31/2004 1:01:36 PM
You're welcome, Dunc,
Not to worry, mostly, I need to know the SQL Sever version and OS platform
info, our language settings, and row count of your FT-enable table... This
is all basic configuration info necessary to troubleshoot SQL FTS issues...

You're using SQL Server 2000 SP3 Developer Edition on Win2003 with the
default language of US_English on at table with 2,272 rows. Can I assume
that your goal here is to have the search keyword ("late" in your query
below) have a greater weight from one column vs. another?

If so, I see two issues... One, is that you should use only "INNER JOIN" or
just JOIN (defaults to INNER JOIN) vs. LEFT JOIN in your query - see the
Northwind example below. Secondly, your table size of only 2,272 rows may be
too small to get a significant variation of query specific RANK values and
then be *influenced* by the Weight parameter. It would better to re-test
your modified query against a larger table, say with at least 10,000 rows.

-- note, the use of JOIN (defaults to INNER JOIN) only...
use Northwind
go
SELECT e.EmployeeID, e.LastName, ct.[KEY], ct.[RANK]
FROM Employees AS e
JOIN CONTAINSTABLE(Employees, Notes, 'French') AS ct ON e.EmployeeID =
ct.[KEY]
ORDER BY ct.[RANK] DESC

If it is your goal to have the same keyword rank higher or lower in one
column over another columns, I may have another method for achieving this,
but I need to test it out a bit with a variable weight parameter.

Thanks,
John



[quoted text, click to view]
Re: WEIGHT confusion (SQL 2k, running on Win2k3) John Kane
8/31/2004 4:28:16 PM
Dunc,
Ok, what I think you really need is a multiple column join using a single
table example using an OR condition... (BTW, this is the 3rd one I've posted
today, although, not the exact version as below..;). Below is an example of
this...

use Northwind
SELECT distinct e.LastName, e.FirstName, e.Title
from Employees AS e,
containstable(Employees, Notes, 'BTS') as A,
containstable(Employees, Title, 'Representative') as B
where
A.[KEY] = e.EmployeeID or -- NOTE OR condition here
B.[KEY] = e.EmployeeID

Note, that I had to use the distinct keyword on the LastName column to
remove duplicate results. Try altering your query below to the above with
the column specific weights as well as OR between the KEY comparisons. While
I fully understand why you want to do this and that you're trying to develop
a viable solution to a failing of SQL FTS, you should know that this
multiple CONTAINSTABLE query will perform very poorly as each CONTAINSTABLE
clause is a "round trip" to the FT Catalog. So, as your table and therefore
the FT Catalog query grows, this query will get slower and slower...

Regards,
John







[quoted text, click to view]
Re: WEIGHT confusion (SQL 2k, running on Win2k3) Dunc
8/31/2004 8:09:23 PM
I've had to temporarily remove the item* searches, as it was returning too
much rubbish.

I'm a little nervous re: posting some of this info to a newsgroup, so I've
renamed a few columns and table names. This query actually works
*reasonably* well, though you'll notice that only one column has a weight of
anything apart from 1.0 - and that's the name column (which is the most
important). If I raise this up at all, the name seems to drop to a
significantly less relevance, and everything goes pear-shaped.

Thanks in advance,

Dunc

---/ snip /---

language: us_english

version: Microsoft SQL Server 2000 - 8.00.760 (Intel X86) Dec 17 2002
14:22:05 Copyright (c) 1988-2003 Microsoft Corporation Developer Edition
on Windows NT 5.2 (Build 3790: )

default full-text language: default full-text language 0 2147483647 1033
1033

fulltext_catalogs: 5 MyDirectory e:\SQLServer\MSSQL\FTDATA 0 7

fulltext_tables (amongst a few irrelevant tables):
dbo VenueList PK_VenueList 1 1 MyDirectory
dbo VenueExtras PK_VenueExtras 1 1 MyDirectory

fulltext_columns (amongst a few irrelevant columns):
dbo 1856725667 VenueList VenueName 2 NULL NULL 1033
dbo 1856725667 VenueList Addr1 3 NULL NULL 1033
dbo 1856725667 VenueList Addr2 4 NULL NULL 1033
dbo 1856725667 VenueList Addr3 5 NULL NULL 1033
dbo 1856725667 VenueList Addr4 6 NULL NULL 1033
dbo 1856725667 VenueList Addr5 8 NULL NULL 1033
dbo 1856725667 VenueList Postcode 9 NULL NULL 1033
dbo 1856725667 VenueList Style 20 NULL NULL 1033
dbo 1856725667 VenueList Theme 22 NULL NULL 1033
dbo 1856725667 VenueList Description 23 NULL NULL 1033
dbo 1856725667 VenueList NearestTransport 43 NULL NULL 1033
dbo 1856725667 VenueList Review 48 NULL NULL 1033
dbo 1245247491 VenueExtras ExtraName 2 NULL NULL 1033

Venuelist Count: 2272

Full Query from profiler (someone searched on the term "late"):
SELECT * FROM (
SELECT VenueID, VenueName, Addr1, Addr2, Addr3, Addr4, Review,
ISNULL(B.RANK, 0) + ISNULL(C.RANK, 0) + ISNULL(D.RANK, 0) + ISNULL(E.RANK,
0) + ISNULL(F.RANK, 0) + ISNULL(G.RANK, 0) + ISNULL(H.RANK, 0) +
ISNULL(I.RANK, 0) + ISNULL(J.RANK, 0) + ISNULL(K.RANK, 0) +
ISNULL(L.RANK, 0) + ISNULL(M.RANK, 0) + ISNULL(N.RANK, 0) AS Rank
FROM VenueList AS A
LEFT JOIN CONTAINSTABLE(VenueList, Venue, 'ISABOUT ("late" WEIGHT(0.1))') AS
B
ON A.VenueID = B.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Addr1, 'ISABOUT ("late" WEIGHT(1))') AS C
ON A.VenueID = C.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Addr2, 'ISABOUT ("late" WEIGHT(1))') AS D
ON A.VenueID = D.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Addr3, 'ISABOUT ("late" WEIGHT(1))') AS E
ON A.VenueID = E.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Addr4, 'ISABOUT ("late" WEIGHT(1))') AS F
ON A.VenueID = F.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Addr5, 'ISABOUT ("late" WEIGHT(1))') AS G
ON A.VenueID = G.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Postcode, 'ISABOUT ("late" WEIGHT(1))')
AS H
ON A.VenueID = H.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Style, 'ISABOUT ("late" WEIGHT(1))') AS I
ON A.VenueID = I.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Theme, 'ISABOUT ("late" WEIGHT(1))') AS J
ON A.VenueID = J.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Description, 'ISABOUT ("late"
WEIGHT(1))') AS K
ON A.VenueID = K.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, NearestTransport, 'ISABOUT ("late"
WEIGHT(1))') AS L
ON A.VenueID = L.[Key]
LEFT JOIN CONTAINSTABLE(VenueList, Review, 'ISABOUT ("late" WEIGHT(1))') AS
M
ON A.VenueID = M.[Key]
LEFT JOIN CONTAINSTABLE(VenueExtras, ExtraName, 'ISABOUT ("late"
WEIGHT(1))') AS N
ON A.VenueID = N.[Key]
WHERE A.Status > 0
AND (B.RANK IS NOT NULL
OR C.RANK IS NOT NULL
OR D.RANK IS NOT NULL
OR E.RANK IS NOT NULL
OR F.RANK IS NOT NULL
OR G.RANK IS NOT NULL
OR H.RANK IS NOT NULL
OR I.RANK IS NOT NULL
OR I.RANK IS NOT NULL
OR K.RANK IS NOT NULL
OR L.RANK IS NOT NULL
OR M.RANK IS NOT NULL
OR N.RANK IS NOT NULL)
) SubQuery ORDER BY Rank DESC, Venue

[quoted text, click to view]

Re: WEIGHT confusion (SQL 2k, running on Win2k3) Dunc
8/31/2004 10:47:05 PM
Hi John,

I've made the change to the JOIN statements (ex LEFT JOIN), but that hasn't
worked, as they keyword I'm searching for isn't in every field - potentially
only one - so it's never returning any results.

This SQL code is running on a mirror of a site that's live and running at
the moment - it's a bar review website, run by a bunch of hobby'ists (i.e.
it's all done by volunteers - techies, marketing, and a bunch of journos).
As much as I'd like 100k rows, 2.2k is actually a pretty high number for the
data it's storing. When we launch the next phase, there will be more like
40k, but that will be branched down by region to ~5k per region.

I'm really appreciating your help - if you'd like to get a better idea of
what we're trying to achieve, the site is at:
http://www.fluidfoundation.com

You'll see the search on the right hand side of every page, or in the middle
of the front page.

Dunc

[quoted text, click to view]
AddThis Social Bookmark Button