sql server full text search
7-Up vs 7 Up



7-Up vs 7 Up Qing Liu
2/26/2004 4:01:06 PM
Hi, I wonder if there is a way to make full-text searches insensitive to non-alphanumeric characters such as "-".
I have "7-UP" stored in the database, but I want users to be able to find it with "7-Up", "7 Up", "7!Up", etc.
Here is what I have in my noise files:

a b c d e f g h i j k l m n o p q r s t u v w x y
~ ` ! @ # $ % ^ & * ( ) - _ + = { } [ ] | \ : ; " ' ? / <> ,

I've rebuilt and repopulated the catalogs, but I still only get "7-Up" when I type in "7-Up"; "7 Up" returns no results.

Please help


Re: 7-Up vs 7 Up jt-kane NO[at]SPAM comcast.net
2/26/2004 8:49:17 PM
Qing,
Could you post the full output of SELECT @@version -- including the
OS platform that you have SQL Server installed on as this is an issue
of the OS-supplied wordbreaker. Also, what is your exact query? Are
you using CONTAINS or FREETEXT? Have you tested your search string
with both?
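
The checks asked for above could be run roughly as follows. This is only a sketch: the table and column names (Items, Item) are taken from the queries posted later in this thread, and the search strings are just examples.

```sql
-- Report the SQL Server build and the OS it is running on.
SELECT @@VERSION;

-- Same search term tested with both full-text predicates.
SELECT Item FROM Items WHERE CONTAINS(Item, '"7 UP"');
SELECT Item FROM Items WHERE FREETEXT(Item, '7 UP');
```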

Regards,
John


Re: 7-Up vs 7 Up Qing Liu
2/27/2004 11:11:09 AM
John: Thanks for your reply. Here is the version information
Microsoft SQL Server 2000 - 8.00.760 (Intel X86)
Dec 17 2002 14:22:05
Copyright (c) 1988-2003 Microsoft Corporation
Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 4)

The OS is Windows 2000 server.

I tried both CONTAINS and FREETEXT. The queries look like this:
SELECT Item FROM Items WHERE CONTAINS(Item, 'formsof (INFLECTIONAL, "7-UP")') ---- This will get the row.
SELECT Item FROM Items WHERE CONTAINS(Item, 'formsof (INFLECTIONAL, "7 UP")') ---- This will not.

The reason I use formsof is that I want it to automatically take care of singular and plural forms and different tenses.
It seems the index treats 7-UP as one word. If I search for just 7 or UP, I get no results either, though I wish I would.

Re: 7-Up vs 7 Up John Kane
3/1/2004 4:43:50 PM
You're welcome, Qing,
Yes, you are correct: the index treats 7-UP as one word. More
specifically, the Windows 2000 Server (Win2K) wordbreaker, infosoft.dll,
indexes the number and word joined by the "-" (dash or hyphen), 7 and UP,
as a single token. A workaround is to drop and re-create your FT Catalog
and use the Neutral "Language for Word Breaker" for the Item column.
However, with the Neutral "Language for Word Breaker" you will lose the
formsof(inflectional) function, as the words are "broken" into tokens
based upon the "white space" between words...
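
For reference, the drop and re-create steps could look something like the sketch below, using the SQL Server 2000 full-text stored procedures. The catalog name (ItemsCatalog) and the unique key index name (PK_Items) are made up for illustration; substitute your own. The last argument to sp_fulltext_column is the word breaker language, where 0 selects Neutral.

```sql
-- Remove the existing full-text index on the table, then the catalog.
-- 'ItemsCatalog' and 'PK_Items' are hypothetical names.
EXEC sp_fulltext_table 'Items', 'drop';
EXEC sp_fulltext_catalog 'ItemsCatalog', 'drop';

-- Re-create the catalog and register the table with its unique key index.
EXEC sp_fulltext_catalog 'ItemsCatalog', 'create';
EXEC sp_fulltext_table 'Items', 'create', 'ItemsCatalog', 'PK_Items';

-- Add the Item column with the Neutral word breaker (language 0).
EXEC sp_fulltext_column 'Items', 'Item', 'add', 0;

-- Activate the index and start a full population.
EXEC sp_fulltext_table 'Items', 'activate';
EXEC sp_fulltext_catalog 'ItemsCatalog', 'start_full';

-- Once population finishes, retest without FORMSOF.
SELECT Item FROM Items WHERE CONTAINS(Item, '"7 UP"');
```

Population runs in the background, so the test query at the end is only meaningful after it has completed.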

However, this is not the case with Windows Server 2003 (Win2003) or Windows
XP (WinXP). These OS platforms ship with a newer wordbreaker, langwrbk.dll,
which gives more expected results and would break 7 and UP into separate
tokens. So, in the long run, to get both the wordbreaking you want and the
use of the formsof(inflectional) function, I'd recommend that you upgrade
to Win2003...

Regards,
John




Re: 7-Up vs 7 Up Qing Liu
3/2/2004 7:46:06 PM
Thanks a lot John. Sounds like Win 2003 is the way to go.