How to disable noise words?


How to disable noise words? Steve Gotan
1/26/2004 9:28:31 PM
sql server full text search:
I need to disable noise words for my full-text index. Is this possible? I
have a 6.6 million row database with a full-text indexed varchar(128) column.
The column is used for business names and proper names. Business names
could be A & A Automotive or Your Next House etc... Proper Names could be
Patty Your. All of these name types are in the same column and proper names
could be in reverse order since the data comes from five different legacy
systems with no real keys since both business names and proper names lived
together in these old systems.

The full text catalog has been working great until we found that a search
for "Patty Your" or "Your, Patty" returns no records. After I investigated
further I also found the business name problems and realized I need every
word and single letter to be indexed. A "like" wildcard search has problems
with proper names in mixed-up order, since the first name sometimes comes
first and sometimes last.

My full text catalog takes approximately 26 hours to populate at 6.6 million
rows.

Can I remove the stoplist from the schema.txt file and have all words full
text indexed?

Re: How to disable noise words? John Kane
1/26/2004 10:44:58 PM
Steve,
Well, you can certainly delete the noise.* files, but I would not recommend
that. What you may want to do is to empty the noise.* file for your specific
language. Still, I'd recommend that you think about this a bit, as deleting
or removing all noise words will have an adverse effect on the FTS query
performance against your very large (6.6 million row) FT-enabled table...

These noise word files are language specific (noise.enu = US_English) and
are located under \FTDATA\SQLServer\Config where you have SQL Server 2000
installed. Stop the "Microsoft Search" service, make the changes to these
files using Notepad and save them, then restart the service and run a Full
Population...
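
A minimal Python sketch of the file-editing step only, for anyone who wants to script it. The function name and backup suffix are hypothetical. Writing a single space rather than a zero-length file is an assumption made here to be cautious, not something stated above. Stopping and starting the "Microsoft Search" service and running the Full Population still have to be done separately.

```python
import shutil
from pathlib import Path


def empty_noise_file(noise_path, backup_suffix=".bak"):
    """Back up a noise-word file, then replace its contents with one space.

    Hypothetical helper: run it only while the "Microsoft Search" service
    is stopped, and run a Full Population after restarting the service.
    """
    noise_path = Path(noise_path)
    backup_path = noise_path.with_name(noise_path.name + backup_suffix)
    # Keep only the first backup so a re-run never overwrites the original list.
    if not backup_path.exists():
        shutil.copy2(noise_path, backup_path)
    # Assumption: a single space is used instead of a zero-length file.
    noise_path.write_text(" ", encoding="ascii")
    return backup_path
```

Keeping the backup next to the original makes it easy to restore the shipped noise list if query performance turns out to suffer.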

What you should really think about doing is further research on which are the
*problem* noise letters (A, etc.) and noise words (your, etc.) in your
FT-enabled table, as well as which "typical" or expected search keywords
your users actually use in their search criteria.

Regards,
John
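
The research John suggests (which noise words and letters actually occur in the names) could be sketched in Python roughly as follows, working on names exported from the table. This is an illustration under assumptions: the regular expression is only a stand-in for the real word breaker, and the sample names and the three-word noise list are taken from or modelled on the examples in this thread, not from the actual noise.enu contents.

```python
import re
from collections import Counter


def noise_word_hits(names, noise_words):
    """Count how many names contain each noise word as a whole token."""
    noise = {w.lower() for w in noise_words}
    hits = Counter()
    for name in names:
        # Crude tokenizer (an assumption): runs of letters or digits.
        tokens = set(re.findall(r"[a-z0-9]+", name.lower()))
        hits.update(tokens & noise)
    return hits


# Sample data modelled on the names mentioned in the thread.
sample_names = ["A & A Automotive", "Your Next House", "Patty Your", "Your, Patty"]
sample_noise = ["a", "your", "the"]  # hypothetical subset of a noise list

print(noise_word_hits(sample_names, sample_noise).most_common())
# prints [('your', 3), ('a', 1)]
```

A count like this shows which few entries would be worth removing from the noise file, as an alternative to emptying it entirely.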



Re: How to disable noise words? Steve Gotan
1/27/2004 11:07:10 AM
John,
Thank you for the good advice. You are always very helpful.
I plan to empty my noise.enu file and run a few tests to see how it impacts
performance.
I have a couple of follow-up questions:

1. My @@language is US_English. Do I need to empty any of the other noise.*
files, like noise.eng or noise.dat?
If I empty these other noise.* files, will my FT full population build
faster? I am sure it will take up more disk space.

2. When SQL server patches are installed will I have to verify that my
changes to the noise.* files have not been overwritten with noise words?

Thanks for all the help,
- Steve



Re: How to disable noise words? John Kane
1/27/2004 12:11:33 PM
You're welcome, Steve,
1. No, not unless your FT-enabled table's column "Language for Word Breaker"
is set to some other language or unless you've set the "default full-text
language" via sp_configure to something other than 1033 (US_English).
You only need to empty noise.enu (assuming US_English), and even so, most
likely your Full Population will take longer to complete and result in a
bigger FT Catalog because more words and letters that used to be dropped as
noise will then be included in your FT Catalog....

2. No, to the best of my knowledge, and at least for patches and service
packs, Microsoft does not make changes to these files. However, that may or
may not change if and when you upgrade to the next version of SQL Server,
codenamed Yukon.

Regards,
John
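
For the check Steve asks about in question 2, a small Python sketch like the one below could be run after a patch or upgrade to confirm that the emptied noise files are still empty. The function name is hypothetical, and the directory passed in is whatever Config folder holds the noise.* files on the server in question.

```python
from pathlib import Path


def refilled_noise_files(config_dir, file_names=("noise.enu",)):
    """Return the listed noise files that contain any non-whitespace text."""
    refilled = []
    for name in file_names:
        # latin-1 decodes any byte sequence, so odd characters cannot raise.
        text = (Path(config_dir) / name).read_text(encoding="latin-1")
        if text.strip():
            refilled.append(name)
    return refilled
```

An empty list means nothing was put back. Any name returned means that file has words in it again and needs to be emptied, followed by another Full Population.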


