all groups > sql server full text search > february 2007 >
You're in the

sql server full text search

group:

Full Text Search Problem


Full Text Search Problem monu
2/11/2007 11:16:21 PM
sql server full text search: Hi,
I'm new to this group, i've a problem with the full text search
with sql server 2000,
I've been searching for a word (say "the") in the a full text indexed
table for each column and i get the error as :

"Run-time error '-2147217900(80040e14)':
A clause of the query contained only ignored words"

My Full Text Query is:
Select FT_TBL.*, KEY_TBL.Rank from Table1 as FT_TBL INNER JOIN
CONTAINSTABLE(Table1, *, '"the"') as KEY_TBL ON FT_TBL.Field1 =
KEY_TBL.[KEY]

Here, Table1 is Full Text Indexed table and "Field1" is a primary key
of Table1.
Can anyone in this group help me out wid this.
Thnx in advance.

Regards,
Vikrant Nawani.
RE: Full Text Search Problem ML
2/12/2007 1:20:00 AM
This is the way FTS works. Look up "noise words" in Books Online. The noise
file (e.g. noise.dat) lists all the words that are ignored when building
full-text indexes, which means searching for them is not possible, hence the
error (which becomes a warning in SQL Server 2005).

You have two options:

1) Handle it in the client application: prevent users from issuing searches
where only the ignored words have been used. You can use the noise file to
programmatically test each search string;

2) Remove the words from the noise list (leave empty lines): this may
increase the space used by full-text catalogs significantly, so only remove
those words that you expect the users to search for.

Perhaps other frequent posters in this newsgroup have other suggestions.


ML

---
Re: Full Text Search Problem ML
2/12/2007 5:13:01 AM
Yes, noise words aren't all bad, but search strings containing nothing but
noise words are. :)


ML

---
Re: Full Text Search Problem ML
2/12/2007 5:50:00 AM
Thanks for that info - it's essential, a must-know.

Do you by any chance have a list of characters ignored by FTS that aren't
included in noise files (e.g. punctuation marks)?


ML

---
Re: Full Text Search Problem Hilary Cotter
2/12/2007 8:04:00 AM
Just to piggy back off ML's comment.

I used to recommend stripping the noise words out of your query phrase,
however this will frequently lead to errors, for example a search on
"University Of California" when stripped of its noise word OF, and then the
search conducted on "University California" will miss results containing
"University of California" and "University to California".

IMHO the best approach is to empty your noise word list and replace it with
a single space or as ML points out a line feed.

Note that a FreeText search gets around this problem but may return too many
results and its speed is slower than the Contains.

--
Hilary Cotter

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Re: Full Text Search Problem Hilary Cotter
2/12/2007 8:41:54 AM
Hi ML, very true and well said.

Historically Noise words were intended to conserve disk space as back in the
80's when search was first starting disks were very expensive. Today they
are intended to "hide" noisy phrases from searching. For example a search on
Microsoft SQL Server is the functional equivalent of a search on SQL Server.
So you get better search efficiency by not looking for Microsoft.

Microsoft (at one time, perhaps still the case) added Microsoft to their
noise word list on their search engines for this reason. Apparently at one
time they also would add words greater than 26 letters to their noise word
list as you would be unable to search on them.

MSN search was one of the first big search engines to allow you to search on
noise words, for example a search on "the" when MSN Search first came out
would return the number on hit to the white house.

--
Hilary Cotter

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Re: Full Text Search Problem Hilary Cotter
2/12/2007 9:36:26 AM
Basically all alpha-numeric letters are indexed. Hyphens and capitalization
are respect in some languages. In some languages the indexing process knows
a character occurs after a single letter (i.e. C#), but doesn't index what
the character is, i.e. a search on C# will match with C$.

Currency symbols change how a number is stored in the index as well as
apparent date strings.

Abbreviations are handled differently, for example f.b.i is indexed as f, b,
and i, whereas F.B.I is indexed as FBI, and F.B.I.

IMHO I did an ok job in this article discussing language options in SQL FTS.
http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

If you are really interested in the internals of how this works with most
search engines you might want to look at the code in Lucene or Foundations
of Statistical Natural Language Processing. There is another book which is
really good on this and presents algorithms but I can't recall the name of
it right now.


--
Hilary Cotter

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com



[quoted text, click to view]

Re: Full Text Search Problem ML
2/14/2007 1:59:05 AM
Thank you again! That article is now a permanent reference.


ML

---
AddThis Social Bookmark Button