Strange behavior of CONTAINS function


Re: Strange behavior of CONTAINS function Hilary Cotter
9/5/2006 12:00:00 AM
Using the Dutch word breaker, I see that PDP-436 is indexed as a single
word, so what you are seeing is to be expected. The search on PDP* should
return the results you are looking for, but *436 will only return hits for
*436, 436, !436, &436 (in short, 436 preceded by nothing or by any
non-alphanumeric character), because the wildcard character only works at
the end of a term (prefix matching). So the leading * will be thrown away
by the indexer.
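
The difference between the two wildcard forms can be sketched as follows. This is only an illustration: the table name dbo.Producten is an assumption, and only the column sb_pvomschrijving comes from the original query.

```sql
-- Hypothetical table name (dbo.Producten); only the column
-- sb_pvomschrijving appears in the original post.

-- Prefix term: matches any indexed word that starts with PDP, including
-- the single token PDP-436 produced by the Dutch word breaker.
SELECT sb_pvomschrijving
FROM dbo.Producten
WHERE CONTAINS(sb_pvomschrijving, '"PDP*"');

-- A leading wildcard is not supported: the * is dropped, so this behaves
-- like a search for the word 436 and misses the token PDP-436.
SELECT sb_pvomschrijving
FROM dbo.Producten
WHERE CONTAINS(sb_pvomschrijving, '"*436"');
```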

--
Hilary Cotter
Director of Text Mining and Database Strategy
RelevantNOISE.Com - Dedicated to mining blogs for business intelligence.

This posting is my own and doesn't necessarily represent RelevantNoise's
positions, strategies or opinions.

Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html

Looking for a FAQ on Indexing Services/SQL FTS
http://www.indexserverfaq.com




Strange behavior of CONTAINS function Pim75
9/5/2006 3:39:52 AM
Hello,

I have a query that uses the CONTAINS function, but I get a strange
result. I have a table with product information on various TVs.

Part of the query is:
WHERE CONTAINS(sb_pvomschrijving, ' "PDP" AND "436" ')

In my opinion, this query should return all records where the field
sb_pvomschrijving contains the words PDP and 436.

For example, I get this record:
Pioneer PLASMA PDP 436 SXE

but not with an '-' in the product description, like:
Pioneer PLASMA PDP-436 SXE

This looks strange to me, because in the second case the words PDP
and 436 are also in the field. I also tried using wildcards, but even
those don't return the expected results.

WHERE CONTAINS(sb_pvomschrijving, ' "PDP*" AND "*436" ')
also doesn't return the expected value with an '-' in it.

Does anyone have experience with this problem?
Best regards,
Pim
Re: Strange behavior of CONTAINS function Pim75
9/5/2006 6:05:12 AM
Hello Hilary,

Thanks for your reply!
Indeed the search for PDP* returns the correct records.

Do you know if it's possible to turn off the Dutch word breaker? Or would
this have any negative effect on performance?

regards,
Pim
Re: Strange behavior of CONTAINS function Hilary Cotter
9/5/2006 9:47:10 AM
You can use the neutral one. Using the neutral word breaker instead of the
Dutch one will have minimal performance effects. I know that for some
languages, like German, the language-specific word breaker is important. I
did spend some time in Beverwyck, and my father was fluent in Afrikaans;
however, my knowledge of spoken and written Dutch is limited to a few words
and phrases, so I can't really comment on it.
I'll ask Hugo Kornelis, SQL MVP, to weigh in on this.
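
As a rough sketch of how the switch to the neutral word breaker might look, assuming SQL Server 2005 or later and an existing full-text index on a hypothetical table dbo.Producten (the table name is not from the thread):

```sql
-- Assumes SQL Server 2005 or later and an existing full-text index on
-- the hypothetical table dbo.Producten.

-- Remove the column from the index, then add it back with LCID 0,
-- which selects the neutral word breaker.
ALTER FULLTEXT INDEX ON dbo.Producten DROP (sb_pvomschrijving);
ALTER FULLTEXT INDEX ON dbo.Producten ADD (sb_pvomschrijving LANGUAGE 0);

-- Optionally request the neutral word breaker at query time as well.
SELECT sb_pvomschrijving
FROM dbo.Producten
WHERE CONTAINS(sb_pvomschrijving, '"PDP" AND "436"', LANGUAGE 0);
```

The column has to be repopulated after the change, so on a large table it may be worth scheduling this outside peak hours.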


Re: Strange behavior of CONTAINS function Hugo Kornelis
9/6/2006 12:12:03 AM

Dear Pim,

Hilary asked me to have a look at this thread. I don't normally read
this group, since I have no experience with fulltext search - so please
take my answer with a huge pile of salt.

The issue here, if I understand the messages correctly, is that the
hyphen is considered to be part of a word, not a connector between two
separate words. That is of course necessary to ensure that Dutch words
that include hyphens, such as kop-van-jut, kop-en-schotel, or
kop-hals-rompboerderij can be found. Unfortunately, this also means that
you won't get all the hits you want for the few words that, even under
new spelling rules, are still combined with a hyphen (such as
niet-roker, pianiste-componiste, zwart-Amerikaans, etc). And, as you
noted, hyphens in model numbers etc are also considered as part of a
single word.
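
One way to check how a given word breaker splits a term is the sys.dm_fts_parser function. Note the assumptions: this function only exists in SQL Server 2008 and later, so it was not available when this thread was written, and 1043 is the LCID for Dutch while 0 is the neutral language.

```sql
-- Requires SQL Server 2008 or later.
-- Arguments: query string, LCID, stoplist id (0 = system stoplist),
-- accent sensitivity (0 = insensitive).

-- Tokens produced by the Dutch word breaker (LCID 1043).
SELECT display_term, occurrence
FROM sys.dm_fts_parser('"PDP-436"', 1043, 0, 0);

-- Tokens produced by the neutral word breaker (LCID 0).
SELECT display_term, occurrence
FROM sys.dm_fts_parser('"PDP-436"', 0, 0, 0);
```

Comparing the two result sets shows whether the hyphenated term is kept as one token or split into parts.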

If you use the neutral word breaker, you'll find the PDP-436 screen in
your search. But if you ever have to find a kop-van-jut, you'll probably
have to search for CONTAINS (..., '"kop" AND "van" AND "jut"').

Also, check out other uses of the word breaker. As I said, I have no
fulltext experience, but I do know that it also enables you to find
conjugations of words - for example, as I understand it, a search for
FORMSOF (INFLECTIONAL, vallen) should also match 'viel' {for the English
readers, "vallen" is "to fall" (infinitive), and "viel" is "fell" (past
tense, singular)}. If you also need searches like that, *and* if the
word breaker is used for this as well (ask Hilary - he knows), then you
definitely don't want to switch to a different word breaker.
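
For reference, an inflectional search of that kind could be written roughly like this. The table name dbo.Producten is hypothetical, the LANGUAGE argument assumes SQL Server 2005 or later, and whether 'viel' is actually matched depends on the Dutch stemmer, which has not been verified here.

```sql
-- Hypothetical table; LANGUAGE 1043 is the LCID for Dutch.
-- FORMSOF(INFLECTIONAL, ...) asks the language's stemmer to expand the
-- term to its inflected forms before matching.
SELECT sb_pvomschrijving
FROM dbo.Producten
WHERE CONTAINS(sb_pvomschrijving,
               'FORMSOF(INFLECTIONAL, vallen)',
               LANGUAGE 1043);
```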

Kind regards,

Re: Strange behavior of CONTAINS function Pim75
9/6/2006 12:15:31 AM
Hello Hugo,

Thanks for your reply. Since yesterday I'm using the neutral word
breaker and as far as I can see this works fine for my situation.

People mainly search for electronics on our site, so I think the
neutral one will work fine overall. There are no 'koppen-van-jut' in our
database :-)

Again, thanks!
Pim