Punctuation marks


Punctuation marks ML
10/21/2005 5:11:03 AM
Does anyone have a list of all punctuation marks that the full-text
indexing service ignores by default? Noise files (such as noise.dat) explicitly
list only the dollar sign ($) and the underscore (_) as noise "words".

And another observation - the Windows implementation of MS Search (compared
to the MS SQL Server implementation) yields different results - try searching
for files with the "|" character in the file name. Ok, it's an illegal
character, but the result is at least 'interesting'.

As far as the rest of the characters ignored in SQL FTS are concerned, they
don't bother Windows search. Has anyone else come across these (or other)
discrepancies?


Re: Punctuation marks ML
10/21/2005 7:40:02 AM
Thank you very much. Yes, I'm mainly referring to SQL FTS, and I'm aware of
the fact that SQL FTS and Windows Indexing Services are two separate products.

I'm just baffled by the fact that the two implementations of the MSSearch
engine differ in such a way. Any idea why?

Thanks for the list as well.


Re: Punctuation marks Hilary Cotter
10/21/2005 9:48:44 AM
I take it you are only talking about SQL FTS. You mention Indexing Services
and MSSearch here, which are two separate products, although SQL FTS uses
the MSSearch engine.

SQL FTS indexes alphanumeric characters. Most other characters are not
indexed, but the engine is aware that something existed there. So a search on
AT&T will match AT&T, AT!T, AT*T, AT$T, and AT T, provided A, T, and At are
not in your noise word list.

The characters . , ! : ; are discarded.
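
The matching behaviour described above can be pictured with a toy model. This is only a rough sketch in Python, assuming that every non-alphanumeric character acts as a plain separator; the real MSSearch word breaker is language-specific and more involved, and the names `toy_tokens` and `toy_match` are made up for illustration.

```python
import re

def toy_tokens(text):
    # Assumption: anything that is not a letter or digit is just a separator.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def toy_match(query, document, noise_words=frozenset()):
    # Drop noise words from both sides, then look for the query tokens
    # as a consecutive run inside the document tokens.
    q = [t for t in toy_tokens(query) if t not in noise_words]
    d = [t for t in toy_tokens(document) if t not in noise_words]
    if not q:
        return False  # every query token was a noise word
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

for candidate in ["AT&T", "AT!T", "AT*T", "AT$T", "AT T"]:
    print(candidate, toy_match("AT&T", candidate))
```

In this sketch all five candidates match a search on AT&T, and none of them match once "at" and "t" are passed in as noise words, which is the caveat about the noise word list.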


Re: Punctuation marks Hilary Cotter
10/8/2007 12:00:00 AM
Can you post the query here which generates such a message?

Basically, SQL FTS does not index any non-alphanumeric characters, but it does
treat . and - and capitals differently for some languages. If you have a
capital letter followed by a non-alphanumeric character, the token is indexed
as a unit, i.e. C# is indexed as C plus something or other, so it will match
searches on C#, C+, C$, but not C or c.


Re: Punctuation marks Srini
10/8/2007 4:23:01 AM
Hi, when I use a search term that contains the character ! or @, I get an
"Incorrect syntax near !..." error message.
I would like to know the list of such characters that FTS doesn't like.
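
One common way to avoid this kind of syntax error, without needing the full list of characters, is to wrap the whole search term in double quotes so the full-text parser treats it as a phrase. The sketch below is in Python and rests on a few assumptions: the query is a CONTAINS predicate, embedded double quotes are escaped by doubling them, and the table name `documents`, the columns `id` and `body`, and the `?` placeholder style are hypothetical.

```python
def quote_fts_term(term):
    # Assumption: doubling embedded double quotes and wrapping the term in
    # double quotes makes the full-text parser read it as a single phrase.
    return '"' + term.replace('"', '""') + '"'

# Hypothetical table and column names; the search condition is passed as a
# parameter rather than concatenated into the SQL text.
search_condition = quote_fts_term("hello!")
sql = "SELECT id FROM documents WHERE CONTAINS(body, ?)"
params = (search_condition,)

print(sql, params)
```

Quoting only keeps the parser from rejecting the term. The punctuation itself is still not indexed, so the results follow the matching rules described earlier in the thread.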
