Hello, I have started to test fulltext indexing. I have run into a problem that I don't know how to solve. I have this data in one fulltext-column "Landsorganisationen i Sverige". When I perform this search I get 1 hit: select top 100 * from CA_CATALOG_New_SEARCH where CONTAINS(FWD_ALL,'"landsorganisationen"') AND CONTAINS(FWD_ALL,'"sverige"') But, when I do this search I get 0 hits (should be 1 hit): select top 100 * from CA_CATALOG_New_SEARCH where CONTAINS(FWD_ALL,'"landsorganisationen"') AND CONTAINS(FWD_ALL,'"sverige"') AND CONTAINS(FWD_ALL,'"i"') There is something with the word "i" that confuses the search engine. I suspect that it is considered a special word and not indexed. But since I want a hit, it seems like I need to know all words that won't work and exclude them from searches. But it sounds like unnessecery work. It must be some better way to work around this. And if I need to know which words, were is the list? Sincerely,
The usual practice is to remove (i.e. replace with a single space) individual letters from the noise file used when the FTI is built. Look up "noise words" in Books Online and make sure you edit the noise file(s) for your specific language(s). FTI needs to be rebuilt after the noise file has been edited. ML ---
If you do empty your noise word list ensure that you replace it with a single space. Otherwise it will use the indexing services noise word list. -- Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com "Thorsten Jakobsson" <Thorsten Jakobsson@discussions.microsoft.com> wrote in message news:B0F51364-BDD8-4F46-88B2-CF55E3A20357@microsoft.com... [quoted text, click to view] > Hello, > I have started to test fulltext indexing. I have run into a problem that I > don't know how to solve. > I have this data in one fulltext-column "Landsorganisationen i Sverige". > When I perform this search I get 1 hit: > select top 100 * from CA_CATALOG_New_SEARCH where > CONTAINS(FWD_ALL,'"landsorganisationen"') AND > CONTAINS(FWD_ALL,'"sverige"') > > But, when I do this search I get 0 hits (should be 1 hit): > select top 100 * from CA_CATALOG_New_SEARCH where > CONTAINS(FWD_ALL,'"landsorganisationen"') AND > CONTAINS(FWD_ALL,'"sverige"') > AND CONTAINS(FWD_ALL,'"i"') > > There is something with the word "i" that confuses the search engine. I > suspect that it is considered a special word and not indexed. But since I > want a hit, it seems like I need to know all words that won't work and > exclude them from searches. But it sounds like unnessecery work. It must > be > some better way to work around this. > And if I need to know which words, were is the list? > > Sincerely, > Thorsten
Thorsten wrote on Mon, 9 Jul 2007 02:38:02 -0700: [quoted text, click to view] > Hello, > I have started to test fulltext indexing. I have run into a problem that I > don't know how to solve. > I have this data in one fulltext-column "Landsorganisationen i Sverige". > When I perform this search I get 1 hit: > select top 100 * from CA_CATALOG_New_SEARCH where > CONTAINS(FWD_ALL,'"landsorganisationen"') AND > CONTAINS(FWD_ALL,'"sverige"') > > But, when I do this search I get 0 hits (should be 1 hit): > select top 100 * from CA_CATALOG_New_SEARCH where > CONTAINS(FWD_ALL,'"landsorganisationen"') AND > CONTAINS(FWD_ALL,'"sverige"') AND CONTAINS(FWD_ALL,'"i"') > > There is something with the word "i" that confuses the search engine. I > suspect that it is considered a special word and not indexed. But since I > want a hit, it seems like I need to know all words that won't work and > exclude them from searches. But it sounds like unnessecery work. It must > be some better way to work around this. > And if I need to know which words, were is the list? > > Sincerely, > Thorsten
There is a list of words held in the noise.<languagecode> files (eg. noise.enu for Neutral English) which are not indexed. You can remove all of these (but leave a single line with a space on it), and then repopulate your index. Just remember to edit the correct file related to your database locale. The other alternative is to parse the noise word list and create a routine to strip out these ignored words when searching - the benefit of this is that you keep the keep the index catalog small and strip out common words from the searches. However, it might be more work than the simple clearing of the noise file. Personally, I cleared all my noise word files - the performance hit was negligible for the tests I performed on my FT searching before and after. Dan
[quoted text, click to view] "Daniel Crichton" wrote: > > There is a list of words held in the noise.<languagecode> files (eg. > noise.enu for Neutral English) which are not indexed. You can remove all of > these (but leave a single line with a space on it), and then repopulate your > index. Just remember to edit the correct file related to your database > locale. > > The other alternative is to parse the noise word list and create a routine > to strip out these ignored words when searching - the benefit of this is > that you keep the keep the index catalog small and strip out common words > from the searches. However, it might be more work than the simple clearing > of the noise file. > > Personally, I cleared all my noise word files - the performance hit was > negligible for the tests I performed on my FT searching before and after. > > Dan >
Thanks to all! That answers my question. Thorsten
You need to rebuild your catalogs. Stopping and starting SQL Server or doing a new population won't help. -- Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com "Thorsten Jakobsson" <ThorstenJakobsson@discussions.microsoft.com> wrote in message news:4034C808-FA25-477F-B32E-9C2E6C49D69B@microsoft.com... [quoted text, click to view] > Hello, I need to ask about this again. I can't make it work. I have edited > the noise-files I can find. > In this directory C:\Program Files\Microsoft SQL > Server\MSSQL.1\MSSQL\FTData > I have edited noiseENG.txt, noiseENU.txt and noiseSVE.txt. I have removed > all instances of single letters. > I have for example removed single "b" from all three files. I have then > repopulated the fulltext index in every way I can think of. Still I get 0 > rows from the first statement and 17 from the second, they should both > return > 17: > SELECT DISTINCT cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE > CONTAINS(FWD_1,'"b"') > SELECT DISTINCT fwd_1,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE > FWD_1='b' > > One of the ways to repopulate I have tried is this > ALTER FULLTEXT INDEX ON CA_CATALOG_New_SEARCH STOP POPULATION > ALTER FULLTEXT INDEX ON CA_CATALOG_New_SEARCH START FULL POPULATION > > I have stopped the SQLServer and started, doesn't help. > > What is missing, it seems like the database doesn't understand that the > files have been edited? > The documentation says that the noisefiles are called for exampl Noisesve, > but there is no file with that name, the file that exists is noiseSVE.txt. > > Sincerely, > Thorsten >
Hello, I need to ask about this again. I can't make it work. I have edited the noise-files I can find. In this directory C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData I have edited noiseENG.txt, noiseENU.txt and noiseSVE.txt. I have removed all instances of single letters. I have for example removed single "b" from all three files. I have then repopulated the fulltext index in every way I can think of. Still I get 0 rows from the first statement and 17 from the second, they should both return 17: SELECT DISTINCT cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE CONTAINS(FWD_1,'"b"') SELECT DISTINCT fwd_1,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE FWD_1='b' One of the ways to repopulate I have tried is this ALTER FULLTEXT INDEX ON CA_CATALOG_New_SEARCH STOP POPULATION ALTER FULLTEXT INDEX ON CA_CATALOG_New_SEARCH START FULL POPULATION I have stopped the SQLServer and started, doesn't help. What is missing, it seems like the database doesn't understand that the files have been edited? The documentation says that the noisefiles are called for exampl Noisesve, but there is no file with that name, the file that exists is noiseSVE.txt. Sincerely, Thorsten
It finally works! I did this sp_configure 'default full-text language',1053 which means swedish and after that I rebuilt the index. It said 1033 before I ran sp_configure
This drives me crazy, I thought I had it figured out. I removed "b" from the noiseSVE.txt and removed, created and populated the fulltext table. It works so, I thought it was working. But when I try to search for "1" or "ett" ("ett" is one in swedish), it just doesn't work. I have tried to remove everything in noiseSVE.txt and just leave a single space, when that didn't work I tried to have just the word "XXXXXXXXXXX" in the noiseSVE.txt file. It doesn't work. This search returns nothing: SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE (CONTAINS(FWD_ALL,'"ett"')) this returns 488 rows: SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE FWD_ALL like '% ett %' this returns 840 rows: SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE (CONTAINS(FWD_ALL,'"b"')) Is there a special case for single numbers? Are single numbers not indexed in fulltext search? But for example "10" works alright. this returns 295 rows SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE (CONTAINS(FWD_ALL,'"10"')) What is going on?
Yet another time I rebuilt the fulltext table, now it works! "ett" and "1" works. I have no idea why it suddenly works. There must be something that triggers the changes in noiseSVE.txt that I don't understand. Thorsten [quoted text, click to view] "Thorsten Jakobsson" wrote: > This drives me crazy, I thought I had it figured out. I removed "b" from the > noiseSVE.txt and removed, created and populated the fulltext table. > It works so, I thought it was working. But when I try to search for "1" or > "ett" > ("ett" is one in swedish), it just doesn't work. I have tried to remove > everything in > noiseSVE.txt and just leave a single space, when that didn't work I tried to > have just the word "XXXXXXXXXXX" in the noiseSVE.txt file. It doesn't work. > > This search returns nothing: > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE > (CONTAINS(FWD_ALL,'"ett"')) > > this returns 488 rows: > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE > FWD_ALL like '% ett %' > > this returns 840 rows: > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE > (CONTAINS(FWD_ALL,'"b"')) > > Is there a special case for single numbers? Are single numbers not indexed > in fulltext search? > But for example "10" works alright. > > this returns 295 rows > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn WHERE > (CONTAINS(FWD_ALL,'"10"')) > > What is going on? >
single numbers are index if they are not in the noise word list. -- Looking for a SQL Server replication book? http://www.nwsu.com/0974973602.html Looking for a FAQ on Indexing Services/SQL FTS http://www.indexserverfaq.com "Thorsten Jakobsson" <ThorstenJakobsson@discussions.microsoft.com> wrote in message news:B938159D-A15B-4414-B034-1714D2A4B63B@microsoft.com... [quoted text, click to view] > This drives me crazy, I thought I had it figured out. I removed "b" from > the > noiseSVE.txt and removed, created and populated the fulltext table. > It works so, I thought it was working. But when I try to search for "1" or > "ett" > ("ett" is one in swedish), it just doesn't work. I have tried to remove > everything in > noiseSVE.txt and just leave a single space, when that didn't work I tried > to > have just the word "XXXXXXXXXXX" in the noiseSVE.txt file. It doesn't > work. > > This search returns nothing: > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn > WHERE > (CONTAINS(FWD_ALL,'"ett"')) > > this returns 488 rows: > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn > WHERE > FWD_ALL like '% ett %' > > this returns 840 rows: > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn > WHERE > (CONTAINS(FWD_ALL,'"b"')) > > Is there a special case for single numbers? Are single numbers not indexed > in fulltext search? > But for example "10" works alright. > > this returns 295 rows > SELECT DISTINCT fwd_all,cn.CA_CATALOG_ID FROM CA_CATALOG_New_SEARCH cn > WHERE > (CONTAINS(FWD_ALL,'"10"')) > > What is going on? > > Thorsten
Don't see what you're looking for? Try a search.
|