sql server full text search:
Hi, I brought this up once ago, but I read something which might open a new possibility. What I am trying to do is this: select p.Name, p.Score, ft.Rank from Products p join ( select [key], rank from containstable(Products, Name, '"Screw*"', 10) ) as ft on ft.[key]= p.ProductId order by p.Score desc Let's say "Products" contains one million products and there are 20000 matching products containing "screw" as a part of their name. What I want to get are the top 10 products matching the query, but the FT rank should equal the Score rank. I don't want this: containstable(Products, Name, '"Screw*"', 50000) and then join the resulting 20000 rows with products and order the set by Score, since that will be too expensive. Best would be, if I could actually set the value that the ranking algorithm is based on. The indexer would simply use Score as the predominant factor for the ranking. So containstable(Products, Name, '"Screw*"', 10) would return the first 10 matches but sorted by Score. [quoted text, click to view] >From what I know this is not possible with FT in 2000 or 2005, but I
read that it is possible to use CLR integration to customize the indexing process. However, I could not find anything in the docs about that. Maybe one can only create indexes and stuff like that, but I am still hoping that someone has a clue on how to possibly manipulate FT rank in SQL Server 2005. Kind regards DC
DC, I've created my own CLR function and use it instead of FTS rank. It gives much better result (by similarity) I use Levenshtein Edit distance to calculate the score. Thanks, Yuri [quoted text, click to view] "DC" wrote: > Hi, > > I brought this up once ago, but I read something which might open a > new possibility. What I am trying to do is this: > > select p.Name, p.Score, ft.Rank > from Products p > join ( > select [key], rank from > containstable(Products, Name, '"Screw*"', 10) > ) as ft > on ft.[key]= p.ProductId > order by p.Score desc > > Let's say "Products" contains one million products and there are 20000 > matching products containing "screw" as a part of their name. What I > want to get are the top 10 products matching the query, but the FT > rank should equal the Score rank. I don't want this: > > containstable(Products, Name, '"Screw*"', 50000) > > and then join the resulting 20000 rows with products and order the set > by Score, since that will be too expensive. > > Best would be, if I could actually set the value that the ranking > algorithm is based on. The indexer would simply use Score as the > predominant factor for the ranking. So > > containstable(Products, Name, '"Screw*"', 10) > > would return the first 10 matches but sorted by Score. > > >From what I know this is not possible with FT in 2000 or 2005, but I > read that it is possible to use CLR integration to customize the > indexing process. However, I could not find anything in the docs about > that. Maybe one can only create indexes and stuff like that, but I am > still hoping that someone has a clue on how to possibly manipulate FT > rank in SQL Server 2005. > > Kind regards > DC >
Hi Yuri, very interesting, were you able to manipulate the fulltext index ranking, i.e. would containstable(Products, Name, '"Screw*"', 10) return the top 10 results as calculated by your algorithm? Do you maybe have a link or a topic to look for about this kind of CLR integration? Regards DC [quoted text, click to view] On 15 Mai, 20:41, ynogin <yno...@discussions.microsoft.com> wrote: > DC, > I've created my own CLR function and use it instead of FTS rank. It gives > much better result (by similarity) > I use Levenshtein Edit distance to calculate the score. > > Thanks, > > Yuri > > > > "DC" wrote: > > Hi, > > > I brought this up once ago, but I read something which might open a > > new possibility. What I am trying to do is this: > > > select p.Name, p.Score, ft.Rank > > from Products p > > join ( > > select [key], rank from > > containstable(Products, Name, '"Screw*"', 10) > > ) as ft > > on ft.[key]= p.ProductId > > order by p.Score desc > > > Let's say "Products" contains one million products and there are 20000 > > matching products containing "screw" as a part of their name. What I > > want to get are the top 10 products matching the query, but the FT > > rank should equal the Score rank. I don't want this: > > > containstable(Products, Name, '"Screw*"', 50000) > > > and then join the resulting 20000 rows with products and order the set > > by Score, since that will be too expensive. > > > Best would be, if I could actually set the value that the ranking > > algorithm is based on. The indexer would simply use Score as the > > predominant factor for the ranking. So > > > containstable(Products, Name, '"Screw*"', 10) > > > would return the first 10 matches but sorted by Score. > > > >From what I know this is not possible with FT in 2000 or 2005, but I > > read that it is possible to use CLR integration to customize the > > indexing process. However, I could not find anything in the docs about > > that. Maybe one can only create indexes and stuff like that, but I am > > still hoping that someone has a clue on how to possibly manipulate FT > > rank in SQL Server 2005. > > > Kind regards > > DC- Zitierten Text ausblenden - > > - Zitierten Text anzeigen -
DC, The syntax I use is more like: Select top(@n_rows) *,my_score_fn(your_column,@your_value) from your_table where containstable(your_column,@your_value) order by my_score_fn(your_column,@your_value) desc You also can use freetexttable function instead. Thanks, Yuri [quoted text, click to view] "DC" wrote: > Hi Yuri, > > very interesting, were you able to manipulate the fulltext index > ranking, i.e. would > > containstable(Products, Name, '"Screw*"', 10) > > return the top 10 results as calculated by your algorithm? > > Do you maybe have a link or a topic to look for about this kind of CLR > integration? > > Regards > DC > > > On 15 Mai, 20:41, ynogin <yno...@discussions.microsoft.com> wrote: > > DC, > > I've created my own CLR function and use it instead of FTS rank. It gives > > much better result (by similarity) > > I use Levenshtein Edit distance to calculate the score. > > > > Thanks, > > > > Yuri > > > > > > > > "DC" wrote: > > > Hi, > > > > > I brought this up once ago, but I read something which might open a > > > new possibility. What I am trying to do is this: > > > > > select p.Name, p.Score, ft.Rank > > > from Products p > > > join ( > > > select [key], rank from > > > containstable(Products, Name, '"Screw*"', 10) > > > ) as ft > > > on ft.[key]= p.ProductId > > > order by p.Score desc > > > > > Let's say "Products" contains one million products and there are 20000 > > > matching products containing "screw" as a part of their name. What I > > > want to get are the top 10 products matching the query, but the FT > > > rank should equal the Score rank. I don't want this: > > > > > containstable(Products, Name, '"Screw*"', 50000) > > > > > and then join the resulting 20000 rows with products and order the set > > > by Score, since that will be too expensive. > > > > > Best would be, if I could actually set the value that the ranking > > > algorithm is based on. The indexer would simply use Score as the > > > predominant factor for the ranking. So > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > would return the first 10 matches but sorted by Score. > > > > > >From what I know this is not possible with FT in 2000 or 2005, but I > > > read that it is possible to use CLR integration to customize the > > > indexing process. However, I could not find anything in the docs about > > > that. Maybe one can only create indexes and stuff like that, but I am > > > still hoping that someone has a clue on how to possibly manipulate FT > > > rank in SQL Server 2005. > > > > > Kind regards > > > DC- Zitierten Text ausblenden - > > > > - Zitierten Text anzeigen - > >
Hi Yuri, if containstable(your_column,@your_value) returns 100.000 results, then my_score_fn will have to do a lot of sorting. This is exactly the problem that I am facing, where out of about one million indexed rows easily 10 to 100 thousand results are being returned by weak queries. For your function my_score_fn it would probably not be of much value to do what I am trying, since my_score_fn takes the search argument as a parameter. So you cannot precalculate your desired ranking and store it into an additional column. But my "Score" is totally independant of the search argument. The RANK that sql server calculates is useless for me. Since containstable(Products, Name, '"Screw*"', 10) gives the top 10 results by RANK, I only see the chance to manipulate RANK generation somehow. But I apprehend that this is not possible at all. Regards DC [quoted text, click to view] On 16 Mai, 17:14, ynogin <yno...@discussions.microsoft.com> wrote: > DC, > The syntax I use is more like: > > Select top(@n_rows) *,my_score_fn(your_column,@your_value) > from your_table > where containstable(your_column,@your_value) > order by my_score_fn(your_column,@your_value) desc > > You also can use freetexttable function instead. > > Thanks, > Yuri > > > > "DC" wrote: > > Hi Yuri, > > > very interesting, were you able to manipulate the fulltext index > > ranking, i.e. would > > > containstable(Products, Name, '"Screw*"', 10) > > > return the top 10 results as calculated by your algorithm? > > > Do you maybe have a link or a topic to look for about this kind of CLR > > integration? > > > Regards > > DC > > > On 15 Mai, 20:41, ynogin <yno...@discussions.microsoft.com> wrote: > > > DC, > > > I've created my own CLR function and use it instead of FTS rank. It gives > > > much better result (by similarity) > > > I use Levenshtein Edit distance to calculate the score. > > > > Thanks, > > > > Yuri > > > > "DC" wrote: > > > > Hi, > > > > > I brought this up once ago, but I read something which might open a > > > > new possibility. What I am trying to do is this: > > > > > select p.Name, p.Score, ft.Rank > > > > from Products p > > > > join ( > > > > select [key], rank from > > > > containstable(Products, Name, '"Screw*"', 10) > > > > ) as ft > > > > on ft.[key]= p.ProductId > > > > order by p.Score desc > > > > > Let's say "Products" contains one million products and there are 20000 > > > > matching products containing "screw" as a part of their name. What I > > > > want to get are the top 10 products matching the query, but the FT > > > > rank should equal the Score rank. I don't want this: > > > > > containstable(Products, Name, '"Screw*"', 50000) > > > > > and then join the resulting 20000 rows with products and order the set > > > > by Score, since that will be too expensive. > > > > > Best would be, if I could actually set the value that the ranking > > > > algorithm is based on. The indexer would simply use Score as the > > > > predominant factor for the ranking. So > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > would return the first 10 matches but sorted by Score. > > > > > >From what I know this is not possible with FT in 2000 or 2005, but I > > > > read that it is possible to use CLR integration to customize the > > > > indexing process. However, I could not find anything in the docs about > > > > that. Maybe one can only create indexes and stuff like that, but I am > > > > still hoping that someone has a clue on how to possibly manipulate FT > > > > rank in SQL Server 2005. > > > > > Kind regards > > > > DC- Zitierten Text ausblenden - > > > > - Zitierten Text anzeigen -- Zitierten Text ausblenden - > > - Zitierten Text anzeigen -
DC, You actually can put a trash holder like Select .... Where score_fn(search_column,@value) > 30 --in percent order by ... This way you will cut out most of the low "similarity" matches and your result set will be a lot smaller. Thanks, Yuri [quoted text, click to view] "DC" wrote: > Hi Yuri, > > if containstable(your_column,@your_value) returns 100.000 results, > then my_score_fn will have to do a lot of sorting. > > This is exactly the problem that I am facing, where out of about one > million indexed rows easily 10 to 100 thousand results are being > returned by weak queries. > > For your function my_score_fn it would probably not be of much value > to do what I am trying, since my_score_fn takes the search argument as > a parameter. So you cannot precalculate your desired ranking and store > it into an additional column. But my "Score" is totally independant of > the search argument. The RANK that sql server calculates is useless > for me. Since > > containstable(Products, Name, '"Screw*"', 10) > > gives the top 10 results by RANK, I only see the chance to manipulate > RANK generation somehow. But I apprehend that this is not possible at > all. > > Regards > DC > > > On 16 Mai, 17:14, ynogin <yno...@discussions.microsoft.com> wrote: > > DC, > > The syntax I use is more like: > > > > Select top(@n_rows) *,my_score_fn(your_column,@your_value) > > from your_table > > where containstable(your_column,@your_value) > > order by my_score_fn(your_column,@your_value) desc > > > > You also can use freetexttable function instead. > > > > Thanks, > > Yuri > > > > > > > > "DC" wrote: > > > Hi Yuri, > > > > > very interesting, were you able to manipulate the fulltext index > > > ranking, i.e. would > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > return the top 10 results as calculated by your algorithm? > > > > > Do you maybe have a link or a topic to look for about this kind of CLR > > > integration? > > > > > Regards > > > DC > > > > > On 15 Mai, 20:41, ynogin <yno...@discussions.microsoft.com> wrote: > > > > DC, > > > > I've created my own CLR function and use it instead of FTS rank. It gives > > > > much better result (by similarity) > > > > I use Levenshtein Edit distance to calculate the score. > > > > > > Thanks, > > > > > > Yuri > > > > > > "DC" wrote: > > > > > Hi, > > > > > > > I brought this up once ago, but I read something which might open a > > > > > new possibility. What I am trying to do is this: > > > > > > > select p.Name, p.Score, ft.Rank > > > > > from Products p > > > > > join ( > > > > > select [key], rank from > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > ) as ft > > > > > on ft.[key]= p.ProductId > > > > > order by p.Score desc > > > > > > > Let's say "Products" contains one million products and there are 20000 > > > > > matching products containing "screw" as a part of their name. What I > > > > > want to get are the top 10 products matching the query, but the FT > > > > > rank should equal the Score rank. I don't want this: > > > > > > > containstable(Products, Name, '"Screw*"', 50000) > > > > > > > and then join the resulting 20000 rows with products and order the set > > > > > by Score, since that will be too expensive. > > > > > > > Best would be, if I could actually set the value that the ranking > > > > > algorithm is based on. The indexer would simply use Score as the > > > > > predominant factor for the ranking. So > > > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > > > would return the first 10 matches but sorted by Score. > > > > > > > >From what I know this is not possible with FT in 2000 or 2005, but I > > > > > read that it is possible to use CLR integration to customize the > > > > > indexing process. However, I could not find anything in the docs about > > > > > that. Maybe one can only create indexes and stuff like that, but I am > > > > > still hoping that someone has a clue on how to possibly manipulate FT > > > > > rank in SQL Server 2005. > > > > > > > Kind regards > > > > > DC- Zitierten Text ausblenden - > > > > > > - Zitierten Text anzeigen -- Zitierten Text ausblenden - > > > > - Zitierten Text anzeigen - > >
Hi Yuri, if I do this: select p.name, p.score from containstable(products, name, '"al*"') ft join products p on p.id = ft.[Key] and p.score > 100 it will still take the fulltext service a long time to dig up ALL products matching "al*" and it will filter the ones with a score of 100 or less afterwards. This, on the other hand, will be a lot faster: select p.name, p.score from containstable(products, name, '"al*"', 200) ft join products p on p.id = ft.[Key] and p.score > 100 because the ft index only returns the first 200 rows matchin "al*" and then the p.score > 100 clause will be applied... so there is probably no match left. My whole point is, that I want this containstable(products, name, '"al*"', 200) to return the first 200 products matching "al*" but sorted by "score desc". Normally containstable(products, name, '"al*"', 200) will return the first 200 matches sorted by RANK, and that RANK is being calculated by ft index engine. What I want is to produce a custom RANK. Regards DC [quoted text, click to view] On 18 Mai, 15:16, ynogin <yno...@discussions.microsoft.com> wrote: > DC, > You actually can put a trash holder like > > Select .... > > Where score_fn(search_column,@value) > 30 --in percent > order by ... > > This way you will cut out most of the low "similarity" matches and your > result set will be a lot smaller. > > Thanks, > > Yuri > > > > "DC" wrote: > > Hi Yuri, > > > if containstable(your_column,@your_value) returns 100.000 results, > > then my_score_fn will have to do a lot of sorting. > > > This is exactly the problem that I am facing, where out of about one > > million indexed rows easily 10 to 100 thousand results are being > > returned by weak queries. > > > For your function my_score_fn it would probably not be of much value > > to do what I am trying, since my_score_fn takes the search argument as > > a parameter. So you cannot precalculate your desired ranking and store > > it into an additional column. But my "Score" is totally independant of > > the search argument. The RANK that sql server calculates is useless > > for me. Since > > > containstable(Products, Name, '"Screw*"', 10) > > > gives the top 10 results by RANK, I only see the chance to manipulate > > RANK generation somehow. But I apprehend that this is not possible at > > all. > > > Regards > > DC > > > On 16 Mai, 17:14, ynogin <yno...@discussions.microsoft.com> wrote: > > > DC, > > > The syntax I use is more like: > > > > Select top(@n_rows) *,my_score_fn(your_column,@your_value) > > > from your_table > > > where containstable(your_column,@your_value) > > > order by my_score_fn(your_column,@your_value) desc > > > > You also can use freetexttable function instead. > > > > Thanks, > > > Yuri > > > > "DC" wrote: > > > > Hi Yuri, > > > > > very interesting, were you able to manipulate the fulltext index > > > > ranking, i.e. would > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > return the top 10 results as calculated by your algorithm? > > > > > Do you maybe have a link or a topic to look for about this kind of CLR > > > > integration? > > > > > Regards > > > > DC > > > > > On 15 Mai, 20:41, ynogin <yno...@discussions.microsoft.com> wrote: > > > > > DC, > > > > > I've created my own CLR function and use it instead of FTS rank. It gives > > > > > much better result (by similarity) > > > > > I use Levenshtein Edit distance to calculate the score. > > > > > > Thanks, > > > > > > Yuri > > > > > > "DC" wrote: > > > > > > Hi, > > > > > > > I brought this up once ago, but I read something which might open a > > > > > > new possibility. What I am trying to do is this: > > > > > > > select p.Name, p.Score, ft.Rank > > > > > > from Products p > > > > > > join ( > > > > > > select [key], rank from > > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > > ) as ft > > > > > > on ft.[key]= p.ProductId > > > > > > order by p.Score desc > > > > > > > Let's say "Products" contains one million products and there are 20000 > > > > > > matching products containing "screw" as a part of their name. What I > > > > > > want to get are the top 10 products matching the query, but the FT > > > > > > rank should equal the Score rank. I don't want this: > > > > > > > containstable(Products, Name, '"Screw*"', 50000) > > > > > > > and then join the resulting 20000 rows with products and order the set > > > > > > by Score, since that will be too expensive. > > > > > > > Best would be, if I could actually set the value that the ranking > > > > > > algorithm is based on. The indexer would simply use Score as the > > > > > > predominant factor for the ranking. So > > > > > > > containstable(Products, Name, '"Screw*"', 10) > > > > > > > would return the first 10 matches but sorted by Score. > > > > > > > >From what I know this is not possible with FT in 2000 or 2005, but I > > > > > > read that it is possible to use CLR integration to customize the > > > > > > indexing process. However, I could not find anything in the docs about > > > > > > that. Maybe one can only create indexes and stuff like that, but I am > > > > > > still hoping that someone has a clue on how to possibly manipulate FT > > > > > > rank in SQL Server 2005. > > > > > > > Kind regards > > > > > > DC- Zitierten Text ausblenden - > > > > > > - Zitierten Text anzeigen -- Zitierten Text ausblenden - > > > > - Zitierten Text anzeigen -- Zitierten Text ausblenden - > > - Zitierten Text anzeigen -
Don't see what you're looking for? Try a search.
|