Hi John,
Thanks for the reply.
Actually my requirement is as follows:
I have thousands of images. I OCR them, capture their content, and dump the
OCRed data into a SQL Server table. Some of these images will be duplicates
with a little variation in their content. Say one document has a scribbled
note on it and is then scanned into an image, and another copy of the same
document is scanned without the scribbled note. I want these two images to
be identified as duplicates.
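To make "duplicates with a little variation" concrete, here is a rough Python
sketch of the kind of comparison I have in mind. It is not something I have in
place: the function names, the 3-word shingle size and the 0.7 threshold are
all assumptions for illustration. It scores two OCR texts by the Jaccard
similarity of their overlapping word sequences (shingles), so a few extra
scribbled words only lower the score slightly.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Text shorter than one shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.7, k=3):
    """True if the two texts share at least `threshold` of their shingles."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


# Hypothetical OCR output: the same document scanned with and without a note.
clean = ("This agreement is made between the buyer and the seller on the "
         "date shown below and is binding on both parties")
noted = clean + " call Bob Monday"
other = "Invoice number 4471 for office supplies delivered in March"

print(is_near_duplicate(clean, noted))
print(is_near_duplicate(clean, other))
```

With thousands of rows, comparing every pair this way would probably be too
slow, so the threshold and the pairing strategy would both need tuning on the
real data.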
I was planning to use the FTS feature of SQL Server, where I take the data
in the first row and compare it with the rest of the data using the FREETEXT
predicate or the FREETEXTTABLE function. Since you have mentioned that this
is not a good option for my requirement, can you please suggest an
alternative way of implementing this?
Thanks in advance,
Philipino
"John Kane" <jt-kane@comcast.net> wrote in message
news:u7GGfaH4DHA.1672@TK2MSFTNGP12.phx.gbl...
> Auntin,
> I don't believe that SQL Server's Full-text Search components with the
> MSSearch service is the best solution for your requirement.
> Neither FREETEXT nor CONTAINS was designed to detect duplicate data in SQL
> Server.
>
> Regards,
> John
>
>
> "Auntin Philipino" <something@somethingelse.com> wrote in message
> news:eMcyymB4DHA.488@TK2MSFTNGP12.phx.gbl...
> > Hi,
> >
> > I have a requirement wherein I have to search for duplicate documents
> > in my folder. I copy the contents of each document to a TEXT column and
> > I enable FTS on this column.
> >
> > Now I want to check whether the content of one document is present in
> > another document (maybe with a few minor changes). Which is the best way
> > to implement this?
> >
> > I am finding that FREETEXT returns data with a huge degree of variance,
> > and CONTAINS keeps throwing errors if the criteria data has a newline
> > character or any special character. How can I solve this problem?
> >
> > Thanks in advance,
> > Philipino
> >
> >
>
>