You might try investigating "edit distance" algorithms. Here are some
"The Gekkster via SQLMonster.com" <forum@SQLMonster.com> wrote in message
news:504B77CE264E0@SQLMonster.com...
> Hey all,
>
> I'm trying to figure out how to compare one character string against a
list
> (of other strings) and then select a 'best' match. I'm not really sure how
to
> go about doing a 'best' comparison/selection. No doubt I'll have to tweak
the
> logic for 'best' as I work with this over time.
>
> This process works from 'units.vin' and we determine the needed 'code'
(for
> use in evaluating the 'styles' table) by examining the vin like so:
>
> declare @code char(10)
> select @code = left(vin, 8) + substring(vin, 10, 2)
>
> In the example unit (vin) below the code evaluates to '1FAFP5341G'. There
is
> no exact match in the list, and the 'best' match of those possible is
> '1FAFP53U1G', which would provide us with an 'sid' of '100001095' ('sid'
is
> ultimately what I'm trying to get at). The 'code' difference is only one
> character (4 vs U, in the eighth position). This is common, though at
times 2
> or more characters may differ, and in different positions.
>
> Here's an example of what I'm working with:
>
> CREATE TABLE [units] (
> [vin] [char] (17) NOT NULL ,
> [sid] [int] NULL ,
> [syear] [int] NULL ,
> [smake] [varchar] (50) NULL ,
> [smodel] [varchar] (50) NULL ,
> [sstyle] [varchar] (100) NULL ,
> CONSTRAINT [PK_units] PRIMARY KEY CLUSTERED
> (
> [vin]
> ) ON [PRIMARY]
> ) ON [PRIMARY]
> GO
>
> SAMPLE VALUES: 1FAFP53411G120924, 2001, , Ford, Taurus, NULL
>
> CREATE TABLE [styles] (
> [sid] [int] NOT NULL ,
> [code] [char] (10) NOT NULL ,
> CONSTRAINT [PK_styles] PRIMARY KEY CLUSTERED
> (
> [sid]
> ) ON [PRIMARY]
> ) ON [PRIMARY]
> GO
>
> SAMPLE VALUES (sid, code):
>
> 100001094 1FAFP5221A
> 100001094 1FAFP5221G
> 100001094 1FAFP52U11
> 100001094 1FAFP52U1A
> 100001094 1FAFP52U1G
> 100001094 1FAHP5221A
> 100001094 1FAHP5221G
> 100001094 1FAHP52U1A
> 100001094 1FAHP52U1G
> 100001095 1FAFP5321A
> 100001095 1FAFP5321G
> 100001095 1FAFP53S1A
> 100001095 1FAFP53S1G
> 100001095 1FAFP53U1A
> 100001095 1FAFP53U1G
> 100001095 1FAHP5321A
> 100001095 1FAHP5321G
> 100001095 1FAHP53S1A
> 100001095 1FAHP53S1G
> 100001095 1FAHP53U1A
> 100001095 1FAHP53U1G
> 100001096 1FAFP5821A
> 100001096 1FAFP5821G
> 100001096 1FAFP58S1A
> 100001096 1FAFP58S1G
> 100001096 1FAFP58U1A
> 100001096 1FAFP58U1G
> 100001096 1FAHP5821A
> 100001096 1FAHP5821G
> 100001096 1FAHP58S1A
> 100001096 1FAHP58S1G
> 100001096 1FAHP58U1A
> 100001096 1FAHP58U1G
> 100001096 1FALP58U1G
> 100001097 1FAFP56S1A
> 100001097 1FAFP56S1G
> 100001097 1FAHP56S1A
> 100001097 1FAHP56S1G
> 100001098 1FAFP5521A
> 100001098 1FAFP5521G
> 100001098 1FAFP55S1A
> 100001098 1FAFP55S1G
> 100001098 1FAFP55U1A
> 100001098 1FAFP55U1G
> 100001098 1FAHP5521A
> 100001098 1FAHP5521G
> 100001098 1FAHP55S1A
> 100001098 1FAHP55S1G
> 100001098 1FAHP55U1A
> 100001098 1FAHP55U1G
> 100001098 1FALP5521A
> 100001098 1FALP55U1A
> 100001098 1FALP55U1G
>
> I guess I'll need to develop some sort of 'weighting' to help determine
the
> best match to select; but to get started maybe someone can help me in
setting
> something up for this type of comparison/selection, maybe using just a
> character/position difference as an initial baseline to help select the
> closest match. Or better ideas...?
>
> Does anyone have a suggestion (or a code example) for how I can try to
> accomplish something like this?
>
> Thanks.
>
> --
> Message posted via
http://www.sqlmonster.com