Groups | Blog | Home
all groups > sql server programming > june 2005 >

sql server programming : help with a comparison and select


Jeremy Williams
6/23/2005 5:35:06 PM
You might try investigating "edit distance" algorithms. Here are some
discussions about them (enter this in the Google Groups search screen):

Edit distance group:microsoft.public.sqlserver.*

Also, here is one version for T-SQL: http://www.merriampark.com/ldtsql.htm

IHTH
Jeremy


[quoted text, click to view]

The Gekkster via SQLMonster.com
6/23/2005 9:48:26 PM
Hey all,

I'm trying to figure out how to compare one character string against a list
(of other strings) and then select a 'best' match. I'm not really sure how to
go about doing a 'best' comparison/selection. No doubt I'll have to tweak the
logic for 'best' as I work with this over time.

This process works from 'units.vin' and we determine the needed 'code' (for
use in evaluating the 'styles' table) by examining the vin like so:

declare @code char(10)
select @code = left(vin, 8) + substring(vin, 10, 2)

In the example unit (vin) below the code evaluates to '1FAFP5341G'. There is
no exact match in the list, and the 'best' match of those possible is
'1FAFP53U1G', which would provide us with an 'sid' of '100001095' ('sid' is
ultimately what I'm trying to get at). The 'code' difference is only one
character (4 vs U, in the eighth position). This is common, though at times 2
or more characters may differ, and in different positions.

Here's an example of what I'm working with:

CREATE TABLE [units] (
[vin] [char] (17) NOT NULL ,
[sid] [int] NULL ,
[syear] [int] NULL ,
[smake] [varchar] (50) NULL ,
[smodel] [varchar] (50) NULL ,
[sstyle] [varchar] (100) NULL ,
CONSTRAINT [PK_units] PRIMARY KEY CLUSTERED
(
[vin]
) ON [PRIMARY]
) ON [PRIMARY]
GO

SAMPLE VALUES: 1FAFP53411G120924, 2001, , Ford, Taurus, NULL

CREATE TABLE [styles] (
[sid] [int] NOT NULL ,
[code] [char] (10) NOT NULL ,
CONSTRAINT [PK_styles] PRIMARY KEY CLUSTERED
(
[sid]
) ON [PRIMARY]
) ON [PRIMARY]
GO

SAMPLE VALUES (sid, code):

100001094 1FAFP5221A
100001094 1FAFP5221G
100001094 1FAFP52U11
100001094 1FAFP52U1A
100001094 1FAFP52U1G
100001094 1FAHP5221A
100001094 1FAHP5221G
100001094 1FAHP52U1A
100001094 1FAHP52U1G
100001095 1FAFP5321A
100001095 1FAFP5321G
100001095 1FAFP53S1A
100001095 1FAFP53S1G
100001095 1FAFP53U1A
100001095 1FAFP53U1G
100001095 1FAHP5321A
100001095 1FAHP5321G
100001095 1FAHP53S1A
100001095 1FAHP53S1G
100001095 1FAHP53U1A
100001095 1FAHP53U1G
100001096 1FAFP5821A
100001096 1FAFP5821G
100001096 1FAFP58S1A
100001096 1FAFP58S1G
100001096 1FAFP58U1A
100001096 1FAFP58U1G
100001096 1FAHP5821A
100001096 1FAHP5821G
100001096 1FAHP58S1A
100001096 1FAHP58S1G
100001096 1FAHP58U1A
100001096 1FAHP58U1G
100001096 1FALP58U1G
100001097 1FAFP56S1A
100001097 1FAFP56S1G
100001097 1FAHP56S1A
100001097 1FAHP56S1G
100001098 1FAFP5521A
100001098 1FAFP5521G
100001098 1FAFP55S1A
100001098 1FAFP55S1G
100001098 1FAFP55U1A
100001098 1FAFP55U1G
100001098 1FAHP5521A
100001098 1FAHP5521G
100001098 1FAHP55S1A
100001098 1FAHP55S1G
100001098 1FAHP55U1A
100001098 1FAHP55U1G
100001098 1FALP5521A
100001098 1FALP55U1A
100001098 1FALP55U1G

I guess I'll need to develop some sort of 'weighting' to help determine the
best match to select; but to get started maybe someone can help me in setting
something up for this type of comparison/selection, maybe using just a
character/position difference as an initial baseline to help select the
closest match. Or better ideas...?

Does anyone have a suggestion (or a code example) for how I can try to
accomplish something like this?

Thanks.

--
The Gekkster via SQLMonster.com
6/24/2005 12:00:00 AM
Thanks for the note on 'edit distance'.

If I understand this correctly it would produce 3 'closest' matches:
1FAFP5321G, 1FAFP53S1G, and (correct) 1FAFP53U1G. This does help narrow the
list.

I see that the logic to do this could quickly become very complicated.

Anyone have any thoughts on a way to help further refine something like this,
maybe in combination with ascii, soundex, difference, or...? Am I overlooking
an 'obvious' solution?

--
AddThis Social Bookmark Button