Groups | Blog | Home
all groups > sql server (alternate) > june 2005 >

sql server (alternate) : SQL 2000 and UTF-16 encoding


dBlue
6/28/2005 3:59:29 PM
Hi all,

I have an issue on querying against UTF-16 encoded characters in
SQL2000 database: For example the "L=F3pez" is saved into database as
"L=C3=B3pez" (due to the UTF-16 encoding); somehow, when I query data
with conditions of "like 'lop%'" or "like 'L=F3p%'", the row of L=F3pez
would not return.

NOTE: the accent insensitive collation can not help in this case.

Thank you,

Albion(052X)
dBlue
6/29/2005 5:41:17 AM
Thank you Erland for the reply, unfortunatly, the problem remains.

I understand your example well, and it is exactly the sql query we are
using now. The problem is not at the accent. The query does not
return me the data row, because the character is save as "L=C3=B3pez" in
the database, and the [o=F3] does not help on it.

BTW, the accent insensitive insensitive does not help either.

Thank you,

Albion(052X)
dBlue
6/29/2005 6:29:04 AM
Hi,

Please try the script below. It is a bit weired to me.



CREATE TABLE tbl
(lastname nvarchar(30) NOT NULL)
go
INSERT tbl (lastname) VALUES ('M=C3=B3peX')
INSERT tbl (lastname) VALUES ('M=F3peY')
INSERT tbl (lastname) VALUES ('MopeZ')
go

SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%M[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%M%[=C3=B3o]pe%'
GO

Drop table tbl

The frist two select statement does not return same result as they
suppose to. Any idea?

Thank you,

Albion(052X)
dBlue
6/29/2005 7:01:01 AM
BYTW, I found an very instereting stuff. Please the script below,

The first 1 and 2 select statement should return same result, but they
do not; and only difference between them is 1 statement does not have
the M character inside the like statement.


CREATE TABLE tbl
(lastname nvarchar(30) NOT NULL)
go
INSERT tbl (lastname) VALUES ('M=C3=B3peX')
INSERT tbl (lastname) VALUES ('M=F3peY')
INSERT tbl (lastname) VALUES ('MopeZ')
go
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%M[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%M%[=C3=B3o]pe%'
SELECT * FROM tbl WHERE lastname Collate
SQL_Latin1_General_CP1_CI_AI LIKE N'%[o]pe%'
GO

drop table tbl=20


Thank you,

Albion(052X)
Erland Sommarskog
6/29/2005 11:13:37 AM
[posted and mailed, please reply in news]

dBlue (zkvneml@hotmail.com) writes:
[quoted text, click to view]

I answered a very similar question yesterday, and while the username
and e-mail address, I would assume that you are the same person. I cannot
do much more but repeat my answer from yesterday:

I was not able to repeat this. If you can produce a script similar
to the one below that demonstrates the problem, it's a little easier
to get an idea of what is going on. Don't forget to include the collation
of the column.

CREATE TABLE tbl
(lastname nvarchar(30) COLLATE Finnish_Swedish_BIN NOT NULL)
go
INSERT tbl (lastname) VALUES ('Mopez')
INSERT tbl (lastname) VALUES ('Mópez')
INSERT tbl (lastname) VALUES ('lopez de la serra')
INSERT tbl (lastname) VALUES ('x lópez')
go
SELECT * FROM tbl WHERE lastname LIKE '%[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE '%l[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE N'%[oó]pez%'
SELECT * FROM tbl WHERE lastname LIKE N'%l[oó]pez%'
go
drop table tbl


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
Erland Sommarskog
6/29/2005 9:20:59 PM
dBlue (zkvneml@hotmail.com) writes:
[quoted text, click to view]

But "MópeX" is not the same as "MópeX". The first string has six
characters, the second has five. Both strings match N'%[óo]pe%' -
that is first any number of characters and then exactly one of
"Ã", "³" or "o" and then "pe" followed by any number of chars.

On the other hand, the string N'%M[óo]pe%', you say that there
should be an M and then exactly one of "Ã", "³" or "o" and then "pe",
which there is in the six-char string.

OK, so that "MòpeZ" is really a representation of "MópeX", to whit
the bit pattern that you have in UTF-8, but stored in a string which
is supposed to hold UTF_16 values, and then it goes downhill from
there.

So I guess you really problem is why you have UTF-8 encodings in the
wrong place. When I saw your first posting, I assumed that the mangled
UTF-8 came from the news posting itself, but it appears now that it
comes from the database.

--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
vincent wehren
6/29/2005 9:54:24 PM
| "dBlue" <zkvneml@hotmail.com> schrieb im Newsbeitrag
news:1119999569.407793.143810@g49g2000cwa.googlegroups.com...
|Hi all,
|
|I have an issue on querying against UTF-16 encoded characters in
|SQL2000 database: For example the "López" is saved into database as
|"López" (due to the UTF-16 encoding);

This is not a UTF-16 representation but UTF-8.

--

Vincent Wehren




|somehow, when I query data
|with conditions of "like 'lop%'" or "like 'Lóp%'", the row of López
|would not return.

|NOTE: the accent insensitive collation can not help in this case.

|Thank you,

|Albion(052X)

dBlue
7/5/2005 8:36:20 AM
Hi,

Thank you for the reply.

It seems like it is the problem on querying for the 'M=C3=B3peX' from the
SQL, but getting correct 'M=F3peX', rather than getting the 'M=C3=B3peX'.
Is that right?

The following is the function we use to do the decoding for the
URLstring. I can add the WideCharToMultiByte and MultiByteToWideChar
to get correct data. But is there a more efficient way?

str =3D Replace(str, "+", " ")

i =3D 0
Do
i =3D InStr(i + 1, str, "%")
If i =3D 0 Then
Exit Do
End If
var_char =3D Chr(CStr("&H" + Mid(str, i + 1, 2)))
If Err.Number =3D 0 Then
str =3D Mid(str, 1, i - 1) _
& var_char _
& Mid(str, i + 3)
End If
Loop

thanks again.


[quoted text, click to view]
Erland Sommarskog
7/5/2005 9:18:10 PM
dBlue (zkvneml@hotmail.com) writes:
[quoted text, click to view]

The problem is that you have garbage in your database, yes.

[quoted text, click to view]

I don't know what URL string you have, and I am not an expert on how
to write effecient VB code.

I have however used WideCharToMultiByte and MultiByteToWideChar (from
C++) to convert between UTF-8 and UTF-16, and found them to work well.


--
Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se

Books Online for SQL Server SP3 at
AddThis Social Bookmark Button