sql server odbc:
Hi, Any help with the following query would be much appreciated. We are developing an application in VB6 for SQL Server 2000 (SP3), running on Win2k and have come across the following issue: We have a 30 character attribute/column (varchar) in which we store surnames. We now have some records where one or more characters within some of the surnames contain accented characters. We issue queries against the table and order by the surname, expecting an alphabetical sort. However, we are finding some very odd results with the accented values and how SQL Server 'sees' them. We typically use a collation of SQL_Latin1_General_CP1_CS_AS. Because of our findings, we created a test table that contains one attribute, string_test varchar(30), NULLS allowed, no index. We then populated the table with 10K records based on the following: 'A' + randomly generated ASCII character + randomly generated alphabetic character We then ran the following SQL: SELECT * FROM cbs_br8151_test3 WHERE string_test LIKE 'A [aàáâãäå][ab]%' The result we got was: 1. Aaa 2. Aàa 3. Aáa 4. Aãa 5. Aäa 6. Aåa 7. Aab 8. Aáb 9. Aâb 10. Aäb When viewing the first 5 records in isolation, the sort order appears correct. However, when looking at all 10 records, the sort order seems to be based on the characters and then additionally grouped on the first and last character. We think the sort order should have produced: 1. Aaa 2. Aab 3. Aàa 4. Aáa 5. Aáb 6. Aãa 7. Aäa 8. Aäb 9. Aåa 10. Aâb Further testing has shown that a binary-based collation sequence produces the same sort order as our previous query (using the CP1_CS_AS collation sequence). So changing the collation sequence in use doesn't seem to make a big difference, if any at all. However, we don't think that is right either. To prove the point, I then queried for the ASCII conversion of each of the characters: SELECT string_test, ASCII(SUBSTRING(string_test,1,1)) AS first_char, ASCII(SUBSTRING(string_test,2,1)) AS second_char, ASCII(SUBSTRING(string_test,3,1)) AS third_char FROM cbs_br8151_test3 WHERE string_test LIKE 'A[aàáâãäå][ab]%' ORDER BY string_test 1. Aaa 65 97 97 2. Aàa 65 224 97 3. Aáa 65 225 97 4. Aãa 65 227 97 5. Aäa 65 228 97 6. Aåa 65 229 97 7. Aab 65 97 98 8. Aáb 65 225 98 9. Aâb 65 226 98 10. Aäb 65 228 98 This proves that whatever conversion of the character data is used to base the 'sort' on, it isn't alphabetical, ASCII, or the binary values assigned to the characters does not translate to alphabetical (which is supposed to be the case with a case-sensitive, dictionary-based sort order is it not?). It is performing a sort, but also seems to be applying an additional, internal, 'group by' based on the first and last characters. Lastly, I then selected the data into a new table with the appropriate ASCII values in additional attributes/columns and then sorted the result set on the ASCII values. It now provides the expected sort order (because I am not sorting on the string itself): SELECT string_test, ASCII(SUBSTRING(string_test,1,1)) AS first_char, ASCII(SUBSTRING(string_test,2,1)) AS second_char, ASCII(SUBSTRING(string_test,3,1)) AS third_char into cbs_br8151_test4 FROM cbs_br8151_test3 WHERE string_test LIKE 'A%' SELECT string_test, ASCII(SUBSTRING(string_test,1,1)) AS first_char, ASCII(SUBSTRING(string_test,2,1)) AS second_char, ASCII(SUBSTRING(string_test,3,1)) AS third_char FROM cbs_br8151_test4 WHERE string_test LIKE 'A[aàáâãäå][ab]%' ORDER BY first_char, second_char, third_char 1. Aaa 65 97 97 2. Aab 65 97 98 3. Aàa 65 224 97 4. Aáa 65 225 97 5. Aáb 65 225 98 6. Aâb 65 226 98 7. Aãa 65 227 97 8. Aäa 65 228 97 9. Aäb 65 228 98 10. Aåa 65 229 97 What I need to be able to do is query the varchar attribute/column, sort on it directly (not conversions of the actual characters) and get the expected result set. Does anybody know how to do this, or have come across this problem before? It should be noted that we did not have a problem with the 'sort' that SQL Server provided before we started to input and use accented characters. If it is of any help, we have queried the database using both Query Analyzer and also a separate ODBC-based client (ODBC being the normal connection method for the application we are developing). Many thanks, Mark
Mark, In SQL's accent sensitive collations, the presence or absence of accented characters serves the purpose of a "tie-breaker" for otherwise identical strings. So, for example, all a's (accented or not) are expected to sort together in a group and to come before any b's. However, binary collations don't behave this way and will sort based on the numeric character code. HTH, Bart ------------ Bart Duncan Microsoft SQL Server Support Please reply to the newsgroup only - thanks. This posting is provided "AS IS" with no warranties, and confers no rights. -------------------- Reply-To: "Mark Peck" <mark.peck@care.co.uk> From: "Mark Peck" <mark.peck@care.co.uk> Subject: Collation Sequences, Accented Characters and ORDER BY clauses Date: Mon, 19 Jul 2004 15:08:06 +0100 Lines: 133 Organization: Care Business Solutions Ltd X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2800.1437 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 Message-ID: <eUzPipZbEHA.1356@TK2MSFTNGP09.phx.gbl> Newsgroups: microsoft.public.sqlserver.odbc NNTP-Posting-Host: host217-46-206-17.in-addr.btopenworld.com 217.46.206.17 Path: cpmsftngxa06.phx.gbl!TK2MSFTNGXS01.phx.gbl!TK2MSFTNGXA05.phx.gbl!TK2MSFTNG P08.phx.gbl!TK2MSFTNGP09.phx.gbl Xref: cpmsftngxa06.phx.gbl microsoft.public.sqlserver.odbc:42275 X-Tomcat-NG: microsoft.public.sqlserver.odbc Hi, Any help with the following query would be much appreciated. We are developing an application in VB6 for SQL Server 2000 (SP3), running on Win2k and have come across the following issue: We have a 30 character attribute/column (varchar) in which we store surnames. We now have some records where one or more characters within some of the surnames contain accented characters. We issue queries against the table and order by the surname, expecting an alphabetical sort. However, we are finding some very odd results with the accented values and how SQL Server 'sees' them. We typically use a collation of SQL_Latin1_General_CP1_CS_AS. Because of our findings, we created a test table that contains one attribute, string_test varchar(30), NULLS allowed, no index. We then populated the table with 10K records based on the following: 'A' + randomly generated ASCII character + randomly generated alphabetic character We then ran the following SQL: SELECT * FROM cbs_br8151_test3 WHERE string_test LIKE 'A [aàáâãäå][ab]%' The result we got was: 1. Aaa 2. Aàa 3. Aáa 4. Aãa 5. Aäa 6. Aåa 7. Aab 8. Aáb 9. Aâb 10. Aäb When viewing the first 5 records in isolation, the sort order appears correct. However, when looking at all 10 records, the sort order seems to be based on the characters and then additionally grouped on the first and last character. We think the sort order should have produced: 1. Aaa 2. Aab 3. Aàa 4. Aáa 5. Aáb 6. Aãa 7. Aäa 8. Aäb 9. Aåa 10. Aâb Further testing has shown that a binary-based collation sequence produces the same sort order as our previous query (using the CP1_CS_AS collation sequence). So changing the collation sequence in use doesn't seem to make a big difference, if any at all. However, we don't think that is right either. To prove the point, I then queried for the ASCII conversion of each of the characters: SELECT string_test, ASCII(SUBSTRING(string_test,1,1)) AS first_char, ASCII(SUBSTRING(string_test,2,1)) AS second_char, ASCII(SUBSTRING(string_test,3,1)) AS third_char FROM cbs_br8151_test3 WHERE string_test LIKE 'A[aàáâãäå][ab]%' ORDER BY string_test 1. Aaa 65 97 97 2. Aàa 65 224 97 3. Aáa 65 225 97 4. Aãa 65 227 97 5. Aäa 65 228 97 6. Aåa 65 229 97 7. Aab 65 97 98 8. Aáb 65 225 98 9. Aâb 65 226 98 10. Aäb 65 228 98 This proves that whatever conversion of the character data is used to base the 'sort' on, it isn't alphabetical, ASCII, or the binary values assigned to the characters does not translate to alphabetical (which is supposed to be the case with a case-sensitive, dictionary-based sort order is it not?). It is performing a sort, but also seems to be applying an additional, internal, 'group by' based on the first and last characters. Lastly, I then selected the data into a new table with the appropriate ASCII values in additional attributes/columns and then sorted the result set on the ASCII values. It now provides the expected sort order (because I am not sorting on the string itself): SELECT string_test, ASCII(SUBSTRING(string_test,1,1)) AS first_char, ASCII(SUBSTRING(string_test,2,1)) AS second_char, ASCII(SUBSTRING(string_test,3,1)) AS third_char into cbs_br8151_test4 FROM cbs_br8151_test3 WHERE string_test LIKE 'A%' SELECT string_test, ASCII(SUBSTRING(string_test,1,1)) AS first_char, ASCII(SUBSTRING(string_test,2,1)) AS second_char, ASCII(SUBSTRING(string_test,3,1)) AS third_char FROM cbs_br8151_test4 WHERE string_test LIKE 'A[aàáâãäå][ab]%' ORDER BY first_char, second_char, third_char 1. Aaa 65 97 97 2. Aab 65 97 98 3. Aàa 65 224 97 4. Aáa 65 225 97 5. Aáb 65 225 98 6. Aâb 65 226 98 7. Aãa 65 227 97 8. Aäa 65 228 97 9. Aäb 65 228 98 10. Aåa 65 229 97 What I need to be able to do is query the varchar attribute/column, sort on it directly (not conversions of the actual characters) and get the expected result set. Does anybody know how to do this, or have come across this problem before? It should be noted that we did not have a problem with the 'sort' that SQL Server provided before we started to input and use accented characters. If it is of any help, we have queried the database using both Query Analyzer and also a separate ODBC-based client (ODBC being the normal connection method for the application we are developing). Many thanks, Mark
Hi Bart, Many thanks for the reply - at least that explains what SQL Server is doing. Regards, Mark [quoted text, click to view] "Bart Duncan [MSFT]" <bartd@online.microsoft.com> wrote in message news:5iDeS0dbEHA.2900@cpmsftngxa06.phx.gbl... > Mark, > > In SQL's accent sensitive collations, the presence or absence of accented > characters serves the purpose of a "tie-breaker" for otherwise identical > strings. So, for example, all a's (accented or not) are expected to sort > together in a group and to come before any b's. However, binary > collations don't behave this way and will sort based on the numeric > character code. > > HTH, > Bart > ------------ > Bart Duncan > Microsoft SQL Server Support > > Please reply to the newsgroup only - thanks. > This posting is provided "AS IS" with no warranties, and confers no > rights. > >
BTW, a binary collation does sort the way you said you thought SQL should always sort. However, binary collations are also case sensitive so that may not meet your needs. drop table t1 create table t1 (c1 varchar (30) COLLATE Latin1_General_BIN) insert into t1 values ('Aaa') insert into t1 values ('Aàa') insert into t1 values ('Aáa') insert into t1 values ('Aãa') insert into t1 values ('Aäa') insert into t1 values ('Aåa') insert into t1 values ('Aab') insert into t1 values ('Aáb') insert into t1 values ('Aâb') insert into t1 values ('Aäb') Results: c1 ------------------------------ Aaa Aab Aàa Aáa Aáb Aâb Aãa Aäa Aäb Aåa Bart ------------ Bart Duncan Microsoft SQL Server Support Please reply to the newsgroup only - thanks. This posting is provided "AS IS" with no warranties, and confers no rights. -------------------- Reply-To: "Mark Peck" <mark.peck@care.co.uk> From: "Mark Peck" <mark.peck@care.co.uk> References: <eUzPipZbEHA.1356@TK2MSFTNGP09.phx.gbl> <5iDeS0dbEHA.2900@cpmsftngxa06.phx.gbl> Subject: Re: Collation Sequences, Accented Characters and ORDER BY clauses Date: Tue, 20 Jul 2004 09:35:06 +0100 Lines: 32 Organization: Care Business Solutions Ltd X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2800.1437 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 Message-ID: <O7blJUjbEHA.796@TK2MSFTNGP09.phx.gbl> Newsgroups: microsoft.public.sqlserver.odbc NNTP-Posting-Host: host217-46-206-17.in-addr.btopenworld.com 217.46.206.17 Path: cpmsftngxa06.phx.gbl!TK2MSFTNGP08.phx.gbl!TK2MSFTNGP09.phx.gbl Xref: cpmsftngxa06.phx.gbl microsoft.public.sqlserver.odbc:42280 X-Tomcat-NG: microsoft.public.sqlserver.odbc Hi Bart, Many thanks for the reply - at least that explains what SQL Server is doing. Regards, Mark [quoted text, click to view] "Bart Duncan [MSFT]" <bartd@online.microsoft.com> wrote in message news:5iDeS0dbEHA.2900@cpmsftngxa06.phx.gbl... > Mark, > > In SQL's accent sensitive collations, the presence or absence of accented > characters serves the purpose of a "tie-breaker" for otherwise identical > strings. So, for example, all a's (accented or not) are expected to sort > together in a group and to come before any b's. However, binary > collations don't behave this way and will sort based on the numeric > character code. > > HTH, > Bart > ------------ > Bart Duncan > Microsoft SQL Server Support > > Please reply to the newsgroup only - thanks. > This posting is provided "AS IS" with no warranties, and confers no > rights. > >
Don't see what you're looking for? Try a search.
|