
dotnet framework : ASCII/UNICODE Encoding


Barry Kelly
3/28/2008 5:29:41 PM

ASCII only covers 128 code points (0-127). What you're talking about there
is a code page like Latin-1 (ISO 8859-1) or Windows-1252.

Try this out:

---8<---
using System;
using System.Text;

class App
{
    static void Main()
    {
        // Encode the string with the Latin-1 code page, one byte per character.
        Encoding e = Encoding.GetEncoding("ISO-8859-1");
        byte[] encoded = e.GetBytes("öäü");

        // Print each byte as two hex digits.
        foreach (byte b in encoded)
            Console.Write("{0:X2} ", b);
        Console.WriteLine();
    }
}
--->8---

Prints out the following on my machine:

F6 E4 FC
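
For contrast with the ISO-8859-1 output above, here is a minimal sketch of what plain ASCII does with the same string. It assumes the default behaviour of Encoding.ASCII, which substitutes '?' (0x3F) for anything above 127. The class name AsciiFallbackDemo and the helper ToHex are made up for this illustration.

```csharp
using System;
using System.Text;

static class AsciiFallbackDemo
{
    // Hypothetical helper: formats bytes as space-separated two-digit hex values.
    public static string ToHex(byte[] bytes)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            if (sb.Length > 0)
                sb.Append(' ');
            sb.AppendFormat("{0:X2}", b);
        }
        return sb.ToString();
    }

    static void Main()
    {
        // None of these characters is in the 0-127 range, so each one
        // is replaced by '?' (0x3F) by the default ASCII encoder.
        Console.WriteLine(ToHex(Encoding.ASCII.GetBytes("öäü"))); // 3F 3F 3F
    }
}
```

So the bytes F6 E4 FC only appear once a real code page such as ISO-8859-1 is selected.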

-- Barry

--
Davide Piras
3/28/2008 5:59:35 PM
Hello,

.NET Framework 2, C#, ASP.NET web application...

I have the string "öäü" and I need its ASCII representation in the format:
"\u00F6" stands for "ö" and so on...

Later I need to convert the string "\u00F6....." back to the
original one, "öäü".

I tried a few approaches but could not get the encoders and classes of the
System.Text namespace to work :-(

What is the simplest way to do something like that?

At the moment I am converting by hand with this:

(char)Convert.ToInt32(m.Value.Replace("\\u", ""), 16)

that casts \u00F6 back to ö

but I think there should be a better way...

Thanks, regards, Davide.

--
Mihai N.
3/28/2008 10:20:53 PM

Since .NET strings are Unicode, going through ISO-8859-1 will damage any
character that code page does not contain. And the \u (popular in Java
.properties files) also suggests that Davide really wants Unicode values.
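
A minimal sketch of that damage, assuming a round trip through Encoding.GetEncoding("ISO-8859-1"). The class name Latin1RoundTrip and the method RoundTrip are made up for this illustration.

```csharp
using System;
using System.Text;

static class Latin1RoundTrip
{
    // Hypothetical helper: encodes text to ISO-8859-1 bytes and decodes it again.
    public static string RoundTrip(string text)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        return latin1.GetString(latin1.GetBytes(text));
    }

    static void Main()
    {
        // ö, ä and ü all exist in ISO-8859-1, so they survive the round trip.
        Console.WriteLine(RoundTrip("öäü") == "öäü"); // True

        // U+20AC (euro sign) has no ISO-8859-1 code point, so it is lost.
        Console.WriteLine(RoundTrip("€") == "€");     // False
    }
}
```

Escaping the UTF-16 char values directly avoids the problem, because no code page is involved at all.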


--
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Barry Kelly
3/29/2008 12:33:01 PM

Ah, now I think I understand: you're looking to create escapes for
characters outside 0-127. That ought to be fairly simple for most cases:
create a string builder, check each char value, compare it with 127, and if
it's higher, AppendFormat("\\u{0:X4}", (int) charValue) rather than
just Append(charValue).
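
The loop described above could be sketched roughly like this. The names UnicodeEscaper and Escape are made up for the illustration, and the sketch does not escape a literal backslash already present in the input, so text that already contains something like \u0041 would be ambiguous when decoded later.

```csharp
using System;
using System.Text;

static class UnicodeEscaper
{
    // Hypothetical helper: replaces every char above 127 with a \uXXXX
    // escape and copies ASCII chars through unchanged.
    public static string Escape(string text)
    {
        StringBuilder sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c > 127)
                sb.AppendFormat("\\u{0:X4}", (int)c);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Escape("öäü")); // \u00F6\u00E4\u00FC
        Console.WriteLine(Escape("abc")); // abc
    }
}
```

Because the loop works on UTF-16 char values, a character outside the Basic Multilingual Plane comes out as two escapes, one per surrogate.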

Similarly, on the other end, you'll have to scan for those \u escapes.
There isn't any built-in functionality that does this, as far as I know.
It might make sense to code up this functionality as a subclass of
Encoding, for reusability reasons.
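
One possible sketch of the scanning side, using a regular expression rather than an Encoding subclass. The names UnicodeUnescaper and Unescape are made up for the illustration. It assumes every escape is a backslash, a lowercase u, and exactly four hex digits, and it uses an anonymous delegate so that it also compiles with the C# 2.0 compiler.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class UnicodeUnescaper
{
    // Matches a backslash, 'u', and exactly four hex digits.
    static readonly Regex EscapePattern = new Regex(@"\\u([0-9A-Fa-f]{4})");

    // Hypothetical helper: turns each \uXXXX escape back into its char.
    public static string Unescape(string text)
    {
        return EscapePattern.Replace(text, delegate(Match m)
        {
            int code = int.Parse(m.Groups[1].Value,
                                 NumberStyles.HexNumber,
                                 CultureInfo.InvariantCulture);
            return ((char)code).ToString();
        });
    }

    static void Main()
    {
        // The C# literal below is the 18-character text \u00F6\u00E4\u00FC.
        string decoded = Unescape("\\u00F6\\u00E4\\u00FC");
        Console.WriteLine(decoded == "öäü"); // True
    }
}
```

Parsing only the captured hex digits avoids the Replace("\\u", "") step on the whole match, and text without escapes is returned unchanged.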

-- Barry

--