dotnet xml : Pound sign (£) in XML


Waldy
3/14/2005 2:14:36 PM
Hi there,
I am using the .Net XML Serialization classes to create XML
strings. This has been working fine up until the point that one of the
strings contained a pound sterling symbol. The application that is
processing the output complains about the character. Both the strings and
the application are using UTF-8. If you view the text in Notepad the pound
symbol looks fine, but if you view the hex, there is a character preceding
the pound sign like so: Â£. Why does it not get converted to &#163;?
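
For readers following along, the extra character is expected: in UTF-8 the pound sign (U+00A3) is encoded as the two bytes C2 A3, and a viewer that assumes a single-byte encoding shows the C2 byte as a separate character. A minimal sketch of that effect follows; the class and method names are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Text;

public static class PoundSignBytes
{
    // Returns the UTF-8 bytes of the text as a dash-separated hex string.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text));
    }

    // Decodes the UTF-8 bytes as ISO-8859-1 (Latin-1), which mimics a
    // viewer that wrongly assumes one byte per character.
    public static string MisreadAsLatin1(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8);
    }
}
```

Calling `PoundSignBytes.Utf8Hex("\u00A3")` shows the two bytes, and `MisreadAsLatin1` shows how they appear when decoded with the wrong encoding.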

Waldy
3/14/2005 3:11:53 PM


So even though I am using a string writer encoded to UTF-8, the fact that I
assign it to a string is enough to set it back to UTF-16?

ie:

public override string GetData()
{
    string strData = null;
    ....
    XmlSerializer serialiser = new XmlSerializer(typeof(@event));
    TextWriter textWriter = new StringWriterWithEncoding(System.Text.Encoding.UTF8);
    XmlWriter writer = new XmlTextWriter(textWriter);
    serialiser.Serialize(writer, tp1Event);
    strData = textWriter.ToString();
    return strData;
}



Oleg Tkachenko [MVP]
3/14/2005 4:31:50 PM

This actually looks like UTF-16. Strings in .NET are always UTF-16 encoded.

What for? XML can contain any character from any language in the
world, so what's the point of escaping characters and increasing the XML
document size? You'd better fix your encoding issue instead.

--
Oleg Tkachenko [XML MVP, MCP]

Waldy
3/14/2005 4:44:01 PM


And put it into a byte array instead of a string?

Oleg Tkachenko [MVP]
3/14/2005 5:56:18 PM

Strings are always UTF-16, so you can't change the encoding of the
StringWriter class: it always produces UTF-16.
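
This point can be checked directly. The sketch below assumes .NET 2.0 or later (for `XmlWriter.Create`); the class and method names are hypothetical. It shows that a plain `StringWriter` reports UTF-16 and that an XML writer layered on top of it puts that encoding into the XML declaration.

```csharp
using System.IO;
using System.Xml;

public static class StringWriterEncodingDemo
{
    // Reports the encoding name a default StringWriter advertises.
    public static string DefaultEncodingName()
    {
        using (StringWriter sw = new StringWriter())
        {
            return sw.Encoding.WebName;
        }
    }

    // Writes a tiny document through a StringWriter and returns the text,
    // including the XML declaration the writer generates.
    public static string WriteToString()
    {
        StringWriter sw = new StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw))
        {
            xw.WriteStartDocument();
            xw.WriteElementString("price", "\u00A3" + "5");
        }
        return sw.ToString();
    }
}
```

The declaration in the returned string names utf-16, which is accurate for the string itself but misleading once those characters are written out as UTF-8 bytes.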


AFAIR this class is a particular hack that helps produce a UTF-16
encoded string containing a different encoding in the XML declaration. I
wouldn't recommend using it; just google for it to see why.
If you need an encoding other than UTF-16, use MemoryStream instead.
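
A minimal sketch of the MemoryStream approach, assuming .NET 2.0 or later: `SampleEvent` is a hypothetical stand-in for the poster's `@event` type, and the method name is made up for illustration. The result is a byte array rather than a string, so the UTF-8 bytes and the declaration agree.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the poster's @event type.
public class SampleEvent
{
    public string Description;
}

public static class Utf8XmlSerialization
{
    // Serializes the value to UTF-8 encoded XML bytes.
    public static byte[] SerializeToUtf8(SampleEvent value)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(SampleEvent));

        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false); // false = no byte order mark

        using (MemoryStream stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(writer, value);
            }
            // The writer has been flushed by Dispose; copy out the bytes.
            return stream.ToArray();
        }
    }
}
```

The returned bytes can be handed straight to the consuming application or written to a file without any further conversion.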

--
Oleg Tkachenko [XML MVP, MCP]

Oleg Tkachenko [MVP]
3/15/2005 12:33:37 PM

Yep. Or leave it in UTF-16 and encode to UTF-8 only when serializing to
file or whatever.
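
One way to sketch this alternative, under the assumption that the XML declaration is omitted so the string makes no claim about its byte encoding (a declaration saying utf-16 would no longer match once the text is written as UTF-8). The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class EncodeAtBoundary
{
    // Builds the XML as an ordinary .NET string with no XML declaration.
    public static string BuildXml(string price)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.OmitXmlDeclaration = true;

        StringWriter sw = new StringWriter();
        using (XmlWriter xw = XmlWriter.Create(sw, settings))
        {
            xw.WriteElementString("price", price);
        }
        return sw.ToString();
    }

    // Converts to UTF-8 bytes only at the point of output.
    public static byte[] ToUtf8(string xml)
    {
        return new UTF8Encoding(false).GetBytes(xml);
    }
}
```

An XML document without a declaration or byte order mark is read as UTF-8 by default, so the bytes from `ToUtf8` should be acceptable to a consumer expecting UTF-8.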

--
Oleg Tkachenko [XML MVP, MCP]