
dotnet xml : xml utf-8 String to XPathDocument


David Thielen
10/12/2005 7:19:01 PM
Hi;

I have a string that holds the contents of an xml file. It starts with <?xml
encoding='utf-8'... and it has each utf-8 2-byte sequence stored as 2 chars. How do
I get that into an XPathDocument so that each 2-char sequence is treated as
one character rather than two?

--
David Thielen
10/13/2005 6:49:09 AM
It's not a file, the xml is in a String. I tried StringReader but it didn't
handle it correctly.

--
thanks - dave


v-kevy NO[at]SPAM online.microsoft.com
10/13/2005 8:25:19 AM
Hi dave,

You don't need to care about the encoding, just create an XPathDocument
object with the filename as the constructor's parameter. Or you can load
the file into a stream and open the XPathDocument from the stream.
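Both options mentioned here can be sketched as follows. This is only an illustration, assuming the XML is still available as a file on disk; the class and method names (LoadFromFileDemo, FromPath, FromStream) are made up for this example. In both cases the parser sees the raw bytes, so it can apply the encoding named in the XML declaration itself.

```csharp
using System.IO;
using System.Xml.XPath;

public static class LoadFromFileDemo
{
    // Option 1: hand the file name to XPathDocument and let it open the file.
    public static XPathDocument FromPath(string path)
    {
        return new XPathDocument(path);
    }

    // Option 2: open the file as a byte stream and pass the stream in.
    // The document is fully read in the constructor, so the stream can be
    // closed as soon as the constructor returns.
    public static XPathDocument FromStream(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            return new XPathDocument(stream);
        }
    }
}
```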

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
David Thielen
10/13/2005 9:22:07 PM
Hi;

No problem - here it is:
String data =
"<?xml version='1.0' encoding='utf-8'?>" +
"<order>" +
" <customer>" +
" <FLD>Angebot über eine neue Schmieranlage</FLD>" +
" </customer>" +
"</order>";

Please note that the ü is the 2-byte utf-8 sequence for what is
actually a ü. So I need those two char values to become two byte values when fed
to XmlDocument (new MemoryStream())

The best I have come up with is to create a byte[] and char by char assign
the String values to the byte. But there has to be a faster way (I hope).
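One possible faster way, sketched under a loud assumption: every char in the string carries exactly one byte of the original file (value 0 to 255), which is what a byte-per-char read produces. In that case the ISO-8859-1 encoding maps each char back to the same byte value in a single call. The class and method names below are hypothetical. The exception fallback is a deliberate choice so that a char above 255, which would break the assumption, fails loudly instead of silently turning into '?'.

```csharp
using System.IO;
using System.Text;
using System.Xml.XPath;

public static class Utf8InStringDemo
{
    // Assumption: each char in 'data' holds one raw byte of the original
    // UTF-8 file. ISO-8859-1 maps chars 0..255 to the same byte values.
    public static XPathDocument Load(string data)
    {
        Encoding latin1 = Encoding.GetEncoding(
            "iso-8859-1",
            EncoderFallback.ExceptionFallback,   // throw on chars > 255
            DecoderFallback.ExceptionFallback);

        byte[] raw = latin1.GetBytes(data);      // one byte per char

        // The parser now reads real bytes and applies the utf-8 declaration.
        using (MemoryStream stream = new MemoryStream(raw))
        {
            return new XPathDocument(stream);
        }
    }
}
```

With the sample string above, the FLD element of the loaded document should read "Angebot über eine neue Schmieranlage".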

--
thanks - dave


v-kevy NO[at]SPAM online.microsoft.com
10/14/2005 3:23:55 AM
I don't quite understand. The Xml doc is UTF-8 encoded. When it is held in a
string variable, it is stored as Unicode in memory. So you needn't worry
about the encoding issue. Can you post some simple code that reproduces the error?

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
Chris Lovett
10/14/2005 9:48:19 AM
You cannot do utf-8 inside CLR strings. A CLR "char" is considered to
always be UTF-16. If you want to do UTF-8 you need to do it at the byte
level, not the "char" level. See
http://msdn.microsoft.com/library/en-us/dnxml/html/xmlencodings.asp for
details.
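The char-versus-byte distinction can be illustrated with a small sketch. The names Mangle and Repair are invented for this example, and the byte-per-char copy is modelled with ISO-8859-1, which is an assumption about how the upstream code built the string.

```csharp
using System.Text;

public static class CharVersusByteDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulates the upstream problem: the UTF-8 bytes of a proper string
    // are copied one byte per char, so "ü" (1 char) becomes "ü" (2 chars).
    public static string Mangle(string proper)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(proper);
        return Latin1.GetString(utf8);
    }

    // Reverses it at the byte level: chars back to bytes, then a real
    // UTF-8 decode into UTF-16 chars.
    public static string Repair(string mangled)
    {
        byte[] raw = Latin1.GetBytes(mangled);
        return Encoding.UTF8.GetString(raw);
    }
}
```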


David Thielen
10/14/2005 10:04:01 AM
Yes - but unfortunately I don't control how it is passed to me. So I have to
convert. I guess the for loop is my best solution.

--
thanks - dave


Chris Lovett
10/17/2005 1:41:08 PM
I would say the string you've been given is terribly messed up if it
contains UTF-8 - I would push back on the source of this string and fix it
there.


David Thielen
10/17/2005 1:55:07 PM
Hi;

Apparently what is happening is the xml file is being read into a String.
Since they are just reading the text, they don't know the encoding. And when
I get the string, I also don't know the encoding unless I parse it to find
the encoding=.

So it is read with each byte in the original file becoming a char in the
string. And I then convert back with each char becoming a byte. It is messy -
but I'm not sure there is a better solution unless both ends parse the text
to find the encoding=, then reset the stream to then read it.
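The "parse the text to find the encoding=" idea could look roughly like the sketch below. It is an illustration only: the class, the method names and the regular expression are made up here, the byte-per-char layout is assumed (modelled with ISO-8859-1), and falling back to utf-8 when no declaration is found follows the XML default for documents without a byte order mark. It would not apply to a file whose declared encoding is not byte-oriented, such as UTF-16.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class DeclaredEncodingDemo
{
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Pulls the encoding name out of the XML declaration, if there is one.
    public static string DeclaredEncodingName(string data)
    {
        Match m = Regex.Match(
            data,
            @"^\s*<\?xml[^>]*?encoding\s*=\s*['""]([^'""]+)['""]");
        return m.Success ? m.Groups[1].Value : "utf-8";
    }

    // Turns the byte-per-char string into a properly decoded string.
    public static string Redecode(string data)
    {
        byte[] raw = Latin1.GetBytes(data);               // chars back to bytes
        Encoding declared = Encoding.GetEncoding(DeclaredEncodingName(data));
        return declared.GetString(raw);                   // real decode
    }
}
```

The string returned by Redecode holds ordinary characters, so it should then be loadable through a StringReader.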

--
thanks - dave

