System.Xml.XmlException: hexadecimal value is an invalid character


System.Xml.XmlException: hexadecimal value is an invalid character Todd
11/27/2003 8:41:43 AM
Our ASP.NET (C#) application accepts form entry and saves the
entered data as XML.

We are finding that users sometimes cut and paste special
characters (from MS Word) into these forms. The data is saved
successfully, but when the XML is later read, the following error
is raised, depending on the invalid character found:

This is a sample:
System.Xml.XmlException: '', hexadecimal value 0x1A, is an invalid character.

I have ensured that the saved XML includes an encoding
declaration (<?xml version="1.0" encoding='UTF-8'?>).
Changing the encoding does not affect the error message.

The XML parser installed is MSXML 4.0.

Is there any way to handle the reading of these characters, or
any way to ensure these characters are converted into something
readable at the time the values are written to the XML object?

Re: System.Xml.XmlException: hexadecimal value is an invalid character Todd
11/27/2003 9:12:35 AM


This is how the XML is created:

protected void Application_Start(Object sender, EventArgs e)
{
    XmlDocument xml = new XmlDocument();
    StringBuilder sb = new StringBuilder();
    StringWriter sw = new StringWriter(sb);
    XmlTextWriter xmlText = new XmlTextWriter(sw);
    xml.LoadXml(ConstantsXML.getInstance().GetValueFromKey("ReportXmlTemplate"));
    xml.WriteContentTo(xmlText);
    Application["ReportXml"] = sb.ToString();
}

Where the value for "ReportXmlTemplate" is:
<?xml version='1.0' encoding='UTF-8'?
Re: System.Xml.XmlException: hexadecimal value is an invalid character Oleg Tkachenko
11/27/2003 6:55:45 PM

How do you save that data? Show us how you create that XML.
The problem could be with XmlTextWriter, which doesn't check for Unicode
characters that do not fit the specified encoding and hence can produce
XML that is not well-formed. See the "Customized XML Writer Creation"
article [1] on MSDN for how to handle the issue.

[1]
ms-help://MS.MSDNQTR.2003JUL.1033/cpguide/html/cpconwritingxmlwithxmlwriter.htm
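
Editor's note: the customized-writer idea can be sketched roughly as below. This is not the code from the MSDN article; it is a minimal illustration that assumes the offending text reaches the writer through WriteString. The class name FilteringXmlTextWriter is hypothetical, and the check is simplified: it works on UTF-16 code units and passes surrogates through unchecked.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical class (not part of System.Xml): an XmlTextWriter that drops
// characters XML 1.0 does not allow before they reach the output.
public class FilteringXmlTextWriter : XmlTextWriter
{
    public FilteringXmlTextWriter(TextWriter writer) : base(writer) { }

    // Text passed to WriteString is filtered here. WriteChars, WriteCData
    // and WriteRaw are NOT covered by this sketch.
    public override void WriteString(string text)
    {
        if (text == null)
        {
            base.WriteString(text);
            return;
        }

        StringBuilder clean = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (IsAllowed(c))
            {
                clean.Append(c);
            }
        }
        base.WriteString(clean.ToString());
    }

    // Simplified check: tab, LF, CR, and #x20-#xFFFD are kept.
    // Surrogate code units fall inside that range and are passed through.
    private static bool IsAllowed(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= 0x20 && c <= 0xFFFD);
    }
}
```

Under these assumptions, a caller would construct new FilteringXmlTextWriter(sw) wherever it currently constructs new XmlTextWriter(sw). Dropping the character silently is a design choice; replacing it with a space or logging it may suit some applications better.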
--
Oleg Tkachenko
XML Insider
http://www.tkachenko.com/blog
Re: System.Xml.XmlException: hexadecimal value is an invalid character Dave Marteinson
11/28/2003 7:35:05 PM
Hi Todd,


I'm not aware of anything that'll let you pick out these characters
from an instance and correct the problem. Always a tricky issue.
It's not clear to me that DOM Level 3 Validation will handle
this either, although I've just skimmed that.
(http://www.w3.org/TR/2003/CR-DOM-Level-3-Val-20030730/)
Your case is simpler since you say the encoding is under
your control. There aren't too many ranges that aren't allowed,
so perhaps you can just filter these out upstream.

From 2.2 of the rec "Characters":

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]

In particular (re your sample error), if you see
anything below #x20 that's not a tab, LF, or CR, convert it or remove it.
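
Editor's note: filtering upstream against the Char production quoted above might look something like the following. It is a sketch, not a tested library routine. The names XmlCharFilter and StripInvalidXmlChars are hypothetical, and the code assumes .NET 2.0 or later for char.IsHighSurrogate and char.IsLowSurrogate. Because .NET strings are UTF-16, the [#x10000-#x10FFFF] range is handled as valid surrogate pairs.

```csharp
using System.Text;

// Hypothetical helper; the names are illustrative only.
public static class XmlCharFilter
{
    // Returns the input with every character outside the XML 1.0
    // Char production removed.
    public static string StripInvalidXmlChars(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        StringBuilder sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (char.IsHighSurrogate(c)
                && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                // A well-formed surrogate pair encodes #x10000-#x10FFFF: keep both units.
                sb.Append(c).Append(input[i + 1]);
                i++;
            }
            else if (c == 0x9 || c == 0xA || c == 0xD
                     || (c >= 0x20 && c <= 0xD7FF)
                     || (c >= 0xE000 && c <= 0xFFFD))
            {
                sb.Append(c);
            }
            // Anything else (control characters such as 0x1A, lone
            // surrogates, #xFFFE, #xFFFF) is dropped.
        }
        return sb.ToString();
    }
}
```

For example, XmlCharFilter.StripInvalidXmlChars("a\u001Ab") returns "ab". Calling it on each form value before the value is placed into the XML document keeps the invalid character from ever being saved.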

Regards,

-djm
