dotnet xml : XmlValidatingReader in v1.1



Phil Hobgen
10/31/2006 12:12:02 AM
Hi,

I am using the XmlValidatingReader Class in VS.Net 2003 (targeting dotNet
v1.1) to validate an xml message against a set of schemas.

Within the schema a type is defined as follows

<xs:simpleType name="atypename">
<xs:restriction base="xs:token">
<xs:pattern value="\w{1,6}"/>
</xs:restriction>
</xs:simpleType>

The "\w" construct should allow all characters except the set of
"punctuation", "separator" and "other" characters.
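
As a rough illustration of that definition (a sketch in Python rather than .NET, using only the standard unicodedata module; the helper names xsd_word_char and matches_xsd_word_pattern are made up for this example), a character counts as "\w" in the XSD sense when its Unicode general category is outside the P, Z and C groups:

```python
import unicodedata


def xsd_word_char(ch: str) -> bool:
    """Return True if ch falls in XSD's \\w class (hypothetical helper)."""
    # XSD defines \w as every character except those whose Unicode general
    # category starts with P (punctuation), Z (separator) or C (other).
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


def matches_xsd_word_pattern(value: str, min_len: int, max_len: int) -> bool:
    """Emulate the pattern \\w{min_len,max_len} under the XSD definition."""
    return min_len <= len(value) <= max_len and all(
        xsd_word_char(c) for c in value
    )


# The underscore is classified as connector punctuation, so it is excluded.
print(unicodedata.category("_"))
print(matches_xsd_word_pattern("abcde", 1, 6))
print(matches_xsd_word_pattern("abc_de", 1, 6))
```

This only emulates the character class; it does not parse schemas or apply the whitespace handling of xs:token.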

The W3C documentation indicates that the underscore character is punctuation
and should therefore be excluded. However a validation event is not raised
when the content has an underscore character in it. I think this is probably a
result of the fact that in the Unicode recommendation it says that "\w"
should allow underscores because of its common use in programming languages.
However, I would have thought that XmlValidatingReader would follow the W3C
recommendation?

I can't see this listed as a known bug anywhere. Is this because it is not
seen as a bug?

Could someone tell me, if I change to use dotNet v2.0 will this behave in
the way recommended by the W3C or is the behaviour the same as in dotNet v1.1?

Many thanks



--
Phil Hobgen
Martin Honnen
10/31/2006 4:54:19 PM
Interesting, recently someone ran into the problem that \w includes the
"_" in the regular expression languages of some programming
languages/libraries (e.g. JavaScript/ECMAScript, or the .NET framework
Regex class) but not in the XSD schema regular expression language. I
did not know about the Unicode recommendation. Do you happen to have a
link to that part?
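
To make that difference concrete, here is a small sketch using Python's re module as a stand-in for a general-purpose regex library (it is not the .NET Regex class, and the names general_purpose and xsd_like are chosen only for this example). It shows \w accepting "_" and one way to subtract the underscore from the class:

```python
import re

# In Python's re module, \w matches letters, digits and the underscore.
general_purpose = re.compile(r"\w{6}")

# An XSD-like approximation: [^\W_] means "a \w character that is not _".
# This only removes the underscore; other differences between the two
# regex dialects are not handled here.
xsd_like = re.compile(r"[^\W_]{6}")

for value in ("abcdef", "abc_de"):
    print(
        value,
        bool(general_purpose.fullmatch(value)),
        bool(xsd_like.fullmatch(value)),
    )
```

The same subtraction idea works in the other direction: a schema author who does want to allow "_" can write the pattern as [\w_] in the XSD.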


With .NET 2.0, both the new XmlReader (created with XmlReaderSettings
configured for validation) and the now obsolete XmlValidatingReader
reject the following as invalid:

<value>abc_de</value>

schema excerpt:

<xs:element name="value" maxOccurs="unbounded">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:pattern value="\w{6}"/>
</xs:restriction>
</xs:simpleType>
</xs:element>

Validation error message:
"Error: The 'value' element is invalid - The value 'abc_de' is
invalid according to its datatype 'String' - The Pattern constraint failed."

So with .NET 2.0 \w in a pattern follows the W3C XSD schema
specification (at least as far as not including "_" in \w).



--

Martin Honnen --- MVP XML
Phil Hobgen
11/1/2006 12:03:02 AM
http://www.unicode.org/unicode/reports/tr18/#Tailored_Properties
then scroll down a few pages to
Annex C: Compatibility Properties
You'll see the recommendation for \w and the comments mention "_"

Great, at least I now know it is worth moving the app up to .Net v2.0

Many thanks for the speedy reply Martin.

--
Phil Hobgen