
dotnet xml : newbie question - replacing "invalid" nodes


Nick Wong
3/28/2005 4:11:52 PM
Hi all,

I am reading in an XML stream and validating it against a given schema.
The objective is to mark invalid nodes (according to the declared XSD type,
or some other rules) with an attribute, and then pass the modified stream
to another process in the pipeline.

as an example,
<bk:book publisher="Addison Wesley">
<bk:title>Mythical Man Month</bk:title>
<bk:author>Frederick Brooks</bk:author>
<bk:quantity>AAAA</bk:quantity>
</bk:book>

should become:
<bk:book publisher="Addison Wesley" valid="false">
<bk:title>Mythical Man Month</bk:title>
<bk:author>Frederick Brooks</bk:author>
<bk:quantity valid="false">AAAA</bk:quantity>
</bk:book>

Note that the <bk:quantity> element, and its parent <bk:book>, now carry a
new 'valid' attribute set to 'false'.

What I have tried is combining XmlValidatingReader (to perform the
validation) with XmlDocument (to update the node where required); however,
I cannot seem to "synchronize" the reader with the XmlDocument.

The code I am using is as follows:

...

// Start the validating reader
vr = new System.Xml.XmlValidatingReader(r);
// ... schema initialization code

// Handle the xml document
doc.Load(vr);

// Reads and validates
while (vr.Read())
{
    if (vr.NodeType == System.Xml.XmlNodeType.EndElement)
    {
        string name = vr.Name;
        // Check if error has happened; _hasError is set in the
        // validation event handler
        if (_hasError)
        {
            // Code here does not get executed as the XmlDocument has
            // already performed the "read" during the Load call
            System.Diagnostics.Debug.WriteLine("*** ERROR encountered!!! ***");
            //System.Xml.XmlNode node = doc.SelectSingleNode(name);
            //System.Diagnostics.Debug.WriteLine("ERROR!!! - node: " + node.Name);
            _hasError = false;
        }
    }
}
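
For illustration only (this sketch is not part of the original post): the
loop above most likely never runs because XmlDocument.Load reads the reader
it is given through to the end of the input, leaving nothing for a later
Read() loop. The class name and the sample XML below are made up, and a
plain XmlTextReader stands in for the validating reader.

```csharp
using System;
using System.IO;
using System.Xml;

public static class ReaderConsumedSketch
{
    public static void Main()
    {
        // Hypothetical sample input, not taken from the original post.
        const string xml = "<book><title>Mythical Man Month</title></book>";

        XmlReader reader = new XmlTextReader(new StringReader(xml));
        XmlDocument doc = new XmlDocument();

        // Load pulls every node out of the reader while building the tree.
        doc.Load(reader);

        // The reader has now been driven to the end of its input, so a
        // follow-up Read() loop has nothing left to visit.
        Console.WriteLine("ReadState after Load: {0}", reader.ReadState);
        Console.WriteLine("Read() after Load: {0}", reader.Read());
    }
}
```

Any per-node work therefore has to happen while Load is running (through
events) or afterwards on the loaded tree, not in a second pass over the
same reader.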



I also cannot help but wonder whether I am using the wrong strategy to
solve this problem, or over-engineering a complicated solution for a simple
problem. Any help would therefore be very much appreciated!

Thanks,
Nick Wong

Martin Honnen
3/28/2005 4:34:32 PM



I am not sure it is a good idea, as the inserted attribute (e.g.
valid="false") itself makes the document invalid against the schema unless
the schema allows that attribute.

Anyway, I think one way to solve that is to combine two different event
mechanisms in .NET: the ValidationEventHandler, and the DOM mutation events
that XmlDocument exposes. Here is a simple example that manages to "mark"
invalid element nodes, at least when they have simple content that is not
valid:

using System;
using System.Xml;
using System.Xml.Schema;

public class Test2005032802 {
    private XmlElement lastInsertedElement;
    private XmlDocument xmlDocument;
    private string xmlURL;
    private bool valid;
    // State shared between the two handlers: name and validity of the
    // element whose end tag the validator reported on most recently.
    private string lastElementName;
    private bool lastElementValid;
    private XmlValidatingReader xmlValidator;

    public static void Main (string[] args) {
        Test2005032802 test = new Test2005032802(args[0]);
        test.Load();
    }

    public Test2005032802 (string url) {
        xmlURL = url;
    }

    public void Load () {
        xmlDocument = new XmlDocument();
        xmlDocument.PreserveWhitespace = true;
        xmlDocument.NodeInserted +=
            new XmlNodeChangedEventHandler(NodeInsertedHandler);
        xmlValidator = new XmlValidatingReader(new XmlTextReader(xmlURL));
        xmlValidator.ValidationEventHandler +=
            new ValidationEventHandler(ValidationHandler);
        valid = true;
        lastElementValid = true;
        lastElementName = "";
        Console.WriteLine("Beginning validation:");
        // Validation and tree building both happen inside this call.
        xmlDocument.Load(xmlValidator);
        Console.WriteLine("Validation finished: XML document is {0}.",
            valid ? "valid" : "not valid");
        Console.WriteLine("Final OuterXml:");
        xmlDocument.Save(Console.Out);
    }

    void NodeInsertedHandler (object sender, XmlNodeChangedEventArgs args) {
        XmlNode currentlyInserted = args.Node;
        Console.WriteLine(
            "Node changed with action {0} and node {1} with type {2} and name {3} and value {4}.",
            args.Action, currentlyInserted, currentlyInserted.NodeType,
            currentlyInserted.Name, currentlyInserted.Value);
        // Mark the element if the validator just flagged an element of
        // the same name as invalid.
        if (currentlyInserted.NodeType == XmlNodeType.Element &&
            lastElementName == currentlyInserted.Name &&
            !lastElementValid)
        {
            lastInsertedElement = (XmlElement) currentlyInserted;
            lastInsertedElement.SetAttribute("valid", "false");
            lastElementName = "";
            lastElementValid = true;
        }
    }

    void ValidationHandler (object sender, ValidationEventArgs args) {
        Console.WriteLine("Validation {0}: {1}.", args.Severity, args.Message);
        if (args.Severity == XmlSeverityType.Error) {
            valid = false;
            // Remember which element failed so NodeInsertedHandler can
            // mark it when it is inserted into the tree.
            if (xmlValidator.NodeType == XmlNodeType.EndElement) {
                lastElementName = xmlValidator.Name;
                lastElementValid = false;
            }
        }
    }
}

For instance with the schema being

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
version="1.0">

<xs:element name="gods">
<xs:complexType>
<xs:sequence>
<xs:element name="god" type="xs:NCName" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>

</xs:schema>

and the XML instance being

<?xml version="1.0" encoding="UTF-8"?>
<gods
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="test2005032801Xsd.xml">

<god>Kibo</god>
<god>1Xibo</god>
<god>Jaffo</god>
<god>-Maho</god>

</gods>

the output of the XML at the end is

<gods xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="test2005032801Xsd.xml">

<god>Kibo</god>
<god valid="false">1Xibo</god>
<god>Jaffo</god>
<god valid="false">-Maho</god>

</gods>


But be aware that the code makes certain assumptions about the flow of
events in .NET, based on a few test examples. There is no documentation
saying how validation and mutation events interact, so you have to observe
the current event flow and write code against what you see in the current
.NET version.

The main observation I have made is that the XmlValidatingReader is
positioned on an XmlNodeType.EndElement when it reports a validation
error, and that this element is the next one inserted into the document
tree. That is what makes it possible to "mark" the node with an attribute.

I have, however, not tried any schema with elements of complex type, where
validation errors relate to the defined structure of the element; more
logic would need to be added to handle that.
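
For the cases the code above does not cover, one possible alternative is
sketched here. It is a different technique from the event-order approach
above and assumes .NET 2.0 or later: load the document first, then call
XmlDocument.Validate and read the offending node from
XmlSchemaValidationException.SourceObject. The class name, the helper
NearestElement, and the inlined sample data are made up for illustration,
and I have not checked which node type is reported for every kind of
error, hence the walk up to the nearest element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class MarkInvalidSketch
{
    public static void Main()
    {
        // Same shape as the "gods" schema and instance, inlined for brevity.
        const string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='gods'><xs:complexType><xs:sequence>" +
            "<xs:element name='god' type='xs:NCName' maxOccurs='unbounded'/>" +
            "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        const string xml = "<gods><god>Kibo</god><god>1Xibo</god></gods>";

        XmlDocument doc = new XmlDocument();
        doc.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));
        doc.LoadXml(xml);

        // Collect offending elements first and mark them afterwards, so the
        // tree is not modified while the validator is walking it.
        List<XmlElement> invalid = new List<XmlElement>();
        doc.Validate((sender, args) =>
        {
            if (args.Severity != XmlSeverityType.Error) return;
            XmlSchemaValidationException ex =
                args.Exception as XmlSchemaValidationException;
            XmlNode node = ex != null ? ex.SourceObject as XmlNode : null;
            XmlElement element = NearestElement(node);
            if (element != null && !invalid.Contains(element))
                invalid.Add(element);
        });

        foreach (XmlElement element in invalid)
            element.SetAttribute("valid", "false");

        doc.Save(Console.Out);
    }

    // Maps whatever node the validator reports to the element that owns it.
    static XmlElement NearestElement(XmlNode node)
    {
        XmlAttribute attribute = node as XmlAttribute;
        if (attribute != null) return attribute.OwnerElement;
        while (node != null && !(node is XmlElement)) node = node.ParentNode;
        return (XmlElement)node;
    }
}
```

Because the handler is handed the node itself, this does not depend on the
order in which validation and insertion events fire. The marked document
is still invalid against the original schema, as noted at the top.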



--

Martin Honnen

Nick Wong
3/28/2005 11:20:19 PM
Hi Martin,

Thanks for the assist. Your reply worked perfectly for my intended
purpose. I had noticed as well that the validation event is fired on an end
element, but your technique of intercepting the NodeInserted event was the
breakthrough that got the problem solved. Brilliant!

The only trade-off of the workaround in terms of code structure is the need
to expose the validating XML reader, last node names, etc. as internal
variables. (I come from the WinForms world and typically avoid accessing
shared fields from event handlers.)

Based on this and other similar feedback I see on this newsgroup, I wonder
if the XmlValidatingReader could be enhanced so that, while it continues to
serve as a lightweight proxy object for reading and validating XML streams,
it can provide a richer error feedback mechanism (e.g., allow a mapping to
the DOM structure based on the error line and character number?).
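
As a sketch of what such feedback might build on: the exception passed to
the validation callback already carries LineNumber and LinePosition. The
example below only prints them. It assumes the XmlReaderSettings API from
.NET 2.0 or later rather than XmlValidatingReader, and the class name and
sample data are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class LineInfoSketch
{
    public static void Main()
    {
        // Hypothetical inline schema and instance, modelled on the thread.
        const string xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" +
            "<xs:element name='gods'><xs:complexType><xs:sequence>" +
            "<xs:element name='god' type='xs:NCName' maxOccurs='unbounded'/>" +
            "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        const string xml =
            "<gods>\n  <god>Kibo</god>\n  <god>1Xibo</god>\n</gods>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(xsd)));

        // Report where in the source text each validation problem was found.
        settings.ValidationEventHandler += (sender, args) =>
        {
            Console.WriteLine("{0} at line {1}, position {2}: {3}",
                args.Severity, args.Exception.LineNumber,
                args.Exception.LinePosition, args.Message);
        };

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Reading to the end drives validation of the whole stream.
            while (reader.Read()) { }
        }
    }
}
```

Mapping a line and position back to a node in an XmlDocument would still
need extra bookkeeping, since as far as I know the DOM nodes do not keep
their source positions.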

Again, thanks very much for the help, Martin. I learnt much today.

Nick Wong
