dotnet xml : reading inner html with xpath


Derek Harmon
6/27/2004 6:11:03 PM

Put them in CDATA sections?

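As a rough illustration of the CDATA idea (a minimal sketch, assuming you control the step that generates the XML and are on .NET 2.0 or later, where XPathNavigator.SelectSingleNode is available; the CDataDemo class and GetValue helper are made-up names for this example): markup wrapped in a CDATA section is ordinary character data, so the navigator's Value returns it with the angle brackets intact.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class CDataDemo
{
    // Hypothetical helper: string value of the first node matched by xpath,
    // or null when nothing matches.
    public static string GetValue(string xml, string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator().SelectSingleNode(xpath);
        return nav == null ? null : nav.Value;
    }

    static void Main()
    {
        // The <div> is not parsed as an element here; it is text inside CDATA.
        string xml =
            "<content><![CDATA[<div>I have 3 accounts to give away</div>]]></content>";

        // prints: <div>I have 3 accounts to give away</div>
        Console.WriteLine(GetValue(xml, "/content"));
    }
}
```

This only helps when the producer of the feed cooperates; a consumer that receives real child elements still has to serialize them back to text.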

Like the InnerXml property?

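For reference, a minimal sketch of what InnerXml gives you on the DOM side, assuming it is acceptable to load the document into an XmlDocument instead of an XPathDocument (InnerXmlDemo and GetInnerXml are illustrative names, not part of any library):

```csharp
using System;
using System.Xml;

static class InnerXmlDemo
{
    // Hypothetical helper: markup of the children of the first node matched
    // by xpath, or null when nothing matches.
    public static string GetInnerXml(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = true;   // keep whitespace-only text nodes as written
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerXml;
    }

    static void Main()
    {
        string xml =
            "<content type=\"application/xhtml+xml\">" +
            "<div xmlns=\"http://www.w3.org/1999/xhtml\">I have 3 accounts to give away</div>" +
            "</content>";

        // Writes the serialized <div> element, not just its text.
        Console.WriteLine(GetInnerXml(xml, "/content"));
    }
}
```

The trade-off is that XmlDocument builds a full editable DOM, which is heavier than the read-only XPathDocument.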

It's difficult (sometimes nearly impossible) to match and manipulate arbitrary mixed
content with XPath, although I think you're referring to the XPathNavigator class.

Of course, there's nothing to prevent you from walking the navigator and serializing
it directly (a healthy aversion to angle brackets aside). Here's a class I created that
encapsulates walking a navigator:

- - - XPathWalker.cs
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

class NavigateEventArgs : EventArgs
{
    public string NamespaceURI;
    public string LocalName;
    public string Prefix;
    public string Value;

    public NavigateEventArgs( string prefix, string localName, string nsUri, string valueStr)
    {
        this.Prefix = prefix;
        this.LocalName = localName;
        this.NamespaceURI = nsUri;
        this.Value = valueStr;
    }
}

class CompleteEventArgs : EventArgs
{
}

delegate bool MoveFirstDelegate( );
delegate bool MoveNextDelegate( );
delegate void NavigateEventHandler( object sender, NavigateEventArgs args);
delegate void CompleteEventHandler( object sender, CompleteEventArgs args);

internal class AxisWalker
{
    private enum Status
    {
        Initial = 0, NoContent = 1, Active = 2
    }

    private MoveFirstDelegate moveFirst;
    private MoveNextDelegate moveNext;
    private AxisWalker.Status walkState;
    protected XPathNavigator navigator;
    public bool IsEmpty;

    public event NavigateEventHandler Navigate;
    public event CompleteEventHandler Complete;

    public AxisWalker( MoveFirstDelegate firstFunc, MoveNextDelegate nextFunc)
    {
        // Both delegates are bound to the same navigator instance.
        this.navigator = firstFunc.Target as XPathNavigator;
        this.moveFirst = firstFunc;
        this.moveNext = nextFunc;
        this.walkState = Status.Initial;
    }

    protected virtual void OnNavigate( NavigateEventArgs args)
    {
        if ( null != this.Navigate )
            this.Navigate( this, args);
    }

    protected virtual void OnComplete( CompleteEventArgs args)
    {
        if ( null != this.Complete )
            this.Complete( this, args);
    }

    // Moves one node along the axis; returns false once the axis is exhausted.
    public bool Step( )
    {
        if ( this.walkState == Status.Initial )
        {
            if ( this.moveFirst( ) )
            {
                this.walkState = Status.Active;
                this.OnNavigate(
                    new NavigateEventArgs(
                        this.navigator.Prefix,
                        this.navigator.LocalName,
                        this.navigator.NamespaceURI,
                        this.navigator.ToString( )
                    )
                );
            }
            else
            {
                this.IsEmpty = true;
                this.walkState = Status.NoContent;
                this.OnComplete( new CompleteEventArgs( ));
            }
        }
        else
        {
            if ( this.moveNext() )
            {
                this.OnNavigate(
                    new NavigateEventArgs(
                        this.navigator.Prefix,
                        this.navigator.LocalName,
                        this.navigator.NamespaceURI,
                        this.navigator.ToString( )
                    )
                );
            }
            else
            {
                this.walkState = Status.NoContent;
                this.OnComplete( new CompleteEventArgs( ));
            }
        }
        return ( this.walkState != Status.NoContent );
    }

    public void Walk( )
    {
        while ( this.Step( ) )
        {
            ;
        }
    }

    public void Reset()
    {
        this.walkState = Status.Initial;
        this.IsEmpty = false;
    }
}

internal class NamespaceAxisWalker : AxisWalker
{
    public NamespaceAxisWalker( MoveFirstDelegate firstFunc, MoveNextDelegate nextFunc) : base( firstFunc, nextFunc) { }

    protected override void OnNavigate( NavigateEventArgs args)
    {
        // Every element has the implicit 'xml' namespace declaration in scope.
        // For most purposes it is unnecessary in the serialization, so I
        // discard it with this if-statement.
        //
        if ( "xml" != this.navigator.Name )
        {
            // Rearrange the arguments b/c the Namespace axis has the 'prefix' in the
            // .Name property, and the namespace URI in its text value representation.
            //
            base.OnNavigate(
                new NavigateEventArgs(
                    this.navigator.Name,
                    String.Empty,
                    args.Value,
                    String.Empty
                )
            );
        }
    }
}

public class XPathWalker
{
    private AxisWalker elementWalker;
    private AxisWalker attributeWalker;
    private NamespaceAxisWalker nsWalker;
    private XPathNavigator navigator;
    private TextWriter sink;

    public XPathWalker( XPathNavigator nav, TextWriter writer)
    {
        navigator = nav;
        sink = writer;
        elementWalker = new AxisWalker(
            new MoveFirstDelegate( navigator.MoveToFirstChild),
            new MoveNextDelegate( navigator.MoveToNext)
        );
        attributeWalker = new AxisWalker(
            new MoveFirstDelegate( navigator.MoveToFirstAttribute),
            new MoveNextDelegate( navigator.MoveToNextAttribute)
        );
        nsWalker = new NamespaceAxisWalker(
            new MoveFirstDelegate( navigator.MoveToFirstNamespace),
            new MoveNextDelegate( navigator.MoveToNextNamespace)
        );

        elementWalker.Navigate += new NavigateEventHandler( this.NavigateElement);
        elementWalker.Complete += new CompleteEventHandler( this.CompleteElement);
        nsWalker.Navigate += new NavigateEventHandler( this.NavigateNamespace);
        nsWalker.Complete += new CompleteEventHandler( this.CompleteNamespace);
        attributeWalker.Navigate += new NavigateEventHandler( this.NavigateAttribute);
        attributeWalker.Complete += new CompleteEventHandler( this.CompleteAttribute);
    }

    public void Walk( )
    {
        navigator.MoveToParent( );
        elementWalker.Walk( );
    }

Lawrence Oluyede
6/27/2004 7:43:26 PM

Is there a way to treat HTML tags as plain text?
Let me explain: suppose I have a piece of XML like

<content type="application/xhtml+xml"
xml:base="http://loluyede.blogspot.com" xml:lang="en-US"
xml:space="preserve">
<div xmlns="http://www.w3.org/1999/xhtml">I have 3 accounts to give away,
let me know if you want them</div>
</content>

XPathNavigator.Value obviously returns "I have 3 accounts blah blah". Is
there a way to get all the inner content as text when the navigator is
positioned on the <content> element? Like

"<div xmlns="http://www.w3.org/1999/xhtml">I have 3 accounts to give away,
let me know if you want them</div>"

It's strange (and XPath doesn't provide such a feature, AFAIK), but it could be
useful.

--
Lawrence
" It's probably for the best. What use does Microsoft have for an employee
who knows the W3C standards?"
Lawrence Oluyede
6/28/2004 10:10:42 AM
On Sun, 27 Jun 2004 18:11:03 -0400, Derek Harmon wrote:

Yeah, I could do that when generating the XML, but I have to parse it :)

Yeah!

Very cool class, thanks!

I'll take both of them into consideration, thanks a lot :)

--
Lawrence
Lawrence Oluyede
6/28/2004 11:59:47 AM
On Mon, 28 Jun 2004 12:37:18 +0200, Oleg Tkachenko [MVP] wrote:

Great!

God bless MVPs ;)

Thanks Oleg!

--
Lawrence
Oleg Tkachenko [MVP]
6/28/2004 12:37:18 PM

That's an unfortunate omission in .NET 1.0/1.1. It's fixed in .NET 1.2.

Meanwhile you can use something as simple as SerializableXPathNavigator
(http://www.tkachenko.com/blog/archives/000155.html) or
XPathNavigatorReader
(http://weblogs.asp.net/cazzu/archive/2004/04/19/115966.aspx). Both of
them belong to the Mvp.Xml project (http://sf.net/projects/mvp-xml).

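For readers on later framework versions, here is a minimal sketch of the built-in route, assuming .NET 2.0 or later, where XPathNavigator exposes InnerXml and OuterXml directly (NavigatorInnerXmlDemo and GetInnerXml are illustrative names only):

```csharp
using System;
using System.IO;
using System.Xml.XPath;

static class NavigatorInnerXmlDemo
{
    // Hypothetical helper: markup of the children of the first node matched
    // by xpath, or null when nothing matches.
    public static string GetInnerXml(string xml, string xpath)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator nav = doc.CreateNavigator().SelectSingleNode(xpath);
        return nav == null ? null : nav.InnerXml;
    }

    static void Main()
    {
        string xml =
            "<content type=\"application/xhtml+xml\">" +
            "<div xmlns=\"http://www.w3.org/1999/xhtml\">I have 3 accounts to give away</div>" +
            "</content>";

        // Writes the serialized <div> element rather than only its text.
        Console.WriteLine(GetInnerXml(xml, "/content"));
    }
}
```

The result is re-serialized from the navigator, so its whitespace and indentation may not match the source document character for character.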
--
Oleg Tkachenko [XML MVP]