Error when using XMLTextReader to read HTML


Error when using XMLTextReader to read HTML Mitch
8/26/2004 7:26:03 PM
dotnet xml:
I have some simple HTML I'm trying to read with the XmlTextReader. As in the
MSDN examples, I set up a loop to read each XML node:

while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
            Console.WriteLine("<{0}>", reader.Name);
            break;
        case XmlNodeType.Text:
            Console.WriteLine(reader.Value);
            break;
        case XmlNodeType.Attribute:
            // Not reached in this loop: Read() does not stop on attributes;
            // they are visited with MoveToNextAttribute() while on an element.
            Console.WriteLine(reader.Value);
            break;
        default:
            Console.WriteLine(reader.NodeType);
            break;
    }
}

The reader moves along fine until it attempts to read the </head> end tag in
this HTML:
<html>
<head>
<title>Sir</title>
<meta name="Author" content="Bar01">
<meta name="Description" content="Instructions">
<link href="css/results.css" media="SCREEN" rel="StyleSheet" type="text/css" />
</head>

The error is:

System.Xml.XmlException: The 'meta' start tag on line '5' does
not match the end tag of 'head'. Line 7, position 4.
at System.Xml.XmlTextReader.ParseTag()
at System.Xml.XmlTextReader.ParseBeginTagExpandCharEntities()
at System.Xml.XmlTextReader.Read()
at PIDProvider.Analyze.PIDrefs() in c:\vdev2\PID\Analysis.cs:line 29

What does that exception mean?

Am I missing something? Am I wrong to assume that I can read the HTML with
the XmlTextReader?

Thanks

Mitch

Re: Error when using XMLTextReader to read HTML Mitch
8/27/2004 9:19:28 AM
Thanks, Martin. I also found something called HTML Tidy that converts HTML
into well-formed XML. In the end, though, I decided to just use regex to find
the stuff I need because I'm really not interested in the overall node
structure.

Mitch
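
Editor's sketch of the regex approach mentioned above. The post does not say what was actually extracted, so the target here (the name/content pairs of the <meta> tags from the sample) is an assumption, and MetaScan and FindMetaTags are hypothetical names. The pattern also assumes double-quoted attributes with name written before content, as in the sample; real-world HTML may well break it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MetaScan
{
    // Hypothetical helper: collects name/content pairs from <meta> tags.
    // Assumes double-quoted attributes, with name appearing before content.
    public static Dictionary<string, string> FindMetaTags(string html)
    {
        var result = new Dictionary<string, string>();
        var pattern = new Regex(
            "<meta\\s+name=\"([^\"]*)\"\\s+content=\"([^\"]*)\"",
            RegexOptions.IgnoreCase);

        foreach (Match m in pattern.Matches(html))
        {
            // Group 1 is the name attribute, group 2 the content attribute.
            result[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return result;
    }

    public static void Main()
    {
        // The unclosed <meta> tags that stopped the XML parser are no
        // obstacle here, because no tree structure is being built.
        string html =
            "<head>\n" +
            "<meta name=\"Author\" content=\"Bar01\">\n" +
            "<meta name=\"Description\" content=\"Instructions\">\n" +
            "</head>";

        foreach (var pair in FindMetaTags(html))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

This trades robustness for simplicity: it ignores the node structure entirely, which matches the stated goal, but it will miss tags written with single quotes or a different attribute order.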

Re: Error when using XMLTextReader to read HTML Martin Honnen
8/27/2004 2:18:15 PM



Yes, completely wrong. HTML is an SGML application, and you can't parse
HTML with an XML parser unless you author XHTML.
If you want to read HTML, there is an SGML reader implementation for .NET
available; search the web for it.

--

Martin Honnen
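
Editor's sketch illustrating the point about XHTML. It feeds XmlTextReader a shortened version of the head from the original post twice: once with the unclosed <meta> tag, and once with that tag written as an empty element. XhtmlCheck and ReadElementNames are hypothetical names, and the markup is trimmed from the sample for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XhtmlCheck
{
    // Hypothetical helper: returns the element names in document order,
    // or lets XmlException propagate if the markup is not well-formed XML.
    public static List<string> ReadElementNames(string markup)
    {
        var names = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(markup)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        // HTML style: <meta> is never closed, so </head> has no matching start tag.
        string html = "<html><head><title>Sir</title>" +
            "<meta name=\"Author\" content=\"Bar01\"></head></html>";

        // XHTML style: the same tag written as an empty element.
        string xhtml = "<html><head><title>Sir</title>" +
            "<meta name=\"Author\" content=\"Bar01\" /></head></html>";

        try
        {
            ReadElementNames(html);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("HTML input failed: " + ex.Message);
        }

        Console.WriteLine(string.Join(", ", ReadElementNames(xhtml)));
    }
}
```

The <link ... /> tag in the original sample was already self-closed, which is why the parser complained only about 'meta'.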