
dotnet xml : HTML 4.01 / XHTML implementation of the DOM


Edgardo Rossetto
1/30/2005 12:46:41 AM
Hi, got a few questions:

- Does anyone know of an HTML 4.01 / XHTML implementation of the DOM, or is
only System.Xml (XML 1.0 and 2.0 only, AFAIK) available?
- Any idea whether .NET 2.0 will have one?

Is it possible to "load" the DTD specification for HTML 4.01 / XHTML
using the System.Xml namespace or am I just dreaming?

Bjoern Hoehrmann
1/30/2005 8:29:16 AM
* Edgardo Rossetto wrote in microsoft.public.dotnet.xml:

XHTML 1.0/1.1/Basic/Print documents are required to be XML 1.0 documents,
so you can use any XML processor for those. HTML 4.01 is not supported
by the .NET Framework, but there are external parsers available; see the
archive of this newsgroup. System.Xml implements various DOM levels and
features, though it uses different language bindings: for example, it
uses overloading, which the W3C specifications avoid because not all
languages support it. DOM Level 2 HTML is not supported by the Framework
except for the System.Web.UI.HtmlControls namespace. There is no XML 2.0;
the W3C has only specified XML 1.0 and XML 1.1, and the Framework supports
XML 1.0 only.
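
To make the XHTML versus HTML 4.01 distinction concrete, here is a minimal sketch. It uses Python's standard-library xml.dom.minidom purely as a stand-in for "any XML processor"; it is not System.Xml, and the two sample documents and the parses_as_xml helper are made up for illustration. The XHTML sample is well-formed XML, so a generic XML parser accepts it, while the HTML 4.01 sample relies on omitted end tags and is rejected.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical sample: XHTML, which must be well-formed XML 1.0.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>Demo</title></head>'
    '<body><p>Hello<br/>world</p></body>'
    '</html>'
)

# Hypothetical sample: HTML 4.01 style markup with omitted end tags
# (<p> and <br> are never closed), which is not well-formed XML.
HTML401 = (
    '<html><head><title>Demo</title></head>'
    '<body><p>Hello<br>world</body></html>'
)


def parses_as_xml(text):
    """Return True if a generic XML parser accepts the text."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


if __name__ == "__main__":
    print("XHTML accepted:", parses_as_xml(XHTML))
    print("HTML 4.01 accepted:", parses_as_xml(HTML401))
```

The same experiment should carry over to any conforming XML parser, which is why a separate HTML parser is needed for HTML 4.01 input.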


XHTML DTDs are XML DTDs, and System.Xml supports XML DTDs, e.g. for
validation. HTML DTDs are not supported, but external tools exist that
might offer the functionality you are looking for.
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
Martin Honnen
1/30/2005 4:04:44 PM



If you are asking about the W3C DOM Level 1 or Level 2 HTML: no, at
least the classes Microsoft provides in the .NET SDK do not implement
that. I have never looked for third-party implementations in .NET, so I
don't know whether any exist.
As for .NET's System.Xml and the W3C DOM standards, I think W3C DOM
Level 1 XML and W3C DOM Level 2 Core and XML are there. As already
explained, though, System.Xml makes use of overloading, which the .NET
Framework supports but which the W3C DOM has avoided to allow bindings to
script languages like JavaScript (ECMAScript), and there are some other
deviations (nodeValue in the W3C DOM is Value in the .NET SDK).
W3C DOM Level 2 mutation events also have a counterpart in the .NET
SDK: the NodeChanged, NodeChanging, NodeInserted, NodeRemoved, etc.
events that XmlDocument has.
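
For comparison, this is roughly what the W3C-style names look like in a binding that follows the DOM recommendation closely. The sketch assumes Python's xml.dom.minidom, chosen only because it keeps names such as nodeName and nodeValue; the sample markup and the describe helper are hypothetical. In System.Xml the corresponding XmlNode properties are Name and Value.

```python
from xml.dom import minidom

# Hypothetical one-element document used only to show the property names.
doc = minidom.parseString('<p class="note">Hello</p>')
p = doc.documentElement      # the <p> element node
text = p.firstChild          # its text node child


def describe(node):
    """Return the W3C-style (nodeName, nodeValue) pair for a node."""
    return (node.nodeName, node.nodeValue)


if __name__ == "__main__":
    # Element nodes have no nodeValue in the W3C DOM; text nodes carry
    # their character data there.
    print(describe(p))
    print(describe(text))
    print(p.getAttribute("class"))
```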


The XHTML DTDs are XML DTDs so System.Xml should be able to handle them.
There are also schemas for XHTML which System.Xml with its schema
support should be able to handle.
For reading HTML there is SgmlReader:
<http://www.gotdotnet.com/Community/UserSamples/Details.aspx?SampleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC>
You can use that to read in HTML (valid and "tag soup" as found on the
web) and then for instance create a .NET XmlDocument.
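
The general idea (read tag soup with a forgiving parser, then build a well-formed XML tree from it) can be sketched as follows. This is not SgmlReader and does not use its API; it is a deliberately small stand-in built on Python's html.parser and xml.etree.ElementTree. The SoupToTree class, the soup_to_xml function, the VOID set and the "root" wrapper element are all assumptions made for this example, and the recovery rules are far cruder than a real SGML/HTML parser's.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never have content in HTML (partial list, for the sketch).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SoupToTree(HTMLParser):
    """Build an ElementTree from loosely written HTML."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")   # hypothetical wrapper element
        self.stack = [self.root]         # currently open elements

    def handle_starttag(self, tag, attrs):
        attributes = {name: value or "" for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attributes)
        if tag not in VOID:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element and
        # silently ignore end tags that match nothing.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                  # text after a child element
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:                            # text before any child element
            parent.text = (parent.text or "") + data


def soup_to_xml(html):
    """Return a well-formed XML string for the given tag soup."""
    parser = SoupToTree()
    parser.feed(html)
    parser.close()                       # flush any buffered trailing text
    return ET.tostring(parser.root, encoding="unicode")


if __name__ == "__main__":
    print(soup_to_xml("<p>Hello<br>world"))
```

The string returned by soup_to_xml can then be handed to an ordinary XML parser, which mirrors the SgmlReader-to-XmlDocument route described above.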

--
Martin Honnen

Edgardo Rossetto
1/31/2005 1:06:13 AM

Could you paste some URLs for these parsers, please? Are any of them free
or open source?

Thanks a lot for the clarifications.

Bjoern Hoehrmann
1/31/2005 7:18:51 AM
* Edgardo Rossetto wrote in microsoft.public.dotnet.xml:

Martin mentioned one. There are also several .NET wrappers for HTML Tidy,
e.g. http://sourceforge.net/projects/ntidy/, and various parsers for
the Java programming language, which you should be able to use through
either J# or the Java to C# Converter. The search results at

http://www.google.com/search?q=html+parser+c%23

have more details.