XML CDATA etc


XML CDATA etc JohnAD
1/5/2007 8:04:30 PM
dotnet xml:
Hello NG,

I am getting some information from a DB, and that data has mixed HTML and
XML tags in the content (e.g. details on a country).

Basically, CDATA sections are mixed in with regular strings. Also, the HTML
tags are in escaped form (e.g. > appears as &gt;). When I display that
string I see those tags.

I am getting all this data in XML form, and I want to find out how I can
change those HTML tags back into regular tags, and also how to remove the
CDATA markers or any other instructions from the string. Is there a quick
way to do that? My problem is made worse by the fact that I don't know XML.

Thank you,
Po
Re: XML CDATA etc Peter Flynn
1/6/2007 12:49:06 PM
This sounds like someone has interfered with the file.

What does that mean? Change &lt;p&gt; back into <p>?

It sounds like whoever supplied you with the file doesn't know any XML
either.

a) The best move is to ask them for valid (or at least well-formed) XML
to start with. Unless you're working with well-formed data at the very
least, you don't stand much chance of using XML. If you don't know
whether what you've got is well-formed or not, install a reliable
standalone XML parser like rxp and use it to test the file[s].
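If rxp isn't to hand, a rough stand-in for a quick well-formedness test
is a few lines of Python using the standard-library parser. This is a
swapped-in sketch, not rxp itself, and the function name is hypothetical.
It only checks well-formedness; it does not validate against a DTD or
Schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(source) -> bool:
    """Return True if `source` (a file path or open file) is well-formed XML.

    ElementTree's parser is non-validating: it checks well-formedness
    only, not conformance to a DTD or Schema.
    """
    try:
        ET.parse(source)
    except ET.ParseError as err:
        # err.position is a (line, column) pair locating the first error
        line, col = err.position
        print(f"{source}: not well-formed at line {line}, column {col}")
        return False
    return True
```

Usage: check_well_formed("countries.xml") (a hypothetical file name)
returns True for a clean file, or prints where the parser first gave up
and returns False.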

b) To change the escaped pointy brackets back into real ones, you'll need
to write and run some non-XML script, but the risk is that they were
escaped for a reason (usually ignorance, sometimes laziness) and that
by putting them back the way they were, you'll break the data model.
By restoring them, you are essentially adding new elements to a file
which wasn't designed to hold them (which is why they were escaped to
begin with). It *is* possible to repair the damage with XSLT, but its
string-handling isn't very sophisticated.

c) CDATA markup is used along with HTML escaping to allow the remains
of the elements to be embedded in XML, in the (usually) forlorn hope
that someone (you) will struggle to restore them at a later stage in
the process. This is often done by people with little understanding
of markup or XML (your supplier). Running the document through any
parsing XML processor will automatically remove the CDATA markup and
pass the content through to whatever the next stage is. However, if
doing so reveals pointy-bracket markup that doesn't fit the document
model (DTD, Schema,...) then the process will halt (as it's supposed
to).
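To see the CDATA markers disappear, here is a small Python sketch with
two hypothetical records, one using a CDATA section and one using
escaped brackets. Python's standard-library parser stands in for
whatever XML processor you actually use; the XML spec requires a
conforming parser to hand back CDATA content as ordinary character
data, so both spellings come out as the same string.

```python
import xml.etree.ElementTree as ET

# Two hypothetical records: one wraps the HTML in a CDATA section,
# the other escapes the pointy brackets instead.
WITH_CDATA = "<country><detail><![CDATA[<p>Capital: Dublin</p>]]></detail></country>"
WITH_ESCAPES = "<country><detail>&lt;p&gt;Capital: Dublin&lt;/p&gt;</detail></country>"


def detail_text(document: str) -> str:
    """Parse the document and return the text of its <detail> child.

    The parser drops the CDATA markers and decodes &lt;/&gt;, so both
    spellings yield the same plain string.
    """
    root = ET.fromstring(document)
    return root.findtext("detail")


print(detail_text(WITH_CDATA))    # <p>Capital: Dublin</p>
print(detail_text(WITH_ESCAPES))  # <p>Capital: Dublin</p>
```

The result is only text: the <p> inside it is not an element yet.
Turning it into real markup means parsing it a second time, and that is
the step where markup which doesn't fit the document model will halt
the process.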

Have a look at http://xml.silmaril.ie/authors/cdata/ and
http://xml.silmaril.ie/authors/html/

And do try (a) if at all possible: it will make your life, your
supplier's life, and the life of the information very much easier.

///Peter
--
Re: XML CDATA etc JohnAD
1/7/2007 12:38:23 AM
Thanks Peter, that is a really good reply. It helps in many ways.
Thanks again.


