Thanks, Peter, that is a really good reply. It helps in many ways.
"Peter Flynn" <peter.nosp@m.silmaril.ie> wrote in message
news:509k63F1etm73U1@mid.individual.net...
> JohnAD wrote:
>> Hello NG,
>>
>> I am getting some information from a DB, and that data has a mix of HTML
>> and XML tags in the content (e.g. detail on country).
>>
>> Basically CDATA sections are mixed with regular strings. Also, HTML tags
>> are in escaped form (e.g. > is &gt;). When I display that string I see
>> those tags.
>
> This sounds like someone has interfered with the file.
>
>> Basically I am getting all this data in XML form and I want to find out
>> how I can change those HTML tags into regular tags,
>
> What does that mean? Change &lt;p&gt; back into <p>?
>
>> and also how to remove CDATA or any instructions in the string. Is there
>> a quick way to do that? My problem is made worse as I don't know XML.
>
> It sounds like whoever supplied you with the file doesn't know any XML
> either.
>
> a) Best move is to ask them for valid (or at least well-formed) XML to
> start with. Unless you're working with well-formed data at the very
> least, you don't stand much chance of using XML. If you don't know
> if what you've got is well-formed or not, install a reliable
> standalone XML parser like rxp and use it to test the file[s].
>
> b) To change the escaped pointy brackets back into real ones you'll need
> to write and run some non-XML script, but the risk is that they were
> escaped for a reason (usually ignorance, sometimes laziness) and that
> by putting them back the way they were, you'll break the data model.
> By restoring them, you are essentially adding new elements to a file
> which wasn't designed to hold them (which is why they were escaped to
> begin with). It *is* possible to repair the damage with XSLT, but its
> string-handling isn't very sophisticated.
>
> c) CDATA markup is used along with HTML escapement to allow the remains
> of the elements to be embedded in XML, in the (usually) forlorn hope
> that someone (you) will struggle to restore them at a later stage in
> the process. This is often done by people with little understanding
> of markup or XML (your supplier). Running the document through any
> parsing XML processor will automatically remove the CDATA markup and
> pass the content through to whatever the next stage is. However, if
> doing so reveals pointy-bracket markup that doesn't fit the document
> model (DTD, Schema,...) then the process will halt (as it's supposed
> to).
>
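
To check my understanding of (a) and (c), here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. The sample string and the element names (country, detail) are made up for illustration, since my real data isn't shown here. It tests whether a string is well-formed, and it shows that a parser hands the text back with the CDATA markup gone and the entities resolved.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a CDATA section mixed with an escaped string,
# roughly the shape described in the original question.
SAMPLE = (
    "<country><detail>"
    "<![CDATA[<b>Ireland</b>]]> &lt;p&gt;Capital: Dublin&lt;/p&gt;"
    "</detail></country>"
)


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def detail_text(xml_text):
    """Return the character data of the (assumed) <detail> element."""
    root = ET.fromstring(xml_text)
    return root.findtext("detail")


if __name__ == "__main__":
    print(is_well_formed(SAMPLE))
    print(detail_text(SAMPLE))
```

As far as I can tell, the pointy brackets that come back from detail_text() are still plain character data as far as the parser is concerned, not elements, which seems to be the point made in (b).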
> Have a look at http://xml.silmaril.ie/authors/cdata/ and
> http://xml.silmaril.ie/authors/html/
>
> And do try (a) if at all possible: it will make your life, your supplier's
> life, and the life of the information very much easier.
>
> ///Peter
> --
> XML FAQ: http://xml.silmaril.ie/
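
For the "non-XML script" mentioned in (b) of the quoted reply, this is a rough sketch of what I might try in Python, assuming the escaped strings use ordinary HTML entities. The function name restore_markup and the <wrapper> element are my own inventions for illustration. It un-escapes the string with html.unescape and then asks the parser whether the result is well-formed, so that broken fragments are reported instead of silently put back.

```python
import html
import xml.etree.ElementTree as ET


def restore_markup(escaped):
    """Un-escape &lt;p&gt; style markup; return None if the result is not well-formed.

    Note: html.unescape also turns &amp; into a bare &, which can itself
    make the restored string ill-formed. That case returns None too.
    """
    restored = html.unescape(escaped)
    try:
        # Wrap in a throwaway root so fragments with several elements parse.
        ET.fromstring("<wrapper>" + restored + "</wrapper>")
    except ET.ParseError:
        return None
    return restored


if __name__ == "__main__":
    print(restore_markup("&lt;p&gt;Capital: Dublin&lt;/p&gt;"))
    print(restore_markup("&lt;p&gt;one&lt;br&gt;two&lt;/p&gt;"))
```

The second call is there to show the risk Peter describes: HTML such as an unclosed <br> is not well-formed XML, so restoring it would break the document rather than repair it.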