dotnet xml : XslTransform/XmlWriter can't encode &#160 in us-ascii or iso-8859-1


smr NO[at]SPAM essemer.com.au
5/19/2004 4:47:00 AM
My problem starts with wanting "&#160;" to appear literally in the
output, rather than as an encoded 0xA0 byte in the output stream. I
thought I could solve this by selecting us-ascii or iso-8859-1 as the
output encoding, so I used this line in the stylesheet:

<xsl:output method="html" encoding="iso-8859-1"/>

The XslTransform documentation states something like (I can't locate
it now) that the encoding is ignored unless used with a Stream or
TextWriter. I tried to use the XmlTextWriter, but the documentation
for it states:

If the Unicode characters do not fit the specified encoding,
the XmlTextWriter does not escape the Unicode characters into
character entities

What I end up getting is a '?' in place of the &#160;. If I change the
encoding to utf-8 or utf-16, the character is encoded as it should be.

Does anyone know of a workaround for this? I believe the processor
should write even a character such as U+5555 as a numeric character
reference (&#x5555;) when the output encoding cannot represent it.

Regards,

smr NO[at]SPAM essemer.com.au
5/19/2004 5:07:07 PM
I've confirmed that the behaviour I'm getting is a bug. Does anyone
know of an email address or web form for submitting bug reports for
.NET?

I was inaccurate in my original post. If the output encoding is
iso-8859-1, the behaviour is as expected. The bug shows itself only
when the output encoding is us-ascii. In that case, a &#160; in the
input is written as a single '?' character in the output. In fact,
any character above 0x7F is written as a '?'. It's obviously their way
of saying "you've tried to output a character that can't be
represented in this character set, so I'll put in a ? so you can see
where it all went wrong". This seems OK for an output stream that
knows nothing else about what is being output. However, XslTransform
does know what it is outputting, which character set it is outputting
to, and how such characters can be correctly represented in that
character set, so it should be outputting the characters
"& # 1 6 0 ;".

Regards,
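
[Editor's illustration] The difference between the three outcomes described above can be reproduced outside .NET. The following is a minimal Python sketch, not the XslTransform code path: it assumes that the '?' in the reported output comes from the encoder's replacement policy, and it uses Python's built-in "xmlcharrefreplace" error handler to show the character-reference output the poster expected. The variable names are made up for the example.

```python
# U+00A0 NO-BREAK SPACE, the character written as &#160; in the stylesheet.
nbsp = "\u00a0"

# iso-8859-1 can represent U+00A0 directly, as the single byte 0xA0.
latin1 = nbsp.encode("iso-8859-1")

# us-ascii cannot represent it; the "replace" policy substitutes '?',
# which matches the output reported in the post.
replaced = nbsp.encode("us-ascii", errors="replace")

# The "xmlcharrefreplace" policy writes a numeric character reference
# instead, which is what the poster expected from the XSLT processor.
char_ref = nbsp.encode("us-ascii", errors="xmlcharrefreplace")

# The same policy applied to U+5555 (hex 5555 is 21845 in decimal).
cjk_ref = "\u5555".encode("us-ascii", errors="xmlcharrefreplace")

print(latin1, replaced, char_ref, cjk_ref)
```

Python writes the reference in decimal, so U+5555 appears as &#21845; rather than &#x5555;. Both forms refer to the same character.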

Jiho Han
5/20/2004 9:21:06 AM
MSFT guys monitor these newsgroups but if you want to be sure, try these
guys:

http://blogs.msdn.com/DareObasanjo/

http://blogs.msdn.com/mfussell


smr NO[at]SPAM essemer.com.au
5/23/2004 7:48:00 PM
Hi, thank you for the references.

Regards,

Steven

Oleg Tkachenko [MVP]
6/7/2004 10:28:08 AM
That's not generally true. When transforming to an XmlReader or
XmlWriter, the xsl:output instruction is irrelevant and is ignored
altogether (that's OK according to the XSLT spec). When transforming
to a TextWriter, the writer's own encoding is used instead, so the
encoding attribute of the xsl:output instruction is ignored too.
The only situation in which the XslTransform class is fully in control
of output serialization is when transforming to a Stream.

PS. But not outputting character references is still a bug IMO. Btw,
this can easily be worked around by writing a custom XmlWriter that
takes care of these characters.
--
Oleg Tkachenko [XML MVP]
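
[Editor's illustration] The custom-writer workaround suggested above amounts to wrapping the output so that any character the target encoding cannot represent is written as a numeric character reference. The sketch below shows that idea in Python rather than as a .NET XmlWriter subclass; the class name CharRefWriter and its constructor parameters are hypothetical, and it is only an analogue of the approach, not the code the poster had in mind.

```python
import io


class CharRefWriter:
    """Hypothetical text writer that encodes to a byte stream and writes
    any character the target encoding cannot represent as &#N;."""

    def __init__(self, stream, encoding="us-ascii"):
        self._stream = stream
        self._encoding = encoding

    def write(self, text):
        # Unrepresentable characters become decimal character references
        # instead of being replaced by '?'.
        data = text.encode(self._encoding, errors="xmlcharrefreplace")
        self._stream.write(data)
        return len(text)


# Usage: serialized markup containing U+00A0 is written to an ASCII stream.
buffer = io.BytesIO()
writer = CharRefWriter(buffer)
writer.write("<td>a\u00a0b</td>")
result = buffer.getvalue()
print(result)
```

Because the wrapper works on already serialized text, it is only safe when the unrepresentable characters occur in text content or attribute values, where character references are allowed, and not in element or attribute names.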
Paul Hatcher
6/10/2004 5:34:31 PM
I've got a similar problem in that I can't stop the XmlTextWriter from
changing '>' to &gt;, even when it's in an xsl:text element with
disable-output-escaping set true.

Any ideas?

Paul
