This page is moving to http://carey.geek.nz/doc/xslt-cdata-escaping/. The version here will disappear eventually.

Movable Type XSLT

By Carey Evans

This document elaborates on my ideas about XSLT in response to Kevin Davis’s experiments with Movable Type at Alazanto. (It also gave me a chance to play around with RDF and metadata that nobody will ever see.)

Be warned that really understanding this document will require a good knowledge of XML and XSLT, although I have tried to make the explanation and examples as clear as possible.

The Problem

Kevin has been experimenting with using XSLT to format simple XML output from Movable Type into a complete web page. Originally, he included the data for each weblog entry in a CDATA section containing literal XHTML, in much the same way as many RSS feeds. See the following example, reformatted for clarity:

<entry>
 <title>entry with images</title>
 <date>August 09, 2003</date>
 <author>Kevin</author>
 <idnum>000033</idnum>
 <permalink>http://alazanto.org/xml/archives/000033.xml</permalink>
 <body xmlns:html="http://www.w3.org/1999/xhtml"><![CDATA[<p><img
  class="archive" align="right" src="http://alazanto.org/images/sample.jpg"
  alt="photograph of a flower, just for show"/>Mauris felis elit, varius
  quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.
  </p>]]>
 </body>
 <more xmlns:html="http://www.w3.org/1999/xhtml"><![CDATA[]]></more>
 <comment-link>http://alazanto.org/xml/archives/000033_comments.xml</comment-link>
 <comment-count>6</comment-count>
</entry>

The XML CDATA markup indicates that the data between <![CDATA[ and ]]> should not be interpreted as XML, with elements and entity references resolved. Instead, the data is included as a literal string, exactly as if each <, > and & had been encoded as &lt;, &gt; and &amp; respectively. The result is a DOM tree in which the <body> element contains only character data, with the paragraph markup held as plain characters rather than as elements.

Note that in this DOM, the child text node of the <body> element is just a string, with no special meaning to an XML parser or an XSLT processor, even if it looks to you like a paragraph from an XHTML document.
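To make the equivalence concrete, here is a sketch of the same <body> element written with entity references instead of a CDATA section. This is my own re-encoding of the example above, not something taken from Kevin's files; as far as an XSLT processor is concerned, the two forms carry the same string.

```xml
<!-- Same character data as the CDATA version: every < > that belonged to
     the embedded markup is now an entity reference, so the parser still
     sees one run of text and no <p> or <img> elements. -->
<body xmlns:html="http://www.w3.org/1999/xhtml">&lt;p&gt;&lt;img
  class="archive" align="right" src="http://alazanto.org/images/sample.jpg"
  alt="photograph of a flower, just for show"/&gt;Mauris felis elit, varius
  quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.
  &lt;/p&gt;
 </body>
```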

We can write fairly simple XSLT templates to turn this XML into XHTML for the browser. To include the literal XHTML in the result, we can try the XSLT disable-output-escaping attribute, with a template something like this:

<xsl:template match="entry">
  <div class="entry">
    <h2><xsl:value-of select="title"/></h2>
    <xsl:value-of select="body" disable-output-escaping="yes"/>
  </div>
</xsl:template>

Without the disable-output-escaping attribute, the string value of the <body> element would be written to the output so that it could be read in again by another XML parser. In other words, each < would be escaped as &lt;, each & as &amp;, and each > as &gt;.
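As an illustration, the serialized output without the attribute would look roughly like the following. This is a hand-written sketch, reformatted for clarity, not captured from a real processor; exact whitespace will differ, and some serializers leave > unescaped because XML does not require escaping it here.

```xml
<!-- Approximate output when disable-output-escaping is omitted:
     the markup from <body> arrives as escaped text, not as elements. -->
<div class="entry">
 <h2>entry with images</h2>
 &lt;p&gt;&lt;img class="archive" align="right"
  src="http://alazanto.org/images/sample.jpg"
  alt="photograph of a flower, just for show"/&gt;Mauris felis elit,
 varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum
 pharetra elit. &lt;/p&gt;
</div>
```

A browser reading this output shows the tags as visible text, which is much the same effect that Mozilla produces when it ignores the attribute.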

When processed in Internet Explorer, or a stand-alone XSLT processor, the disable-output-escaping attribute disables this escaping step, so that the text child of the <body> node is included literally in the output file as shown below, which is what Kevin expected:

<div class="entry">
 <h2>entry with images</h2>
 <p><img class="archive" align="right"
  src="http://alazanto.org/images/sample.jpg"
  alt="photograph of a flower, just for show"/>Mauris felis elit,
 varius quis, pulvinar vel, sodales vehicula, mi. Nunc elementum
 pharetra elit... </p>
</div>

The problem occurs when trying to use the same templates in Mozilla. The Mozilla XSLT processor doesn’t support disable-output-escaping, since it transforms directly from the source DOM to a destination DOM tree, without an output step in which to disable escaping. The DOM that Mozilla constructs is quite predictable, but not what Kevin wanted: the <div> receives a text node whose characters are the XHTML markup, rather than <p> and <img> elements.

This means that Mozilla displays the markup to the user, complete with <p> and <img> tags, instead of the paragraph text with a floating image. Mozilla bug 98168 is about this behaviour, and comment 11 states quite clearly that it is expected and will not be changed.

The Solution

The solution for Kevin is to create the original XML file without enclosing the paragraph in a CDATA section, making the image and paragraph tags real elements in the source XML DOM, so that they can be copied directly to the destination XHTML DOM. This small change to the source XML gives us a very different source DOM tree, in which <p> and <img> are element nodes beneath <body>.
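As a sketch of what the changed source might look like, here is the earlier entry with the CDATA wrappers simply removed. I am assuming that nothing else in the file changes; Kevin's actual output may differ in details such as namespaces or whitespace.

```xml
<!-- Hypothetical CDATA-free version of the earlier entry. The <p> and
     <img> tags are now parsed as real elements under <body>. -->
<entry>
 <title>entry with images</title>
 <date>August 09, 2003</date>
 <author>Kevin</author>
 <idnum>000033</idnum>
 <permalink>http://alazanto.org/xml/archives/000033.xml</permalink>
 <body xmlns:html="http://www.w3.org/1999/xhtml"><p><img
  class="archive" align="right" src="http://alazanto.org/images/sample.jpg"
  alt="photograph of a flower, just for show"/>Mauris felis elit, varius
  quis, pulvinar vel, sodales vehicula, mi. Nunc elementum pharetra elit.
  </p>
 </body>
 <more xmlns:html="http://www.w3.org/1999/xhtml"/>
 <comment-link>http://alazanto.org/xml/archives/000033_comments.xml</comment-link>
 <comment-count>6</comment-count>
</entry>
```

One consequence worth keeping in mind is that the entry text must now be well-formed XML, since the parser reads it as markup rather than as an opaque string.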

With this input, the XSLT to copy the nodes can be just as simple, using xsl:copy-of to copy all the elements under the source <body> element, but not the element itself:

<xsl:template match="entry">
 <div class="entry">
  <h2><xsl:value-of select="title"/></h2>
  <xsl:copy-of select="body/*"/>
 </div>
</xsl:template>
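One detail of the select expression: body/* matches only element children, so any text sitting directly inside <body>, outside a <p>, would not be copied. If that matters for your entries, a variant using node() copies text nodes, comments and processing instructions as well. This is a minimal sketch wrapped in its own stylesheet element so that it stands alone; it is not taken from Kevin's templates.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="entry">
  <div class="entry">
   <h2><xsl:value-of select="title"/></h2>
   <!-- node() selects every child of <body>: elements, text,
        comments and processing instructions, but not attributes
        and not the <body> element itself. -->
   <xsl:copy-of select="body/node()"/>
  </div>
 </xsl:template>
</xsl:stylesheet>
```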

Because the templates now copy elements instead of literal XHTML source code, Mozilla displays the page from the correct DOM tree, and the same stylesheet works just as well with Internet Explorer and external XSLT processors. The result DOM looks very similar to the input DOM, with the copied <p> and <img> elements as children of the <div>.

You can see the result of this in Kevin’s example XML weblog, in any web browser that supports XSLT.

As an aside, I'd like to thank Kevin for using the <xsl:copy-of> element after looking at an earlier version of this document. Without seeing it used, I might have gone another four years without knowing about such a useful tool, and gone on writing code like this:

<xsl:template match="body|p|img|li">
  <xsl:copy>
    <xsl:for-each select="@*"><xsl:copy/></xsl:for-each>
    <xsl:apply-templates select="p|img|li|text()"/>
  </xsl:copy>
</xsl:template>
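Hand-written copying still has a place when some nodes need to change on the way through, which xsl:copy-of cannot do. The usual idiom for that is an identity template plus specific overrides. The sketch below is an assumption-laden illustration, not part of Kevin's stylesheet: the mode name "copy" and the rule that drops the align attribute are both made up for the example.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:template match="entry">
  <div class="entry">
   <h2><xsl:value-of select="title"/></h2>
   <!-- Hand the children of <body> to the copying templates below -->
   <xsl:apply-templates select="body/node()" mode="copy"/>
  </div>
 </xsl:template>

 <!-- Identity template: copy any node or attribute, then recurse
      into its attributes and children in the same mode. -->
 <xsl:template match="@*|node()" mode="copy">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()" mode="copy"/>
  </xsl:copy>
 </xsl:template>

 <!-- Hypothetical override: an empty template, so the align
      attribute on <img> is left out of the copy. -->
 <xsl:template match="img/@align" mode="copy"/>
</xsl:stylesheet>
```

Unlike the template above, this does not need a list of element names, so new elements in the entry body are copied without touching the stylesheet.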

This work is licensed under a Creative Commons License.