The Ops Community ⚙️

Andrew Owen

Bulk updating documents with XSLT

This article originally appeared on my personal dev blog: Byte High, No Limit.

XSLT (Extensible Stylesheet Language Transformations) is a language for transforming XML documents into other documents. I've mentioned it before in my post on creating release notes from Jira.

Now that every document under the sun is either stored in XML or can easily be converted to XML, XSLT provides a great way to perform batch processing on those documents.

To use XSLT, you need an XSLT processor. If you stick to version 1.0 of the specification, there are lots of choices. But if you want support for the latest version (3.0 at time of writing), your choices are RaptorXML (integrated with Altova's XMLSpy XML editor) or Saxon. Because my preferred XML editor is XMLmind's XML Editor (XXE), I use Saxon. Saxon and XXE both have free editions for personal use and open source projects.

As an aside, I've been using XXE for over a decade. If you have to write docs in an XML format like DocBook or DITA, it's the only editor I would recommend. It has a WYSIWYG interface and keyboard shortcuts for everything, and you rarely have to deal with tags.

When you use the processor to apply an XSLT stylesheet to a file, it writes a new output document and leaves the original file unchanged. Although it's a special-purpose language, XSLT is Turing-complete, so in principle it can perform any computation you might need.
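As a minimal sketch of that behavior, the stylesheet below is nothing but an identity transform: it copies every node and attribute from the input to the output, node for node. The file names identity.xsl, input.xml and output.xml are placeholders I've picked for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- identity.xsl (hypothetical file name): copies the input document node for node -->
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <!-- Match every attribute and node, copy it, then recurse into its children -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:transform>
```

Assuming the saxon command is set up on your system, running saxon -s:input.xml -xsl:identity.xsl -o:output.xml should write the copy to output.xml and leave input.xml untouched. Any template you add with a more specific match overrides the copy for just those nodes.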

Here's an example I created to convert <b> and <span class="BodyWord"> tags to <span class="Emphasis"> tags in MadCap Flare. To use it, save it as emphasis.xsl in the same folder as your files. The converted files are written to an output subfolder. From the command line, enter:

saxon -it:main -xsl:emphasis.xsl

<?xml version="1.0" encoding="UTF-8" ?>

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- Replace each <b> with <span class="Emphasis">, keeping its content -->
  <xsl:template match="b">
    <xsl:element name="span">
      <xsl:attribute name="class">Emphasis</xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>

  <!-- Rewrite class="BodyWord" as class="Emphasis" -->
  <xsl:template match="@class[.='BodyWord']">
    <xsl:attribute name="class">Emphasis</xsl:attribute>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- Entry point: process every .htm file under the current folder -->
  <xsl:template name="main">
    <xsl:for-each select="collection('.?select=*.htm;recurse=yes')">
      <!-- Write each result to output/ under its original file name -->
      <xsl:result-document href="output/{tokenize(document-uri(.), '/')[last()]}">
        <xsl:apply-templates select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:transform>
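
To make the effect concrete, here's a hypothetical fragment of a topic and what the transform is meant to produce from it. The paragraph text is made up, and whitespace in the real output may differ.

```xml
<!-- Input (made-up sample content) -->
<p>Press <b>Enter</b>, then select <span class="BodyWord">Save</span>.</p>

<!-- Intended result after the transform -->
<p>Press <span class="Emphasis">Enter</span>, then select <span class="Emphasis">Save</span>.</p>
```

Everything that isn't a <b> element or a class="BodyWord" attribute falls through to the identity template, which is why the rest of the paragraph is unchanged.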

And here's a slightly more complex example that converts the contents of top-level headings (<h1>) to sentence case in MadCap Flare. To use it, save it as sentence.xsl in the same folder as your files. From the command line, enter:

saxon -it:main -xsl:sentence.xsl

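<?xml version="1.0" encoding="UTF-8" ?>

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>

  <!-- Sentence-case each <h1>; inline markup inside the heading becomes plain text -->
  <xsl:template match="h1">
    <xsl:element name="h1">
      <!-- Keep the heading's attributes (class, id and so on) -->
      <xsl:copy-of select="@*"/>
      <!-- First character in upper case, the rest in lower case -->
      <xsl:value-of select="substring(upper-case(.),1,1)"/>
      <xsl:value-of select="substring(lower-case(.),2)"/>
    </xsl:element>
  </xsl:template>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- Entry point: process every .htm file under the current folder -->
  <xsl:template name="main">
    <xsl:for-each select="collection('.?select=*.htm;recurse=yes')">
      <!-- Write each result to output/ under its original file name -->
      <xsl:result-document href="output/{tokenize(document-uri(.), '/')[last()]}">
        <xsl:apply-templates select="."/>
      </xsl:result-document>
    </xsl:for-each>
  </xsl:template>

</xsl:transform>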
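
If your headings start with whitespace, or you want to cover more heading levels, a variation along these lines might help. It's a sketch rather than part of the stylesheet above: the choice of h1 to h3 and the variable name text are my own assumptions. It's a fragment that goes inside the xsl:transform element in place of the h1 template.

```xml
<!-- Sketch: sentence-case h1, h2 and h3 (levels chosen for illustration) -->
<xsl:template match="h1|h2|h3">
  <xsl:copy>
    <!-- Keep the heading's attributes -->
    <xsl:copy-of select="@*"/>
    <!-- Trim and collapse whitespace so the first character isn't a space or line break -->
    <xsl:variable name="text" select="normalize-space(.)"/>
    <!-- Upper-case the first character, lower-case the rest -->
    <xsl:value-of select="concat(upper-case(substring($text, 1, 1)), lower-case(substring($text, 2)))"/>
  </xsl:copy>
</xsl:template>
```

Like the h1 version, this flattens any inline markup inside the heading to plain text, and lower-casing everything after the first character also affects proper nouns and acronyms, so check the results before you commit them.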

You can learn more about XSLT at w3schools.com.
