Rick van Rein, PhD student:
Hmmm, the TAG generation is not quite right yet... Seems I have misunderstood the copy-of XSLT direction...

XML website generation

XML is a language for content description that looks like HTML, except that you are allowed to define your own tags to represent the content you deem useful. In a year or so, most if not all browsers will probably be able to read XML documents and process them with style sheets in the XSLT language, but since I don't know of a decent browser that supports this scheme yet, I currently generate HTML from my XML documents.

The HTML for this website is generated automatically from XML documents. Since HTML and XML look so much alike, that's a bit fuzzy, but it comes down to mixing layout tags such as <listing> and <webpage> into HTML. A stylesheet recognises such tags and maps them to standard HTML. For example, the tabs and top picture on this page (and all others) are generated as a result of the <webpage> tag. If I want to change the tabs to vertical tabs at the right, and perhaps add the background lines of a college reader, I only have to change the stylesheet to generate another layout. The XML content stays the same. All the HTML files can then be generated anew, using the new stylesheet. This makes it simple to keep the website consistent in style.
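
As an illustration, the source for one page might look roughly like the sketch below. Only the <webpage> and <listing> elements come from the text above; the attribute names (title, author, version, file, colour) and all values are assumptions made for this example, not the exact vocabulary used on this site.

```xml
<?xml version="1.0"?>
<!-- Hypothetical page source: layout tags mixed into ordinary HTML.
     All attribute names and values are assumptions for illustration. -->
<webpage title="XML website generation" author="Rick van Rein"
         version="0.1" file="technology.html" colour="#ffffcc">
  <h2>XML website generation</h2>
  <p>Ordinary HTML elements are written as usual.</p>
  <!-- A layout tag: the stylesheet decides how a listing is rendered. -->
  <listing>
make all
  </listing>
</webpage>
```

Nothing in such a file says how tabs, pictures or listings look; that decision lives entirely in the stylesheet.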

What is a stylesheet processor and what does it do?

First, what is a stylesheet processor? This is a program used to transform an XML document according to an XSLT stylesheet. This is the sort of thing that will be built into our browsers in the long run. Since this is not true for most browsers today, we have to run a batch-processing XSLT processor (which may be placed in cgi-bin if you desire online processing and don't mind your server thrashing under heavy load).

The stylesheet processor runs through an XML source document according to the directions in an XSLT style sheet. The XML document is presented to the style sheet as a tree structure of objects, and the XSLT runs through it in any way it likes. XSLT can describe depth-first or breadth-first traversals, it can jump to a place-of-definition, and it doesn't care if it visits the same XML document node more than once, possibly with different intentions. While jumping back and forth through your document, the XSLT processor generates output, which may be anything.
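
Visiting the same nodes twice with different intentions can be expressed with XSLT modes. The following is a minimal sketch for an XSLT 1.0 processor, assuming a root element <webpage> that contains <section> elements with a title attribute; the <section> element and its attribute are invented for this example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical root element: walk the sections twice. -->
  <xsl:template match="webpage">
    <html>
      <body>
        <!-- First visit: collect only the titles into a contents list. -->
        <ul>
          <xsl:apply-templates select="section" mode="contents"/>
        </ul>
        <!-- Second visit: the same nodes, now rendered in full. -->
        <xsl:apply-templates select="section"/>
      </body>
    </html>
  </xsl:template>

  <!-- Intention 1: one list item per section. -->
  <xsl:template match="section" mode="contents">
    <li><xsl:value-of select="@title"/></li>
  </xsl:template>

  <!-- Intention 2: a heading, then the section's children are processed.
       Without further rules, the built-in templates output their text only. -->
  <xsl:template match="section">
    <h2><xsl:value-of select="@title"/></h2>
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>
```

The mode attribute is what lets two different rules match the same <section> node, one per pass.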

I have a lot of experience with XSLT, where it has proven quite fruitful. I have generated Perl scripts to fit into a cgi-bin interface, addressing an SQL database. I have a design project where four students generate user interfaces from conceptual descriptions. I have an MSc assignment to make it possible to do algebraic reasoning about stylesheets. I have generated LaTeX from HTML to print web pages in a much nicer way than Netscape does [unfinished]. And now, I am turning to the relatively simple task of generating HTML from layout-directing tags.

I don't define experimental languages with ad hoc syntax and yacc parsers anymore, but I describe the syntax in an XML schema. I don't build compilers anymore, but I describe their task in XSLT style sheets. Altogether, this saves me a lot of work and makes my work a lot easier to maintain!

How does a stylesheet look?

While I am typing this, I realise that I keep typing sequences like <CODE>&lt;listing&gt;</CODE>, which is a lot of work; so as I went, I changed that to <tag>listing</tag> plus a little extension to my stylesheet. This is normal for people accustomed to LaTeX, but how about the Word users out there? To generate the code, I need to enter the following in my stylesheet:

	<xsl:template match="tag">
	  <CODE>&lt;<xsl:apply-templates/>&gt;</CODE>
	</xsl:template>
Which states the following:
  • The part xsl:template refers to a tag (in XML terms, an element) named template, taken from the xsl namespace, which is the collection of elements (tags) for stylesheet specification. You could say that XSLT is defined by this namespace.
  • The part match="tag" indicates a rule that will become active for <tag> elements.
  • The part between <xsl:template match="tag"> and </xsl:template> is the output constructed in place of the <tag> element.
  • Since the source document may contain text as well as other elements enclosed between <tag> and </tag>, it is possible to generate those contents somewhere inside this pattern; this is done with <xsl:apply-templates/>.
  • The CODE tags and the &lt; and &gt; (escaped less-than and greater-than symbols) are the HTML generated from this stylesheet.
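
For reference, the rule above could sit in a complete stylesheet roughly as follows. This is a sketch assuming an XSLT 1.0 processor; the second rule, which copies everything else through unchanged, is an addition for the example and not necessarily how the stylesheet for this site is written.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- The rule from the text: <tag>listing</tag> becomes a CODE element
       whose text is the word listing between angle brackets. -->
  <xsl:template match="tag">
    <CODE>&lt;<xsl:apply-templates/>&gt;</CODE>
  </xsl:template>

  <!-- Assumed fallback: copy every other element, attribute and text node.
       It has a lower default priority, so the rule for tag wins. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this in place, a source fragment such as <p>Use the <tag>listing</tag> element.</p> should come out as the same paragraph with the tag name shown in code style, angle brackets included.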

Defined styles in my stylesheet

I recommend that you think carefully about the elements you wish to support in your stylesheet. When defining those, you are actually defining a class diagram, or database schema, or whatever you like to call it. Give it proper thought. Look for the things that will need maintenance, and try to localise each of them in one logical spot.

webpage

This element defines the whole webpage, and will create all the layout around it. Since I am generating the HTML pages, it is not a problem to generate similar things in many of the HTML output files, as long as the XML input contains no duplicated information [that'd be bad for maintenance]. It currently constructs a nested set of rectangles, enforced through tables, and it places some images inside it. The title of the page is set, the author and version are placed in comments, and an image map is applied to the tabs. The correct tab is selected for different kinds of pages (filename provided as an attribute of webpage) and the background colour of the actual content is set accordingly (colour provided as an attribute of webpage). That's the only remaining evil: the tab and colour have to be defined in all the webpages.
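
A heavily simplified sketch of such a rule is shown below. It assumes attributes named title, author, version and colour on <webpage>; these names are guesses made for the example, and the tabs, images and image map are left out entirely.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Attribute names below (title, author, version, colour) are assumed. -->
  <xsl:template match="webpage">
    <html>
      <head>
        <title><xsl:value-of select="@title"/></title>
        <!-- Author and version end up in an HTML comment. -->
        <xsl:comment>
          <xsl:text> author: </xsl:text>
          <xsl:value-of select="@author"/>
          <xsl:text>, version: </xsl:text>
          <xsl:value-of select="@version"/>
          <xsl:text> </xsl:text>
        </xsl:comment>
      </head>
      <body>
        <!-- One rectangle only; the text describes a nested set of these. -->
        <table border="0" cellpadding="10">
          <tr>
            <!-- Attribute value template: background colour from the source. -->
            <td bgcolor="{@colour}">
              <!-- Hand the page content over to the other rules. -->
              <xsl:apply-templates/>
            </td>
          </tr>
        </table>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Because the whole frame is produced by this one rule, changing the layout of every page means editing only this template and regenerating the HTML.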

The actual source code...

I edit all my files by hand (with an editor, that is). Not everybody will prefer this, and I would suggest that those people go to IBM to pick up their development tools for XML and XSL, including graphical editors.

As always when I set up complicated batch processes like the one generating these pages, I have written a Makefile, which I run in my public_html/src directory and which fills my public_html directory. The Makefile invokes the lotusXSL command, which expects three arguments: a source XML document, the XSLT stylesheet and the desired output document. This command is a front-end that Roelof van Zwol built around the lotusXSL processor by Lotus.