CC292 Labs: XML

Introduction

This lab provides a brief walkthrough of authoring XML and DTD documents, and of manipulating these documents using Java with JDOM and using E4X (JavaScript with XML extensions).  We'll also see how JSP can be used to produce XML instead of HTML.  The lab assumes that you are already familiar with the CC292 lecture notes on XML.

Editing XML and DTDs

For this part use a validating editor such as Intellij Idea.  It checks that the XML is well formed, and will also validate it so long as you provide access to a DTD.  Note that XML Schemas can also be used for validation; they can enforce more powerful constraints than DTDs, but they are also much more complex and verbose, and are not covered on this course.

Within an existing Intellij project, or a new one, create an xml directory, and within that (from the Intellij menu) create two new files, AddressBook.xml, and AddressBook.dtd.

For each file, type (or copy and paste) the following text:

AddressBook.xml

<!DOCTYPE AddressBook SYSTEM "AddressBook.dtd">
<AddressBook>
<Title>Simon's address book</Title>
<Person name="Simon"
email="sml@essex.ac.uk"/>
<Person name="Anna"/>
</AddressBook>

and:

AddressBook.dtd
<!-- DTD for simple address book -->
<!ELEMENT AddressBook (Title, Person*)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Person EMPTY>
<!ATTLIST Person name CDATA #REQUIRED>
<!ATTLIST Person email CDATA #IMPLIED>

Now experiment with varying details of the XML document, and of the DTD.  Note which changes are acceptable, and which are not.

In particular, try the following (after each change, restore the documents to their original state before proceeding):

  1. Omit the name from one of the Person elements in the XML document.
  2. Repeat the Title element in the XML document.
  3. Change email to be #REQUIRED in the DTD.
  4. Remove all the Person elements from the XML.  With this in place, change the repetition operator in the DTD from '*' to '+'.
  5. Make the XML document ill-formed in some way.

Observe how Intellij attempts to validate the document after each change, and offers sensible warning messages when the XML becomes invalid with respect to the DTD.
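
The same checks can also be run outside the editor.  The following is a minimal sketch, not part of the lab materials, that uses the JDK's built-in JAXP parser (rather than Intellij or JDOM) with DTD validation switched on.  To keep it self-contained it embeds the DTD above as an internal subset instead of referring to AddressBook.dtd; the class name ValidateAddressBook and the validate helper are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class ValidateAddressBook {
  // The lab's DTD, embedded as an internal subset so no files are needed
  // (an assumption of this sketch; the lab uses a separate AddressBook.dtd).
  static final String DTD =
      "<!DOCTYPE AddressBook [\n"
      + "<!ELEMENT AddressBook (Title, Person*)>\n"
      + "<!ELEMENT Title (#PCDATA)>\n"
      + "<!ELEMENT Person EMPTY>\n"
      + "<!ATTLIST Person name CDATA #REQUIRED>\n"
      + "<!ATTLIST Person email CDATA #IMPLIED>\n"
      + "]>\n";

  // Parses the given document body against the DTD and returns the
  // validity error messages.  Well-formedness errors are rethrown.
  public static List<String> validate(String body) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true); // switch on DTD validation
    DocumentBuilder builder = factory.newDocumentBuilder();
    final List<String> errors = new ArrayList<String>();
    builder.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { }
      public void error(SAXParseException e) { errors.add(e.getMessage()); }
      public void fatalError(SAXParseException e) throws SAXParseException {
        throw e;
      }
    });
    builder.parse(new InputSource(new StringReader(DTD + body)));
    return errors;
  }

  public static void main(String[] args) throws Exception {
    String valid = "<AddressBook><Title>Simon's address book</Title>"
        + "<Person name=\"Simon\" email=\"sml@essex.ac.uk\"/>"
        + "<Person name=\"Anna\"/></AddressBook>";
    // Change 1 from the list above: a Person without a name
    String missingName = "<AddressBook><Title>Simon's address book</Title>"
        + "<Person email=\"sml@essex.ac.uk\"/></AddressBook>";
    System.out.println("valid document: "
        + validate(valid).size() + " error(s)");
    for (String msg : validate(missingName)) {
      System.out.println("missing name: " + msg);
    }
  }
}
```

Each of the other changes in the list can be tried by editing the strings in main (or the DTD constant) and re-running.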

You may also wish to attempt the final exercise from the lecture notes.  Note that in my solution I used attributes of type ID and IDREF to enforce a limited type of referential integrity.

Completed solutions for this are available: products.xml ; Cat.dtd.  With these, experiment with duplicating or missing out an ID.  Are IDs enforced to be unique throughout the document, or just unique within the set of attribute values for a particular element?

JDOM

In this section we'll work through re-creating the address book example from above, writing it out to a file, reading it in, and performing some simple processing on it.  First of all we create the XML structure and write it to standard output.  To save space the import declarations have been omitted, but Intellij will prompt you for these (just hit <alt><enter> when requested).

To run these examples (and for Intellij to prompt for the correct import statements) you'll need to copy the jdom jar to a suitable place, and add it to your Java classpath (within Intellij, simply add it as a module dependency).

WriteXML.java
public class WriteXML {
  public static void main(String[] args) throws Exception {
    Element root = new Element("AddressBook");
    Element title = new Element("Title");
    title.setText("Simon's address book");
    Element e1 = new Element("Person");
    Element e2 = new Element("Person");
    e1.setAttribute("name", "Simon");
    e1.setAttribute("email", "sml@essex.ac.uk");
    e2.setAttribute("name", "Anna");
    root.addContent(title);
    root.addContent(e1);
    root.addContent(e2);
    XMLOutputter out =
        new XMLOutputter(Format.getPrettyFormat());
    out.output(root, System.out);
  }
}

Mini Exercises:

  1. Experiment with using XML special characters within the setText method, and observe the output.  E.g. title.setText("Simon's <i> address</i> book").  Why does the text content look different from the string that was set?
  2. Make a small modification to this code to write it to a file instead of the standard output.

Now experiment with reading it back in using this program:

ReadXML.java
public class ReadXML {
  public static void main(String[] args) throws Exception {
    String infile = args[0];
    SAXBuilder builder = new SAXBuilder();
    InputStream is = new FileInputStream(infile);
    Document doc = builder.build(is);
    Element root = doc.getRootElement();
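    // now print the names and emails in plain text
    for (Element el :
      (List<Element>) root.getChildren("Person")) {
      // getAttributeValue returns null if the attribute is absent
      System.out.println(el.getAttributeValue("name")
        + " " + el.getAttributeValue("email"));
    }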
  }
}

This illustrates how to read in an XML document, and then do some simple navigation (getChildren) and iteration.

E4X

We now revisit the temperature conversion table, and see how it can be done using E4X.  The code is below: copy and paste it into an HTML file.  Note the use of the curly brace notation to evaluate expressions within the XML tags.

Unfortunately, Intellij does not yet understand E4X code, and you'll see a lot of red in the editor window.

E4XTable.html
<html>
<head>
<title>E4X Test</title>

<script type="text/javascript">
  function myFunc() {
    var xx = <MyRoot name="simon">Text Content</MyRoot>;
    var el = document.getElementById('test');
    var table = <table><tr>
      <th>Celsius</th>
      <th>Fahrenheit</th></tr></table>;
    var from = 10;
    var to = 15;
    for (var i=from; i<=to; i++) {
      table.tr += <tr> <td> {i} </td>
        <td> { i * 9.0/5 + 32 } </td> </tr> ;
    }
    el.innerHTML = table;
  }
</script>
</head>

<body style="font-size: 2em" onload="myFunc()">

<h3>E4X Table</h3>

<p></p>
<div id="test"></div>
</body>
</html>

 

Work through the E4X examples in the XML lecture notes.  You'll need a container program to run them within, which must include a print method.  A suitable one is given here: RunE4X.html.

The main part of this is shown below.  The print function simply concatenates the output to the text content of the output node.

This can be seen in operation with one of the examples from the lecture notes:

Using this setup you can easily experiment with various E4X examples.

JSP and XML

So far we've used JSP as a relatively painless way of producing dynamically generated HTML.  It can also be used for XML generation, and works just as well for this purpose.

To do this we'll once again work through the dynamic creation of a temperature conversion table.  This time, however, we're creating XML markup instead of HTML tags.

For the XML document structure, we'll use a root element called <conversions>; each temperature element within it will carry an attribute for the Celsius value and an attribute for the Fahrenheit value (we'll abbreviate these to 'c' and 'f').

XML documents may optionally begin with an XML declaration giving the version, and this helps browsers to handle the information correctly.  Note, however, that if it is present it must be the very first text within the document; hence I've included this declaration as the very first line of the JSP page.

Enter this and verify that it works correctly.  Firefox and Internet Explorer both offer convenient collapsible display of XML documents - try it.

Which in Firefox produces an output like this:

While the range of temperature values was hard-wired into the code in this example, it could of course have been obtained from a JavaBean.  When generating XML using JSP you can use all the same tricks as when generating HTML (e.g. use Java helper classes, write directly to the output stream, manually process HTTP request parameters).
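
As a rough illustration of the Java helper class approach, here is a sketch (not the lab's JSP page) of a plain Java class that builds the conversion document described earlier as a String.  The class name ConversionXml, the method name conversions, and the element name temperature are assumptions made for this example; the lab only fixes the root element <conversions> and the attributes 'c' and 'f'.

```java
public class ConversionXml {
  // Builds the XML text for Celsius values from..to (inclusive).
  // The element name "temperature" is an assumption of this sketch.
  public static String conversions(int from, int to) {
    StringBuilder sb = new StringBuilder();
    // The XML declaration must be the very first text in the document
    sb.append("<?xml version=\"1.0\"?>\n");
    sb.append("<conversions>\n");
    for (int c = from; c <= to; c++) {
      double f = c * 9.0 / 5 + 32; // same formula as the E4X table
      sb.append("  <temperature c=\"").append(c)
        .append("\" f=\"").append(f).append("\"/>\n");
    }
    sb.append("</conversions>\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    // Same range as the E4X example: 10 to 15 degrees Celsius
    System.out.print(conversions(10, 15));
  }
}
```

A JSP page could then write the returned string to its output, taking the range from request parameters or a JavaBean rather than hard-wiring it.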

Summary

XML is a vitally important technology.  It is the glue that binds many different systems together, and is increasingly used as a file format for all kinds of applications.

In this lab we covered some alternative methods for generating and processing XML.  JDOM provides a convenient way to process XML directly from Java programs, but is a bit verbose: each element and attribute has to be explicitly created (though of course it is possible to create XML in a String, and then parse it using JDOM).  On the other hand, E4X and JSP offer ways to mix XML tags directly with program code.  JSP is limited to running on the server side, while E4X can be run on the client (assuming a suitably enabled web browser such as Firefox), or on the server using Rhino.

 

 
