
12.05.05

Review of “No Nonsense XML Web Development with PHP” by Tom Myer

Posted in Books, PHP at 7:36 am by leingang

XML is, depending on who you ask, “One file format to rule them all”
or “most hyped technology” ever. One way to describe it is a
self-documenting file format with a simple syntax inspired by HTML.
But what is it good for?

What isn’t it good for? Anything can be stored in an XML format,
whether it’s web site content, data from an experiment, server logs,
whatever. Word processor and spreadsheet files saved by OpenOffice
are zip archives of XML documents. XML can even be used to prescribe how one
XML document should be transformed into another, giving a nice
cross-platform method for (say) converting an OpenOffice file to HTML.
This is XSLT. With a few shell scripts and utilities, an entire web
site can be managed, publishing content in different formats, without
a full-featured scripting language.

But the base XSLT can’t do everything (for instance, insert the
current date), and so it’s nice to have the support of a scripting
language like PHP. PHP has its fans and haters, but its popularity
cannot be denied. It has a simple syntax and data model (in
particular, hashes and lists can be treated pretty equally), it’s easy
to learn, and it has tons of libraries built in, or able to be added.

Combining the strength of XML with the simplicity of PHP, here comes
the aptly-titled No Nonsense XML Web Development With PHP. The book is
by Thomas Myer and is published by the good folks at SitePoint (the
book is generally not available in stores, but can be purchased
through their website). The author aims to
introduce the power and versatility of XML to the beginning web
professional. It’s a fun read, and although the average übergeek
may not learn much from it, it’s certainly worth recommending to a
newbie.

The book is for those who are comfortable with HTML and have had some
experience with PHP and JavaScript, but who missed the XML boat
altogether and are wondering what it’s all about. The author teaches
a course entitled “XML for Mere Mortals,” and the book and course seem
to have the same audience. So it’s not an advanced programming book, but
it does deliver on the promises it makes.

Summary of Book Contents

Chapter 1 introduces XML, what it looks like, and how to examine it.
(IE and Firefox will both check an XML file for well-formedness and
display it.) If you don’t know what well-formed XML is, the chapter
explains that too.

The chapter also sets up the single major example that is visited
throughout the book: an XML-based content management system (CMS).
You may have thought that all CMSs had to have a relational database
(or maybe only MySQL!) as a backend, but XML can provide this storage
mechanism easily, too. One of the strongest features of the book is
this extended example: each new XML lesson helps build another
part of the CMS. It’s nice to have it all tie together like that.

Chapter 2 talks about the family of XML technologies–XSLT, XPath,
DTD/XML Schema, and Namespaces. Perhaps the biggest XML application
out there is XHTML, which is simply HTML written with XML strictness.
If you are converting XML to XHTML with a single stylesheet, you don’t
need anything but IE or Firefox to do it. For more complicated
transformations (e.g., XML to XML), or to support all browsers, a
server-side solution is more often needed.

Chapter 3 explains Document Type Definitions (DTDs), which are a
format to describe how your XML files should be structured.
Organizations exchanging files have a way of declaring that the data
they send conforms to a certain agreed-upon standard. This may not
seem necessary if your application uses an in-house structure, but it
does prevent bugs introduced by invalid XML.
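
To make the idea concrete, here is a minimal sketch of DTD validation from PHP using the DOM extension. The DTD, the function name isValidPerson, and the sample documents are my own assumptions for illustration, not code from the book.

```php
<?php
// Hypothetical helper: check a <person> document against a small DTD.
function isValidPerson(string $personXml): bool
{
    // Assumed DTD, written as an internal subset so the sketch needs no files.
    $dtd = <<<'DTD'
<!DOCTYPE person [
  <!ELEMENT person (name, age)>
  <!ATTLIST person id CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT age (#PCDATA)>
]>
DTD;

    // Collect parser and validation errors quietly instead of printing warnings.
    libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    if (!$doc->loadXML($dtd . "\n" . $personXml)) {
        libxml_clear_errors();
        return false; // not even well-formed
    }

    // validate() checks the document against the DTD it declares.
    $valid = $doc->validate();
    libxml_clear_errors();
    return $valid;
}

var_dump(isValidPerson('<person id="47"><name>Tom</name><age>33</age></person>'));
var_dump(isValidPerson('<person><age>33</age></person>'));
```

The second document is well-formed but leaves out the required id attribute and the name element, which is exactly the kind of mistake a DTD is meant to catch before it turns into a bug.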

XML whizzes know of the other way of prescribing XML document formats,
called XML Schema. Advantages of XML Schema include being able to
have elements validate differently in different contexts, and the fact
that XML Schema files are themselves valid XML documents. A major
disadvantage of XML Schema, the cause of its steep learning curve, is
its complexity, which pushes the envelope of human-readability. The
author pays lip service to XML Schema and uses DTDs throughout the
book, a choice I would have to agree with for the intended audience.

Chapter 4 does a little bit more with XSLT to display XML files in web
browsers, and also starts diving into the PHP SimpleXML API (not to be
confused with SAX–the Simple API for XML!). This is brand new to
PHP5 and allows XML documents to be treated in a number of different
ways. An XML file loaded into memory becomes a PHP object, and
child elements become child objects. Thus an XML document that looks
like

<person id="47">
  <name>Tom</name>
  <age>33</age>
</person>

can be accessed with SimpleXML like this:

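<?php
  // $doc stands for the root <person> element of foo.xml.
  $doc = simplexml_load_file("foo.xml");
  // Child elements are read as object properties.
  print $doc->name . " is " . $doc->age . " years old.";
?>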

Attribute nodes connected to an element node can be selected using
PHP’s hash syntax, so Tom’s ID can be accessed by $doc['id']. If a
node has multiple child nodes of the same element type, they can be
distinguished using PHP’s list syntax, which counts from zero. If the
XML document above were actually a fragment contained in a larger
document with root element people, and Tom were the 47th person, his
age could be found using $doc->person[46]->age.
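
Here is a small sketch of the hash and list syntax in action. The two-person document is made up for illustration, and it is loaded from a string rather than a file so the example stands alone. Note that SimpleXML counts repeated children from zero.

```php
<?php
// Hypothetical sample data: a <people> root holding several <person> children.
$xml = <<<XML
<people>
  <person id="12"><name>Ann</name><age>41</age></person>
  <person id="47"><name>Tom</name><age>33</age></person>
</people>
XML;

// The returned object stands for the root element, <people>.
$people = simplexml_load_string($xml);

// Repeated children are indexed like a list, starting at zero.
$tom = $people->person[1];

// Attributes are read with array (hash) syntax on the element.
$tomId = (string) $tom['id'];

print $tom->name . " (id " . $tomId . ") is " . $tom->age . " years old.\n";
```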

Again, aficionados know that there’s another well-thought-out solution
to selecting parts of XML documents, called XPath. XPath
support is built into SimpleXML, too, so the previous example can
also be done with $doc->xpath('/people/person[47]/age'), which returns
an array of matching elements (XPath positions count from one).
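
A short sketch of the XPath route, again with made-up sample data. The xpath() method hands back an array of matching elements, so the first match has to be picked out of it.

```php
<?php
// Hypothetical sample data, same shape as the <person> example.
$xml = <<<XML
<people>
  <person id="12"><name>Ann</name><age>41</age></person>
  <person id="47"><name>Tom</name><age>33</age></person>
</people>
XML;

$people = simplexml_load_string($xml);

// Select by position: XPath positions count from one, so [2] is the second person.
$byPosition = $people->xpath('/people/person[2]/age');
$age = (string) $byPosition[0];

// Select by attribute value instead of position.
$byId = $people->xpath('/people/person[@id="47"]/name');
$name = (string) $byId[0];

print $name . " is " . $age . " years old.\n";
```

Selecting by attribute is usually the sturdier choice, since it keeps working when elements are reordered.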

These two methods make choosing parts of an XML document quite easy.
A small issue comes in the datatype returned. $doc->person->age
returns a PHP object, which is transparently cast to a string (the
character data contained therein) when evaluated in a string context
such as print. But when testing that character data, it may
sometimes be necessary to explicitly cast a SimpleXML object to a
string.
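
A minimal sketch of where the cast matters, assuming the same person document as above. A strict comparison against a string fails because the left-hand side is still an object.

```php
<?php
$doc = simplexml_load_string(
    '<person id="47"><name>Tom</name><age>33</age></person>'
);

$name = $doc->name;

// The property is an object, not a string.
$isObject = $name instanceof SimpleXMLElement;

// Strict comparison checks the type too, so object versus string never matches.
$strictMatch = ($name === 'Tom');

// Casting first compares the character data itself.
$castMatch = ((string) $name === 'Tom');

// Numeric data can be cast the same way.
$age = (int) $doc->age;

var_dump($isObject, $strictMatch, $castMatch, $age);
```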

Chapters 5, 6, and 7 are about various methods for manipulating XML
documents. Chapter 5 shows more of the programmatic aspects of XSLT such
as looping, sorting, counting, and branching.

Chapter 6 is about using JavaScript to manipulate XML documents. With
JavaScript, pages can be built or changed on the client side, and
therefore very quickly. So GUI niceties like dynamic menus are possible.

In many programming languages, XML documents are accessible by an
interface standard called the Document Object Model (DOM). The
DOM is available to JavaScript in most browsers, but not implemented to
the same level in all of them. This means that there is a lot of work needed to maintain
cross-browser compatibility in a page that uses JavaScript XML
methods. A lot of the work for this has been done by the developers
of the Sarissa JavaScript library, which tries to unify the interface
for different browsers’ methods. The tradeoff, of course, is that you
can only use the methods common to all browsers.

Chapter 7 demonstrates the various ways one can use PHP to process XML
documents. The SAX interface is outlined in detail here. With SAX,
an XML document is a stream of “events”–starting the A element,
starting the B element, the character data “foo”, ending the B
element, ending the A element, … Handler functions are registered
for each event, indicating what’s supposed to happen at each position
in the stream. This is useful when (for instance) taking a bit of XML
that’s list-ish and presenting it as an HTML list.
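
Here is a rough sketch of that list-ish case using PHP’s event-based XML parser functions. The function name itemsToHtmlList and the items/item element names are assumptions of mine, not the book’s code.

```php
<?php
// Hypothetical helper: turn <items><item>..</item></items> into an HTML list.
function itemsToHtmlList(string $xml): string
{
    $html = '';
    $inItem = false;

    $parser = xml_parser_create();
    // Report element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // One handler for "element starts", one for "element ends".
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$html, &$inItem) {
            if ($name === 'items') {
                $html .= "<ul>\n";
            } elseif ($name === 'item') {
                $html .= '  <li>';
                $inItem = true;
            }
        },
        function ($parser, string $name) use (&$html, &$inItem) {
            if ($name === 'item') {
                $html .= "</li>\n";
                $inItem = false;
            } elseif ($name === 'items') {
                $html .= "</ul>\n";
            }
        }
    );

    // Character data may arrive in several chunks, so append each one.
    xml_set_character_data_handler(
        $parser,
        function ($parser, string $data) use (&$html, &$inItem) {
            if ($inItem) {
                $html .= htmlspecialchars($data);
            }
        }
    );

    xml_parse($parser, $xml, true);
    return $html;
}

print itemsToHtmlList('<items><item>Tea</item><item>Coffee</item></items>');
```

The $inItem flag is there so that whitespace between elements is ignored; only text inside an item ends up in the output.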

For more complicated processing, a “big picture” model of an XML
document is needed. SimpleXML is very useful here, but it’s not
possible to create new elements in an XML document with this
interface. So transforming XML to XML means looping and echo-ing a
lot of angle brackets. For a cleaner model of XML that allows
a document to be built from elements, you need the sledgehammer XML
interface, the DOM. PHP implements that, too, and the author spends
some time on it.
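
For comparison, a minimal sketch of building a document with PHP’s DOM extension instead of echoing angle brackets. The function name buildPeople and the input array are invented for the example.

```php
<?php
// Hypothetical helper: build a <people> document from plain PHP arrays.
function buildPeople(array $rows): DOMDocument
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true; // indent the serialized output

    $people = $doc->createElement('people');
    $doc->appendChild($people);

    foreach ($rows as $row) {
        $person = $doc->createElement('person');
        $person->setAttribute('id', (string) $row['id']);

        // Text nodes take care of escaping characters like & and <.
        $name = $doc->createElement('name');
        $name->appendChild($doc->createTextNode($row['name']));
        $person->appendChild($name);

        $age = $doc->createElement('age');
        $age->appendChild($doc->createTextNode((string) $row['age']));
        $person->appendChild($age);

        $people->appendChild($person);
    }

    return $doc;
}

$doc = buildPeople([['id' => 47, 'name' => 'Tom', 'age' => 33]]);
print $doc->saveXML();
```

It is wordier than SimpleXML, but the result is guaranteed to be well-formed, which is hard to promise when printing tags by hand.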

Chapter 8 covers another hot XML application: RSS. To say it without
getting into the name and specification wars, RSS allows web sites to
present machine-readable summaries of their content, especially the
recently added content. Applications can collect (aggregate), read,
and process RSS feeds to provide users access to a large number of
articles quickly. This technology allowed the blogosphere to become
so well interconnected and popular. Since RSS is an XML application
(some versions of RSS are RDF applications, too), content stored as
XML can be summarized rather easily into RSS. The author demonstrates
this for his big CMS example.
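
As a rough illustration of the idea (not the author’s CMS code), here is a sketch that summarizes a made-up articles document as an RSS 2.0 feed. The articles/article/headline/url/summary element names and the function name articlesToRss are all assumptions.

```php
<?php
// Hypothetical helper: summarize an <articles> document as an RSS 2.0 feed.
function articlesToRss(string $articlesXml, string $siteTitle, string $siteUrl): string
{
    $articles = simplexml_load_string($articlesXml);

    $rss = new DOMDocument('1.0', 'UTF-8');
    $rss->formatOutput = true;

    $root = $rss->createElement('rss');
    $root->setAttribute('version', '2.0');
    $rss->appendChild($root);

    $channel = $rss->createElement('channel');
    $root->appendChild($channel);

    // Small closure to append <name>text</name> with proper escaping.
    $addText = function (DOMElement $parent, string $name, string $text) use ($rss) {
        $element = $rss->createElement($name);
        $element->appendChild($rss->createTextNode($text));
        $parent->appendChild($element);
    };

    // RSS 2.0 channels carry a title, a link, and a description.
    $addText($channel, 'title', $siteTitle);
    $addText($channel, 'link', $siteUrl);
    $addText($channel, 'description', 'Recently added articles');

    foreach ($articles->article as $article) {
        $item = $rss->createElement('item');
        $channel->appendChild($item);
        $addText($item, 'title', (string) $article->headline);
        $addText($item, 'link', (string) $article->url);
        $addText($item, 'description', (string) $article->summary);
    }

    return $rss->saveXML();
}

$sample = '<articles><article id="1"><headline>Hello XML</headline>'
    . '<url>http://example.com/1</url><summary>A first post.</summary>'
    . '</article></articles>';

print articlesToRss($sample, 'Example Site', 'http://example.com/');
```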

Chapter 9 is about Web services, also a well-hyped web technology.
Web services take the functionality of web sites and remove it from
the human interface. An application can consume a web service to
retrieve information or perform a function on another computer. Not
surprisingly, the major web service protocols (XML-RPC and SOAP) are
XML applications. The author shows how to make a very simple XML-RPC
server.

Chapter 10 is a grab bag of methods for connecting XML to relational
databases, which in the scope of this book means MySQL. The author
demonstrates the use of mysqldump to back up relational data to
XML (and the similar feature of the popular phpMyAdmin web
application), and how to populate a database with XML data.

There are two appendices, one of which is very useful: the rest of the
CMS example code explained.

Commentary

The book has three major strengths: its simplicity, its brevity, and
the informal tone of its writing. The author stated his audience and
his goals for the book, and he has achieved the goal of getting the
slightly-experienced web developer up to a point of understanding how
XML can be used to manage data in a few short lessons. The fact that
there’s one example throughout and each topic in the book helps build
it keeps the book clean and simple.

However, each of these could be seen as a slight weakness. There’s no
sophisticated object-oriented programming here. Nor does the author
talk about multitiered architecture. Somebody completing this book
may still not know the whole story about writing a good CMS application
with XML.

The book is almost too short. The author admits he hit a wall around
Chapter 8, and it’s evident. The last few chapters don’t connect as
well to his extended example (Why would I want to back up a
filesystem-based CMS to a relational database? Can’t I just back up
the directory?). Documentation for the complete set of PHP XML
functions forms another appendix, but this is information that can
more easily be accessed online and doesn’t need to be in books
anymore. I would have liked one or two more knock-your-socks-off
examples of the power of XML. Apache Cocoon, for instance, uses XML not
only to store content but also to describe its pipeline from data to web
page (or WML deck, or SVG document…).

Finally, the author’s style is sometimes so colloquial as to be
distracting. Some of his paragraphs read as scripts for his class,
which is fine when preparing a lesson but doesn’t fly so well in
prose. Perhaps this is nitpicking, but there’s no reason why
technical books shouldn’t be well written and edited.

All in all, though, I still feel the book would be a good starter for
the complete XML beginner. That may not be you, but it may describe a
friend who comes to you for answers or advice. Keep this title in
mind when they do.


By day, Matthew Leingang is a preceptor in mathematics at Harvard University. He has been known to procrastinate by reviewing and consulting on technical (and math) books.
