Lecture Notes

XPATH Intro

XPATH is a syntax for describing parts of an XML object: a way to refer to a particular element or attribute, to all elements of a particular type, or to any other view of an XML object. Remember, I said we would spend most of our time doing two things:

  1. describing how to find things in an XML object (selecting)
  2. and moving through that XML object in an orderly fashion (traversing)

XPATH is about how to describe all or part of the path to a thing or set of things within an XML object. Note that I am deliberately referring to the target for XPATH operations as an XML object to remind you that the input need not be an actual document.

XPATH Discussion Points:
  1. XPATH works with the parsed version, not the input version
  2. XPATH views an XML object as a tree of nodes with one root
  3. the tree XPATH works with is basically just like the idea of the DOM
  4. differences between SAX and DOM parsers

Seven Kinds of Nodes in XPATH view of XML

  1. root node

    The root node in XPATH is the one pointing to the root of the entire XML object. In the example of our results.xml document, the root node in XPATH is the XPATH node that contains the entire document. It contains the results element (the root of the XML document itself), but it is not the results element itself. The distinction is important.

    You specify the root node in XPATH using a single slash (the / character).

    Alone among the nodes, the XPATH root node has no parent. It always has at least one child node (the document element), and it also contains any comments and processing instructions that sit outside the document element (for example, a processing instruction named xml-stylesheet or a comment before the document element of the XML object). As you can imagine, the string value of the root node, as returned by <xsl:value-of select="/" />, is thus "the concatenation of all the text nodes of the root node's descendants" (quote from Doug Tidwell's book XSLT).

    Take a look at the results.xml document as it is perceived by XPATH.
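
A rough way to see the root node from code is Python's standard xml.dom.minidom, whose Document object plays much the same role as the XPATH root node. This is only a sketch, not XPATH itself: the sample text is a made-up stand-in for results.xml (its content is an assumption), and string_value is a hypothetical helper name.

```python
from xml.dom import minidom

# Made-up stand-in for results.xml; the real file's content is an assumption.
SAMPLE = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="results.xsl"?>'
    "<!-- season results -->"
    "<results><match><date>10-Mar-2008</date><team>Reds</team></match></results>"
)

doc = minidom.parseString(SAMPLE)


def string_value(node):
    """Hypothetical helper: join all descendant text nodes in document order."""
    if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE):
        return node.data
    return "".join(string_value(child) for child in node.childNodes)


# The Document has no parent; the results element is only one of its children.
top_level = [child.nodeName for child in doc.childNodes]
print(doc.parentNode)               # None
print(top_level)                    # ['xml-stylesheet', '#comment', 'results']
print(doc.documentElement.tagName)  # results
print(string_value(doc))            # 10-Mar-2008Reds
```

Note that the comment and the processing instruction are children of the document but contribute nothing to its string value, which is built from text nodes only.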

  2. element nodes

    Every element found in an input XML object has a corresponding element node in XPATH. The children of an element node include the text, element, comment, and processing instruction nodes that appear within that element in the input XML object.

    The string value of an element node, as returned by <xsl:value-of select="/results/match" />, is the concatenation of the text of this node and all of its descendants in the order in which they appear in the input XML object.

    The XPATH name() function can be used to obtain the name of an element which has been located, including any namespace prefix that is in effect. For example, with no namespace declared, the name() of the <date> element is "date". Take a look at a tutorial or your book for more information on local-name(), namespace-uri(), and related XPATH functions.
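
The string value and the name of an element node can be approximated with Python's standard xml.etree.ElementTree. Treat this as a sketch under assumptions: the sample document is invented, ElementTree implements only a subset of XPATH, and its tag attribute is not the same thing as name() (for a namespaced element it reports "{uri}local-name" rather than a prefixed name).

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for results.xml (content is an assumption).
root = ET.fromstring(
    "<results>"
    "<match><date>10-Mar-2008</date><team>Reds</team><team>Blues</team></match>"
    "</results>"
)

# root is the results element, so "match" here plays the role of /results/match.
match = root.find("match")

# String value: text of the element and of all its descendants, in document order.
match_value = "".join(match.itertext())
print(match_value)  # 10-Mar-2008RedsBlues

# With no namespace declared, the tag is just the element name.
date_name = match.find("date").tag
print(date_name)  # date

# With a default namespace, ElementTree reports "{uri}local-name" instead.
namespaced = ET.fromstring('<date xmlns="urn:example:results">10-Mar-2008</date>')
print(namespaced.tag)  # {urn:example:results}date
```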

  3. attributes of element nodes

    Any attribute node in XPATH has a parent element node, but the attribute nodes are not technically the children of their parent element. The children of their parent element are the text, element, comment, and processing instruction nodes contained in the element in the original input XML object. Attributes get treated a bit differently than elements.

    Remember that an attribute node may exist even though the attribute does not appear in the original XML input document. For example, an attribute may be defined in a Schema as having a default value which may or may not be explicitly set in the input XML object itself. The XPATH tree will normally include an attribute node for any attribute with a default value, whether or not it is explicitly stated in the input XML object. If there is no external DTD or Schema referred to by the input XML object, then defaults will not be noticed and will not be present in the XPATH tree.
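
The separation between children and attributes shows up in most XML APIs. The following sketch uses Python's standard xml.etree.ElementTree on an invented match element (the attribute names id and venue are assumptions). No DTD or Schema is involved, so only the attributes actually written in the input are present.

```python
import xml.etree.ElementTree as ET

# Invented sample; the attribute names are assumptions for illustration.
match = ET.fromstring(
    '<match id="m1" venue="home">'
    "<date>10-Mar-2008</date><team>Reds</team>"
    "</match>"
)

# Iterating over an element yields its child elements only, never its attributes.
child_tags = [child.tag for child in match]
print(child_tags)  # ['date', 'team']

# Attributes are reached through a separate mapping on the element.
attributes = dict(match.attrib)
print(attributes)  # {'id': 'm1', 'venue': 'home'}

# An attribute with no explicit value and no known default is simply absent.
print(match.get("referee"))  # None
```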

  4. text nodes

    A text node just holds the text contained by an element. Entity and character references get resolved before processing by XPATH. The content of a CDATA section always appears as a text node.
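
The same resolution happens in any parser-built tree. As a small sketch with Python's standard xml.etree.ElementTree (the note element and its content are made up), the entity reference, the character reference, and the CDATA section all end up as plain text:

```python
import xml.etree.ElementTree as ET

# Made-up element mixing an entity reference, a character reference, and CDATA.
note = ET.fromstring("<note>Reds &amp; Blues&#33; <![CDATA[<b>2-1</b>]]></note>")

# By the time we see the tree, the references are resolved and the CDATA
# markers are gone; the markup inside CDATA is text, not a child element.
print(note.text)  # Reds & Blues! <b>2-1</b>
print(len(note))  # 0
```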

  5. comment nodes

    Comment nodes also just contain the contents of comments found in the input XML object, with the leading <!-- and trailing --> removed from each comment.

  6. processing instruction (PI) nodes

    Processing instructions have two parts, a name and a string value. Use the name() function to access the name. The string value is everything after the name and before the closing ?>.
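
Comment and processing instruction nodes can be inspected with Python's standard xml.dom.minidom, which is a DOM rather than an XPATH implementation but keeps the same pieces. In this sketch the stylesheet name results.xsl and the comment text are assumptions; minidom calls the PI name "target" and its string value "data".

```python
from xml.dom import minidom

# Invented input: one processing instruction and one comment before the root element.
doc = minidom.parseString(
    '<?xml-stylesheet type="text/xsl" href="results.xsl"?>'
    "<!-- season results -->"
    "<results/>"
)

pi, comment, _results = doc.childNodes

# The PI name, and everything after the name up to the closing ?>.
print(pi.target)  # xml-stylesheet
print(pi.data)    # type="text/xsl" href="results.xsl"

# The comment content, with the <!-- and --> delimiters removed.
print(repr(comment.data))  # ' season results '
```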

  7. namespace nodes

    You almost never use these in XSLT. Just remember that the namespace declaration becomes a namespace node, not an attribute node. Yes, I know it is an attribute in the XML source, but it is not treated as one by XPATH.
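
Python's standard xml.etree.ElementTree treats namespace declarations in a similar spirit, which makes for a quick check. This is a sketch on an invented element (the urn:example:results URI and the season attribute are assumptions): the xmlns:r declaration does not show up among the attributes, and instead affects how the element name is reported.

```python
import xml.etree.ElementTree as ET

# Invented element with one namespace declaration and one ordinary attribute.
results = ET.fromstring('<r:results xmlns:r="urn:example:results" season="2008"/>')

# Only the ordinary attribute is listed; the declaration is not an attribute.
print(results.attrib)  # {'season': '2008'}

# The declaration is applied to the element name instead.
print(results.tag)  # {urn:example:results}results
```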

Location Paths

Selecting and traversing: those are our most fundamental operations using XSLT. A location path describes the path to locate something in an XML input object. The context of a particular location path is usually thought of as the node in the tree from which a particular expression is evaluated. There are really five parts to a context:

  1. context (or "current") node
  2. an integer representing the context position (the index into the "array"-like data structure for the item currently being processed from the selected set), and a second integer representing the context size (number of items selected by that expression)
  3. the set of variables (name/value pairs) now in scope
  4. the set of functions available to XPATH expressions
  5. the set of all the namespace declarations now in scope
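
The context position and context size in the list above can be mimicked in Python with the standard xml.etree.ElementTree, which supports a subset of XPATH including the [1] and [last()] predicates. The three match elements below are invented sample data, and the loop is only an analogy for how a processor walks the selected set.

```python
import xml.etree.ElementTree as ET

# Invented sample data: three match elements under results.
root = ET.fromstring(
    "<results>"
    "<match><date>01-Mar-2008</date></match>"
    "<match><date>08-Mar-2008</date></match>"
    "<match><date>10-Mar-2008</date></match>"
    "</results>"
)

selected = root.findall("match")  # the selected set, in document order
context_size = len(selected)      # the value last() would report

# Each node is processed with its own 1-based context position.
for position, node in enumerate(selected, start=1):
    print(position, context_size, node.findtext("date"))

# Positional predicates pick nodes by context position.
first = root.find("match[1]")
last = root.find("match[last()]")
print(first.findtext("date"), last.findtext("date"))  # 01-Mar-2008 10-Mar-2008
```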

Some Simple Location Paths for Elements

Note the similarity to directory-system paths. Note use of "." to refer to the current node and ".." to refer to the parent node.
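
A few directory-style paths can be tried out with Python's standard xml.etree.ElementTree. This is a sketch with invented sample data, and ElementTree only accepts paths relative to an element, so the root-anchored form (a leading /) is left out here.

```python
import xml.etree.ElementTree as ET

# Invented sample data standing in for results.xml.
root = ET.fromstring(
    "<results>"
    "<match><date>01-Mar-2008</date><team>Reds</team></match>"
    "<match><date>08-Mar-2008</date><team>Blues</team></match>"
    "</results>"
)

# "." is the current node, so this walks results -> match -> date.
dates = [d.text for d in root.findall("./match/date")]
print(dates)  # ['01-Mar-2008', '08-Mar-2008']

# "//" reaches down through any number of levels.
teams = [t.text for t in root.findall(".//team")]
print(teams)  # ['Reds', 'Blues']

# ".." steps back up to the parent of each date element.
parents = [p.tag for p in root.findall(".//date/..")]
print(parents)  # ['match', 'match']
```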

Finding Things Other Than Elements

Abbreviated vs Unabbreviated Syntax

So far, we have been using the abbreviated syntax to express things in XPATH. Some of the lesser-known axes (things like "get all ancestors of the current context", "get all descendants of the current context", "get all siblings of the same context node", and so on) can only be specified with the unabbreviated syntax.

As an example, we could use <xsl:apply-templates select="match"/> to select all of the match elements in the current context, or we could use <xsl:apply-templates select="child::match"/> to select all of the match elements which are children of the current context. The two forms are equivalent, because child is the default axis in the abbreviated syntax.
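
What "children of the current context" means for a bare name like match can be checked in Python. The sketch below uses the standard xml.etree.ElementTree, which does not implement the unabbreviated axis syntax, so only the abbreviated forms appear; the round element and the id values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: one match directly under results, one nested inside a round.
root = ET.fromstring(
    "<results>"
    "<match id='m1'/>"
    "<round><match id='m2'/></round>"
    "</results>"
)

# A bare name selects along the child axis: direct children only.
children = [m.get("id") for m in root.findall("match")]
print(children)  # ['m1']

# The // abbreviation reaches match elements at any depth below the context.
descendants = [m.get("id") for m in root.findall(".//match")]
print(descendants)  # ['m1', 'm2']
```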

  1. XPATH axis overview
  2. Axis details

Useful Resources

In-Class Exercise Materials

Readings

  1. Overview of XPATH
  2. Tutorial for XPATH
  3. XSLT & XPath Tutorial


Last modified: 10 Mar 2008 10:40:04 AM