Using XPath with PHP


If your PHP applications go beyond CRUD operations, chances are you have worked with XML. Navigating an XML document or data stream for the desired data elements can be cumbersome, though, and even somewhat intimidating for PHP developers. It can be especially overwhelming when the XML data structure is complex. XPath is a W3C standard whose sole purpose is just that: getting to the right data element or, more specifically, the desired node. PHP supports XPath as part of its XML classes and functions. In this article you explore some basic scenarios for locating information in XML and how XPath can do the hard work for you in your PHP applications.

In this article, learn about these concepts:

  Uses for XPath
  Writing basic XPath expressions
  Using the PHP XML libraries' XPath functionality
  Using XPath and PHP to transform data formats

This article has several working examples using XPath with PHP that you can practice, assuming that you have the prerequisite skills described in Prerequisites.
Prerequisites

To get the most from this article, you should have knowledge of XML and PHP5 along with configuring and installing PHP extensions. In addition, you need access to and a working knowledge of a UNIX®-based or Microsoft® Windows® operating system with a web server that supports PHP5 on which you can practice the code examples covered in this article.

Understanding the need for XPath

As the web moves toward the original vision of the semantic web, applications interact more and more with one another. Technologies such as SOAP, REST, RSS, and RDF are powerful enablers of the future web. For the most part, XML is the chosen message format to describe the data involved. To some degree you can use JSON, but you will probably see XML as the predominant format for data exchange.

Or, you might encounter the task of transforming XML data into XHTML to deliver an attractive, interactive, and usable interface for mobile users and various web browsers.

Storing data as XML documents to a file system or an XML-compliant database is also a popular way to archive data for later retrieval for tasks such as reporting, user interface display, or working with application integration.

When working with XML data, you need to parse the data to get down to the desired data (or atomic node value, as it is often called). XSLT is a W3C standard for transforming XML into another format such as HTML, PDF, or even another XML document that uses a different schema. XSLT relies heavily on XPath, as do XQuery, XForms, and XPointer.

XPath background

In its simplest form, XPath is a language for navigating an XML tree in memory. Originally, XPath was designed as a language for XSLT and XPointer. XPath 1.0 became a W3C standard in 1999. The more current XPath 2.0 obtained specification status in 2007.

As other specifications for XML have emerged, so has the use of XPath. Today, XPath is the language of choice for navigating XML in XML Schema, ISO Schematron, XQuery, and XForms. Ironically, but for good reason, XPath does not use XML syntax. It uses a compact, non-XML syntax so that expressions can be embedded in URIs and in XML attribute values.

As you work with XPath in PHP, just remember that you don't use XPath alone. You use it as a tool to navigate the XML in memory while working with one of the other XML specifications.

For the purpose of this article, I use the more widely used XPath 1.0 specification in the discussion and examples. XPath 2.0 is largely backward compatible with XPath 1.0, but PHP's XML extensions are built on libxml2, which implements only XPath 1.0.

The XPath specification provides a detailed description for its standard use and terminology. If you anticipate complex use of XPath with PHP, this specification is a reliable point of reference. Otherwise, the XPath specification can be summarized into four main areas of interest, as described in Table 1.
Table 1. Four major areas of the XPath specification
Area    Description    Examples
Location paths    Contains location steps, axes, predicates, and abbreviated syntax.    parent::node(), child::text(), attribute::*, /PRODUCTS/PRODUCT[3]/NAME
Data model    Describes the XML as a tree. Contains nodes for root, attributes, text, namespaces, elements, processing instructions, and comments.    /, /ns1:PRODUCTS/ns1:PRODUCT, @category
Expressions    Might contain variable references, functions, Booleans, numbers, strings, and predicates.    /PRODUCTS/PRODUCT/NAME[string-length( ) > 15]/../@category
Functions    XPath 1.0 has 27 built-in functions that are categorized as node set, string, Boolean, or number functions.    string-length(), true(), sum()
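To see the function categories from Table 1 in action, the following minimal sketch evaluates one function from each category with DOMXPath::evaluate(), which returns a typed PHP value (a float, string, or Boolean) instead of a node list. The two-product document is a trimmed, hypothetical excerpt shaped like the products.xml file used later in the article.

```php
<?php
// Sketch: one XPath 1.0 function from each category, evaluated with DOMXPath.
// The XML below is a trimmed, hypothetical excerpt of products.xml.
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME><PRICE>4500</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME><PRICE>1900</PRICE></PRODUCT>
</PRODUCTS>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Node-set function: count() returns a number (a PHP float).
$count = $xpath->evaluate('count(/PRODUCTS/PRODUCT)');
// Number function: sum() adds the numeric values of the PRICE nodes.
$total = $xpath->evaluate('sum(/PRODUCTS/PRODUCT/PRICE)');
// String function: string-length() of the first product's NAME.
$length = $xpath->evaluate('string-length(/PRODUCTS/PRODUCT[1]/NAME)');
// Boolean function: boolean() is true when the node-set is not empty.
$hasSoftware = $xpath->evaluate("boolean(/PRODUCTS/PRODUCT[@category='software'])");

var_dump($count, $total, $length, $hasSoftware);
```

XPath 1.0 has a single number type, so count(), sum(), and string-length() all come back as PHP floats (2, 6400, and 16 here), while boolean() comes back as a PHP bool.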

Writing XPath expressions

Before writing PHP code, take time to review XPath nodes, paths, and functions. The following expressions, which run against the products.xml file in Listing 1, show a few common ways to locate data based on predicates, retrieve the atomic values of nodes, and make use of functions.
Listing 1. A sample products XML document (products.xml)

<?xml version="1.0" encoding="UTF-8"?>
<PRODUCTS>
  <PRODUCT category="software">
    <SKU>soft1234</SKU>
    <SUB_CATEGORY>Business Analysis</SUB_CATEGORY>
    <NAME>Widget Reporting</NAME>
    <PRICE>4500</PRICE>
  </PRODUCT>
  <PRODUCT category="software">
    <SKU>soft5678</SKU>
    <SUB_CATEGORY>Business Analysis</SUB_CATEGORY>
    <NAME>Pro Reporting</NAME>
    <PRICE>2300</PRICE>
  </PRODUCT>
  <PRODUCT category="storage">
    <SKU>stor01010</SKU>
    <SUB_CATEGORY>Tape Systems</SUB_CATEGORY>
    <NAME>Tapes Abound</NAME>
    <PRICE>1900</PRICE>
  </PRODUCT>
  <PRODUCT category="storage">
    <SKU>stor23232</SKU>
    <SUB_CATEGORY>Disk Systems</SUB_CATEGORY>
    <NAME>Widget100 Series</NAME>
    <PRICE>6500</PRICE>
  </PRODUCT>
</PRODUCTS>

/PRODUCTS selects the PRODUCTS element, the root element of the document in Listing 1, together with everything it contains, including its four PRODUCT children. Take notice of the forward slash (/) symbol. If you are familiar with UNIX-based operating systems, you know that a path starting with a forward slash is an absolute path that begins at the root. As with UNIX file paths, you can use an absolute path when you are in doubt about your current context location. (In XPath terms, the leading / stands for the root node of the document, and PRODUCTS is its single child element.)

Relative paths work with XPath, too. When you use .. in an expression, it instructs the expression to move up one level from the current node in the hierarchy (again, similar to working with directories in UNIX operating systems). For example, if the current (context) node is one of the PRODUCT elements, ../PRODUCT/SKU moves up to PRODUCTS and then selects the SKU element of every PRODUCT, returning all four SKU numbers.
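A relative path only means something once a context node is chosen. In PHP, DOMXPath::query() accepts that context node as its optional second argument. The minimal sketch below picks the first PRODUCT as the context node and evaluates ../PRODUCT/SKU from there; the inline XML is a trimmed copy of products.xml, using the SKU values shown in the listing outputs later in the article.

```php
<?php
// Sketch: evaluate a relative XPath path against an explicit context node.
// Inline XML is a trimmed copy of products.xml (SKU elements only).
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><SKU>soft1234</SKU></PRODUCT>
  <PRODUCT category="software"><SKU>soft5678</SKU></PRODUCT>
  <PRODUCT category="storage"><SKU>stor01010</SKU></PRODUCT>
  <PRODUCT category="storage"><SKU>stor23232</SKU></PRODUCT>
</PRODUCTS>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// Use the first PRODUCT element as the context node.
$context = $xpath->query('/PRODUCTS/PRODUCT')->item(0);

// ".." moves up to PRODUCTS; PRODUCT/SKU then selects every SKU below it.
$skus = [];
foreach ($xpath->query('../PRODUCT/SKU', $context) as $sku) {
    $skus[] = $sku->nodeValue;
}

echo implode(', ', $skus), "\n"; // soft1234, soft5678, stor01010, stor23232
```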

Selecting an attribute node requires special syntax in XPath. If you want to return all PRODUCT nodes that are listed in the category of software, the expression /PRODUCTS/PRODUCT[@category='software'] does just that. In this expression, category is an attribute of the PRODUCT element, and XPath selects attributes with the at symbol (@). Alternatively, you can select an attribute with the full attribute:: axis syntax, so the expression becomes /PRODUCTS/PRODUCT[attribute::category='software']. Most developers find the at symbol less verbose and easier to use.
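The two attribute syntaxes are interchangeable. This minimal sketch, using SimpleXML on a trimmed inline copy of products.xml, runs both forms and shows that they match the same PRODUCT elements.

```php
<?php
// Sketch: @category is shorthand for attribute::category.
// Inline XML is a trimmed copy of products.xml.
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME></PRODUCT>
  <PRODUCT category="software"><NAME>Pro Reporting</NAME></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME></PRODUCT>
  <PRODUCT category="storage"><NAME>Widget100 Series</NAME></PRODUCT>
</PRODUCTS>
XML;

$products = simplexml_load_string($xml);

$short   = $products->xpath("/PRODUCTS/PRODUCT[@category='software']");
$verbose = $products->xpath("/PRODUCTS/PRODUCT[attribute::category='software']");

foreach ($short as $product) {
    echo $product->NAME, "\n"; // Widget Reporting, then Pro Reporting
}

var_dump(count($short) === count($verbose)); // bool(true)
```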

To select the atomic node values (that is, the actual text value) of all product names for products costing more than $2,500, you can write the expression as /PRODUCTS/PRODUCT[PRICE > 2500]/NAME. When executed, this expression returns the product names Widget Reporting and Widget100 Series.

Take a look at one final expression example before going into using XPath with PHP: /PRODUCTS/PRODUCT/NAME[string-length( ) > 15]/../@category. When executed, this expression returns two values: software and storage. Specifically, this expression matches the category value for every product that has a name over 15 characters in length.
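Both of the preceding expressions can be checked from PHP. The minimal sketch below runs them with SimpleXML against a trimmed inline copy of products.xml (prices taken from the listing outputs later in the article) and converts the matched nodes to plain strings with strval().

```php
<?php
// Sketch: a numeric predicate, and a string-length() predicate followed by
// a step back up to the parent's category attribute.
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME><PRICE>4500</PRICE></PRODUCT>
  <PRODUCT category="software"><NAME>Pro Reporting</NAME><PRICE>2300</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME><PRICE>1900</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Widget100 Series</NAME><PRICE>6500</PRICE></PRODUCT>
</PRODUCTS>
XML;

$doc = simplexml_load_string($xml);

// Names of products priced above 2500.
$names = array_map('strval', $doc->xpath('/PRODUCTS/PRODUCT[PRICE > 2500]/NAME'));

// Category of every product whose NAME is longer than 15 characters.
$categories = array_map(
    'strval',
    $doc->xpath('/PRODUCTS/PRODUCT/NAME[string-length() > 15]/../@category')
);

print_r($names);      // Widget Reporting, Widget100 Series
print_r($categories); // software, storage
```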

The XML document in Listing 1 is the source document for the XPath expressions that are demonstrated in this article.

Discovering XPath support in PHP

That PHP supports XML and XPath should come as no surprise. In fact, the most popular web scripting language offers some good functions for working with XPath in its core libraries.

In the core PHP libraries, you have a few choices when working with XML:

  SimpleXML
  DOM
  XMLWriter/Reader

SimpleXML is easy to use and can be suitable for XML-related chores that are relatively simple. It does have some limitations; for example, it doesn't fully support validation, writing, and namespaces. If you are processing a large XML data tree, keep in mind that SimpleXML loads the full XML document tree into memory before processing.

If you need to perform more complex XPath expressions and need full control of the document, DOM is an option. DOM is short for Document Object Model, which is a W3C standard. PHP provides DOM as an extension that is usually enabled by default. If your build lacks it, extensions such as DOM are typically painless to install and enable; often it is just a case of uncommenting a single line in your php.ini file to load the already compiled module. Like SimpleXML, DOM loads the full XML document tree into memory before processing. As you will see later in the article, DOMXPath is quite pleasant to use as well.

You can also download and install the XML_XPath package from PEAR, the PHP Extension and Application Repository. This class uses DOM and provides a way to query using XPath for document manipulation and extracting atomic node values.

If you work with the Zend Framework, a Zend_Dom_Query library is available. If your particular PHP framework doesn't provide special classes or functions for XML and XPath, you can simply use what PHP already provides.

XMLReader and XMLWriter don't support XPath directly; you need help from SimpleXML or DOMXPath, so they aren't covered any further in this article.
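Although the article does not pursue XMLReader further, the hand-off to DOM can be sketched briefly. In this minimal sketch, XMLReader streams the document, and XMLReader::expand() copies each PRODUCT subtree into a small DOM document so that DOMXPath can query it. Only one product sits in a DOM tree at a time, which may help with very large files. The inline XML stands in for a file you would normally open with XMLReader::open(); the price threshold is an arbitrary illustration.

```php
<?php
// Sketch: stream with XMLReader, query each PRODUCT subtree with DOMXPath.
// Inline XML is a trimmed copy of products.xml.
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME><PRICE>4500</PRICE></PRODUCT>
  <PRODUCT category="software"><NAME>Pro Reporting</NAME><PRICE>2300</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME><PRICE>1900</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Widget100 Series</NAME><PRICE>6500</PRICE></PRODUCT>
</PRODUCTS>
XML;

$reader = new XMLReader();
$reader->XML($xml); // for a file: $reader->open('products.xml')

$expensive = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'PRODUCT') {
        // Copy only the current PRODUCT subtree into its own DOM document.
        $doc = new DOMDocument();
        $doc->appendChild($doc->importNode($reader->expand(), true));
        $xpath = new DOMXPath($doc);

        // evaluate() returns a PHP bool for a comparison expression.
        if ($xpath->evaluate('number(/PRODUCT/PRICE) > 2500')) {
            $expensive[] = $xpath->evaluate('string(/PRODUCT/NAME)');
        }
    }
}
$reader->close();

print_r($expensive); // Widget Reporting, Widget100 Series
```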

No matter which library or framework you use, understanding how to write XPath expressions is essential to getting the most out of it. The XPath syntax is the same across the PHP XML-related libraries. The following examples and demonstrations use a combination of DOM and SimpleXML.

Working with XPath in PHP

With the sample products.xml file from Listing 1 and the PHP5 SimpleXML API, you can experiment with various XPath expressions.

When you execute the code in Listing 2, the result is a dump of the complete XML document as an array. The XPath expression /PRODUCTS matches a single node, the root PRODUCTS element, and print_r() displays it together with all of its descendants.
Listing 2. Using SimpleXML to display all nodes in an array

<?php
$xml = simplexml_load_file("products.xml");
$products = $xml->xpath("/PRODUCTS");
print_r($products);
?>
------------------------------------------------------------
OUTPUT:

Array ( [0] => SimpleXMLElement Object ( [PRODUCT] =>
Array ( [0] => SimpleXMLElement Object
( [@attributes] => Array ( [category] => software ) [SKU] =>
soft1234 [SUB_CATEGORY] =>
Business Analysis [NAME] => Widget Reporting [PRICE] => 4500 ) [1] =>
SimpleXMLElement Object
( [@attributes] => Array ( [category] => software ) [SKU] => soft5678
[SUB_CATEGORY] =>
Business Analysis [NAME] => Pro Reporting [PRICE] => 2300 ) [2] =>
SimpleXMLElement Object
( [@attributes] => Array ( [category] => storage ) [SKU] =>
stor01010 [SUB_CATEGORY] => Tape Systems [NAME] =>
Tapes Abound [PRICE] => 1900 )
[3] => SimpleXMLElement Object ( [@attributes] =>
Array ( [category] => storage ) [SKU] =>
stor23232 [SUB_CATEGORY] => Disk Systems [NAME] =>
Widget100 Series [PRICE] => 6500 ) ) ) )

When you execute the code in Listing 3, the result is an array output of the value of each NAME node of the XML tree. Notice how the expression /PRODUCTS/PRODUCT/NAME locates every node it matches in the XML tree as opposed to only the first or last one.
Listing 3. Using SimpleXML to display all product names in an array

<?php
$xml = simplexml_load_file("products.xml");
$products = $xml->xpath("/PRODUCTS/PRODUCT/NAME");
print_r($products);
?>
------------------------------------------------------------
OUTPUT:

Array ( [0] => SimpleXMLElement Object ( [0] => Widget Reporting )
[1] => SimpleXMLElement Object ( [0] => Pro Reporting ) [2] =>
SimpleXMLElement Object ( [0] => Tapes Abound ) [3] =>
SimpleXMLElement Object ( [0] => Widget100 Series ) )

If you need the value of a particular node or nodes based on some criteria, follow the examples in Listings 4 and 5.

Locating an atomic node value means extracting a specific value from the XML document. When you execute the code in Listing 4, the result is the atomic value of one node.
Listing 4. Using SimpleXML to display a product's name for a particular SKU

<?php
$xml = simplexml_load_file("products.xml");
$products = $xml->xpath("/PRODUCTS/PRODUCT[SKU='soft5678']/NAME");
print_r($products);
?>
------------------------------------------------------------
OUTPUT:

Array ( [0] => SimpleXMLElement Object ( [0] => Pro Reporting ) )

The XPath expression /PRODUCTS/PRODUCT[SKU='soft5678']/NAME specifies all nodes that match the expression. In this case, only one product has the SKU number match. If you need to locate a node's value as it relates to its position in the XML tree, you can use the position() function.
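As a minimal sketch of position-based selection, the following SimpleXML code picks products by their position among the PRODUCT children. PRODUCT[2] is the abbreviated form of PRODUCT[position() = 2], and last() returns the size of the current node set. The inline XML is a trimmed copy of products.xml.

```php
<?php
// Sketch: selecting nodes by position with position() and last().
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME></PRODUCT>
  <PRODUCT category="software"><NAME>Pro Reporting</NAME></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME></PRODUCT>
  <PRODUCT category="storage"><NAME>Widget100 Series</NAME></PRODUCT>
</PRODUCTS>
XML;

$products = simplexml_load_string($xml);

$second = $products->xpath('/PRODUCTS/PRODUCT[position() = 2]/NAME');
$same   = $products->xpath('/PRODUCTS/PRODUCT[2]/NAME');       // abbreviated form
$last   = $products->xpath('/PRODUCTS/PRODUCT[position() = last()]/NAME');

echo $second[0], "\n"; // Pro Reporting
echo $same[0], "\n";   // Pro Reporting
echo $last[0], "\n";   // Widget100 Series
```

Positions count only the nodes selected by the step (here, PRODUCT elements), so the whitespace between elements does not shift the numbering.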

You can use conditional expressions in XPath to further pinpoint the location of specific nodes. Listing 5 shows an example of this using SimpleXML and XPath with a conditional expression.
Listing 5. Using SimpleXML to locate products based upon a conditional

<?php
$xml = simplexml_load_file("products.xml");
$products = $xml->xpath("/PRODUCTS/PRODUCT[@category='software' and PRICE > 2500]");
print_r($products);
?>
------------------------------------------------------------
OUTPUT:

Array ( [0] => SimpleXMLElement Object ( [@attributes] =>
Array ( [category] => software )
[SKU] => soft1234 [SUB_CATEGORY] => Business Analysis [NAME] =>
 Widget Reporting [PRICE] => 4500 ) )

You might have noticed that Listings 2, 3, 4, and 5 have the exact same PHP code—the only differences are the XPath expressions. When you master the steps using SimpleXML, you have the full power of the XPath language available to you. The steps taken with the PHP code when using SimpleXML are summarized as follows:

  Load the XML file into memory.
  Write and execute the XPath expression using the SimpleXMLElement::xpath() method.
  Manipulate the matched nodes and values using your PHP skills.

The output in each listing comes from the print_r($products); statement, which dumps the returned array for display. In practice, you will more likely take the result and process it further with PHP.
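As a minimal sketch of that kind of processing, the code below turns the matched PRODUCT elements into a plain PHP array keyed by SKU. Casting with (string) and (int) converts SimpleXMLElement values into native PHP types. The array layout is a hypothetical choice for illustration, and the inline XML is a trimmed copy of products.xml.

```php
<?php
// Sketch: convert XPath matches into native PHP data instead of dumping them.
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><SKU>soft1234</SKU><NAME>Widget Reporting</NAME><PRICE>4500</PRICE></PRODUCT>
  <PRODUCT category="software"><SKU>soft5678</SKU><NAME>Pro Reporting</NAME><PRICE>2300</PRICE></PRODUCT>
  <PRODUCT category="storage"><SKU>stor01010</SKU><NAME>Tapes Abound</NAME><PRICE>1900</PRICE></PRODUCT>
</PRODUCTS>
XML;

$catalog = simplexml_load_string($xml);

$report = [];
foreach ($catalog->xpath("/PRODUCTS/PRODUCT[@category='software']") as $product) {
    // Cast each SimpleXMLElement to a native PHP string or integer.
    $report[(string) $product->SKU] = [
        'name'  => (string) $product->NAME,
        'price' => (int) $product->PRICE,
    ];
}

foreach ($report as $sku => $item) {
    printf("%s: %s (%d)\n", $sku, $item['name'], $item['price']);
}
// soft1234: Widget Reporting (4500)
// soft5678: Pro Reporting (2300)
```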

Listing 6 uses DOM and DOMXPath to work with XML and XPath.
Listing 6. Using DOMXPath to display a product's name for a particular SKU

<?php

$doc = new DOMDocument;
$doc->load('products.xml');
$xpath = new DOMXPath($doc);
$products = $xpath->query("/PRODUCTS/PRODUCT[SKU='soft5678']/NAME");

foreach ($products as $product)
{
  print($product->nodeValue);
}
?>
------------------------------------------------------------
OUTPUT:

Pro Reporting

The sequence of PHP code that you use for DOM and DOMXPath isn't much more complex than the SimpleXML steps. The steps in Listing 6 are summarized in the following sequence:

  Create a DOMDocument object and load the XML file into it.
  Create a DOMXPath object from the loaded document.
  Query the XML tree, which returns a DOMNodeList of the matching nodes.
  Iterate over the node list and use each node's value.

Again, when you are comfortable with that snippet of PHP code using DOM, you have XPath available to do the grunt work.
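Because DOMXPath::query() returns a DOMNodeList rather than an array, a few DOM members come in handy: length for the number of matches, item() for indexed access, and DOMElement methods such as getAttribute() on each match. The minimal sketch below shows them against a trimmed inline copy of products.xml.

```php
<?php
// Sketch: working with the DOMNodeList returned by DOMXPath::query().
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME><PRICE>4500</PRICE></PRODUCT>
  <PRODUCT category="software"><NAME>Pro Reporting</NAME><PRICE>2300</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME><PRICE>1900</PRICE></PRODUCT>
  <PRODUCT category="storage"><NAME>Widget100 Series</NAME><PRICE>6500</PRICE></PRODUCT>
</PRODUCTS>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$products = $xpath->query('/PRODUCTS/PRODUCT[PRICE > 2500]');
echo $products->length, "\n"; // 2

$rows = [];
foreach ($products as $product) {
    // Each match is a DOMElement; query NAME relative to it.
    $name = $xpath->query('NAME', $product)->item(0)->nodeValue;
    $rows[] = $product->getAttribute('category') . ': ' . $name;
}

echo implode("\n", $rows), "\n";
// software: Widget Reporting
// storage: Widget100 Series
```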

Using XPath for transformations

In reality, you can transform XML into XHTML without using XSLT. You could expand on the previous examples, using only SimpleXML or DOM to build your XHTML files for display, and you might well be more comfortable with that approach. However, because XSLT is one of the largest uses of XPath, and PHP supports XSLT through its XSL extension, a demonstration is worthwhile. Besides, using XSLT can save you a lot of time and frustration!
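For comparison, here is a minimal sketch of the non-XSLT route: XPath selects the product names, and DOM methods build an XHTML list from them. Using createTextNode() means that characters such as & and < in the data are escaped automatically. The list markup is a hypothetical choice, and the inline XML is a trimmed copy of products.xml.

```php
<?php
// Sketch: build XHTML with DOM from nodes selected by XPath (no XSLT).
$xml = <<<'XML'
<PRODUCTS>
  <PRODUCT category="software"><NAME>Widget Reporting</NAME></PRODUCT>
  <PRODUCT category="software"><NAME>Pro Reporting</NAME></PRODUCT>
  <PRODUCT category="storage"><NAME>Tapes Abound</NAME></PRODUCT>
  <PRODUCT category="storage"><NAME>Widget100 Series</NAME></PRODUCT>
</PRODUCTS>
XML;

$source = new DOMDocument();
$source->loadXML($xml);
$xpath = new DOMXPath($source);

$html = new DOMDocument('1.0', 'UTF-8');
$ul = $html->appendChild($html->createElement('ul'));

foreach ($xpath->query('/PRODUCTS/PRODUCT/NAME') as $name) {
    $li = $html->createElement('li');
    // A text node escapes markup characters when serialized.
    $li->appendChild($html->createTextNode($name->nodeValue));
    $ul->appendChild($li);
}

echo $html->saveXML($ul), "\n";
// <ul><li>Widget Reporting</li><li>Pro Reporting</li><li>Tapes Abound</li><li>Widget100 Series</li></ul>
```

This works for small fragments, but as the markup grows, keeping presentation in an XSL stylesheet usually stays easier to maintain.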
PHP XSLT transformations

To stay focused on PHP and XPath, the transformation example in this section doesn't use any CSS or valid URL links. When transforming data using XSLT, keep in mind that you can include styles, JavaScript, or anything else that your typical HTML pages need. In addition, your XSL file will most likely be carefully organized in your application structure in a way that fits your framework.

A solid understanding of XPath is essential when working to transform XML data into other formats such as HTML with XSLT.

RSS and Atom feeds are XML-based, so XPath is an ideal tool to traverse a feed and select the desired data. Assuming that this article has its own Atom feed for various PHP and XPath techniques, you can use one of the PHP XML libraries to extract entries from the feed and display them as desired on your website.
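One detail matters before querying an Atom feed: its elements live in the http://www.w3.org/2005/Atom namespace, and in XPath 1.0 an unprefixed name such as entry matches only elements in no namespace. The minimal sketch below, using a shortened, hypothetical feed, shows that an unprefixed query finds nothing until a prefix is registered with DOMXPath::registerNamespace(). The prefix name atom is arbitrary; only the namespace URI must match.

```php
<?php
// Sketch: Atom elements need a namespace-qualified XPath query.
$feed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Using XPath with PHP</title>
  <entry><title>SimpleXML &amp; XPath</title></entry>
  <entry><title>DOMXPath</title></entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($feed);
$xpath = new DOMXPath($doc);

// Unprefixed names mean "no namespace", so nothing matches here.
$unprefixed = $xpath->query('/feed/entry')->length;
echo $unprefixed, "\n"; // 0

// Bind a prefix to the Atom namespace URI, then use it in every step.
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

$titles = [];
foreach ($xpath->query('/atom:feed/atom:entry/atom:title') as $title) {
    $titles[] = $title->nodeValue; // entity &amp; is decoded to &
}

print_r($titles); // SimpleXML & XPath, DOMXPath
```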

Even a simple XSLT file such as the one in Listing 7 depends heavily on XPath.
Listing 7. A simple XSLT file that transforms a feed into HTML (article_feed.xsl)

<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  exclude-result-prefixes="atom dc">

  <xsl:template match="/">

  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head><title><xsl:value-of select="/atom:feed/atom:title"/></title></head>
  <body>
  <table>
  <tr><td><xsl:value-of select="/atom:feed/atom:title"/></td></tr>
  <tr><td><i>"<xsl:value-of select="//atom:subtitle" />",</i></td></tr>
  <tr><td>by <xsl:value-of select="//atom:author"/></td></tr>
  <xsl:for-each select="/atom:feed/atom:entry">
  <table border="1" >
  <tr>
  <td>Title</td><td><xsl:value-of select="atom:title"/></td>
  </tr>
  <tr>
  <td>Summary</td><td><xsl:value-of select="atom:summary"/></td>
  </tr>
  <tr>
  </tr>
  </table><br/>
  </xsl:for-each>
  </table>
  </body>
  </html>
  </xsl:template>
</xsl:stylesheet>

The two forward slashes (//) select matching nodes at any depth below the root node, and xsl:value-of outputs the string value of the first selected node in document order. Because the feed contains only one subtitle and one author element, //atom:subtitle and //atom:author are convenient shortcuts for the absolute paths. Inside the for-each loop, the context node is the current entry element, so relative paths such as atom:title and atom:summary select the title and summary of that entry rather than the first ones in the document.
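The same distinction can be checked from PHP. In this minimal sketch, which uses a shortened, hypothetical version of the feed, //atom:title matches every title in the document, string() of that node set yields only the first one, and a relative path evaluated with each entry as the context node yields that entry's own title.

```php
<?php
// Sketch: "//" versus a relative path evaluated against each entry.
$feed = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Using XPath with PHP</title>
  <entry><title>SimpleXML &amp; XPath</title></entry>
  <entry><title>DOMXPath</title></entry>
</feed>
XML;

$doc = new DOMDocument();
$doc->loadXML($feed);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');

// "//" matches the feed title plus one title per entry.
$allTitles = $xpath->query('//atom:title')->length;
echo $allTitles, "\n"; // 3

// string() of a node set uses only the first node in document order.
$first = $xpath->evaluate('string(//atom:title)');
echo $first, "\n"; // Using XPath with PHP

// Relative to each entry, atom:title selects that entry's own title.
$entryTitles = [];
foreach ($xpath->query('/atom:feed/atom:entry') as $entry) {
    $entryTitles[] = $xpath->evaluate('string(atom:title)', $entry);
}
print_r($entryTitles); // SimpleXML & XPath, DOMXPath
```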

Using the XSLT file from Listing 7, you can now write the appropriate PHP code to perform the transformation, as in Listing 8.
Listing 8. Using DOM for XSLT transformation

<?php
// Requires the DOM and XSL extensions (XSLTProcessor is provided by ext/xsl).
$doc = new DOMDocument();
$xmlStream = <<<MyFeed
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
<title>Using XPath with PHP</title>
<author><name>Tracy Bost</name></author>
<subtitle type="html">Let XPath do the hard work for you when working with XML</subtitle>
<link rel="self" type="text/html"
hreflang="en" href="http://www.ibm.com/developerworks/"/>
<updated>15 Aug 2011 22:51:48 +0000</updated>
<entry>
<title>SimpleXML &amp; XPath </title>
<summary>If you are using SimpleXML to parse XML or
 RSS feeds, XPath is great to use!</summary>
<link rel="self" type="text/html" hreflang="en" href=""/>
<published>21 Apr 2011 04:00:00 +0000</published>
<updated>21 Apr 2011 04:00:00 +0000</updated>
</entry>
<entry>
<title>DOMXPath</title>
<summary>If you are using DOM for traversing XML documents,
give DOMXPath a try! </summary>
<link rel="self" type="text/html" hreflang="en" href=""/>
<id>tag:developerWorks.dw,19 Apr 2011 04:00:00 +0000</id>
<published>12 Aug 2011 04:00:00 +0000</published>
<updated>12 Aug 2011 04:00:00 +0000</updated>
</entry>
<entry>
<title>XMLReader with XPath</title>
<summary>For complex XML document reading and writing,
using XPath with XMLReader can ease your burden!</summary>
<link rel="self" type="text/html" hreflang="en" href=""/>
<id>tag:developerWorks.dw,19 Apr 2011 04:00:00 +0000</id>
<published>08 Aug 2011 04:00:00 +0000</published>
<updated>08 Aug 2011 04:00:00 +0000</updated>
</entry>
</feed>
MyFeed;

// Load the feed into DOM.
$doc->loadXML($xmlStream);

// Load the stylesheet from Listing 7 (saved here as xsl/article_feed.xsl).
$xsl = new DOMDocument();
$xsl->load('xsl/article_feed.xsl', LIBXML_NOCDATA);

// Import the stylesheet and run the transformation.
$xslt = new XSLTProcessor();
$xslt->importStylesheet($xsl);
print $xslt->transformToXML($doc);
?>

------------------------------------------------------------
OUTPUT:

<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>Using XPath with PHP</title></head><body><table><tr>
<td>Using XPath with PHP</td></tr><tr>
<td><i>"Let XPath do the hard work for you when working with XML",</i></td>
</tr><tr><td>by Tracy Bost
</td></tr><table border="1"><tr><td>Title</td>
<td>SimpleXML &amp; XPath </td></tr>
<tr><td>Summary</td>
<td>If you are using SimpleXML to parse XML or RSS feeds,
 XPath is great to use!</td>
</tr><tr/></table>
<br/>
<table border="1">
<tr><td>Title</td>
<td>DOMXPath</td>
</tr>
<tr><td>Summary</td>
<td>If you are using DOM for traversing XML documents,
give DOMXPath a try! </td>
</tr><tr/></table><br/>
<table border="1">
<tr><td>Title</td><td>XMLReader with XPath</td>
</tr>
<tr><td>Summary</td>
<td>For complex XML document reading and writing,
using XPath with XMLReader can ease your burden!</td>
</tr><tr/></table><br/>
</table>
</body>
</html>

Notice that Listing 8 has no $xpath->query() statement like the one in Listing 6. All the XPath expressions are located in the XSL file: DOM loads the stylesheet, XSLTProcessor imports it, and the processor performs the transformation!

Summary

In this article, you were introduced to XPath and how you can use it in a PHP5 environment when you work with XML. Like so many other libraries available in PHP, the XML libraries allow you as a developer to focus on your functional requirements rather than the low-level wiring of classes and objects. XPath can help eliminate the cumbersome task of locating and parsing data within XML. Depending on your needs, you can use SimpleXML, DOM, or the XML libraries of a framework, such as the Zend Framework. Luckily, they all work with W3C XPath in a standard fashion. So, when you load that next XML file or data stream, have no fear about navigating to the exact values that you need to process.