PHP developers commonly require the services of an Extensible Markup Language (XML) parser in their code. Along these lines, they frequently find it necessary to validate XML input. Fortunately, you can easily accomplish this in PHP. This article shows you how to validate XML documents within PHP and determine the cause of validation failures.
XML is a markup language that enables you, as a developer, to create your own custom language. This language is then used to carry, but not necessarily display, data in a platform-independent fashion. The language is defined with the use of markup tags, much like Hypertext Markup Language (HTML).
XML has gained in popularity in recent years because it represents the best of two worlds: It is easily readable by humans and computers alike. XML languages are expressed in tree-like structure with elements and attributes describing key data. The element and attribute names are usually written in plain English (so humans can read them). They are also highly structured (so computers can parse them).
Now, for example, suppose you create your own XML language, called LuresXML. LuresXML simply provides a means of describing the various types of lures that are offered on your Web site. First, you create an XML schema that defines what the XML document should look like, as in Listing 1.
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
<xs:element name="lures">
<xs:complexType>
<xs:sequence>
<xs:element name="lure" maxOccurs="unbounded">
<xs:complexType>
<xs:sequence>
<xs:element name="lureName" type="xs:string"/>
<xs:element name="lureCompany" type="xs:string"/>
<xs:element name="lureQuantity" type="xs:integer"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
This is, quite intentionally, a fairly simple example. The root element is called lures. It is the parent element of one or more lure elements, each of which is the parent of three other elements. The first element is the lure name (lureName). The second element is the name of the company that manufactures the lure (lureCompany). And, finally, the last element is the quantity (lureQuantity), or how many lures your company has in inventory. The first two of these child elements are defined as strings, whereas the lureQuantity element is defined as an integer.
Now, say you want to create an XML document (sometimes called an instance) based on that schema. It might look something like Listing 2.
<lures>
<lure>
<lureName>Silver Spoon</lureName>
<lureCompany>Clark</lureCompany>
<lureQuantity>Seven</lureQuantity>
</lure>
</lures>
This is a simple XML document instance of the schema from Listing 1. In this case, the document instance lists only one lure. The name of the lure is Silver Spoon. The manufacturing company is Clark. And the quantity on hand is Seven.
Here is the question: How do you know that the XML document in Listing 2 is a proper instance of the schema defined in Listing 1? In fact, it isn't (this is also intentional).
Note the lureQuantity element as defined in Listing 1. It is of type xs:integer. Yet in Listing 2 the lureQuantity element actually contains a word (Seven), not an integer.
The purpose of XML validation is to catch exactly those kinds of errors. Proper validation ensures that an XML document matches the rules defined in its schema.
Continuing with this example, when you attempt to validate the XML document in Listing 2, you get an error. You fix this error (by changing the Seven to a 7) before using the document within your software application.
XML validation is important because you want to catch errors as early as possible in the information interchange process. Otherwise, unpredictable results can occur when you attempt to parse an XML document and it contains invalid data types or an unexpected structure.
It is beyond the scope of this article to provide an exhaustive overview of parsing XML documents in PHP. However, this section covers the basics of loading an XML document in PHP.
Just to continue to keep things simple, keep using the schema from Listing 1 and the XML document from Listing 2. Listing 3 demonstrates some basic PHP code to load the XML document.
<?php
$xml = new DOMDocument();
$xml->load('./lures.xml');
?>
Nothing complicated here. The code uses the DOMDocument class to load the XML document, here called lures.xml. Note that for this code to work on your own PHP server, the lures.xml file must reside in the same directory as the PHP code.
At this point, it is tempting to start parsing the XML document. However, as you have seen, it is best to first validate the document to ensure that it matches the language specifications set forth in the schema.
Continue adding to the PHP code in Listing 3 by inserting some simple validation code, as in Listing 4.
Once again, note that the schema file from Listing 1 must be in the same directory where the PHP code is located. Otherwise, PHP returns an error.
This new code invokes the schemaValidate() method against the DOMDocument object that loaded the XML. The method accepts one parameter: the location of the XML schema used to validate the XML document. The method returns a Boolean where true indicates a successful validation and false indicates an unsuccessful validation.
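As a minimal sketch of the validation step described above (an illustration, not necessarily the exact code of Listing 4), the logic can be wrapped in a small function. The helper name validateLures() and its two path parameters are assumptions made for this example; DOMDocument::load() and DOMDocument::schemaValidate() are the standard PHP DOM methods.

```php
<?php
// Hypothetical helper (not part of the original listing): loads an XML file
// and validates it against an XSD file, returning a one-word verdict.
function validateLures(string $xmlPath, string $xsdPath): string
{
    $xml = new DOMDocument();

    // load() returns false if the file is missing or not well-formed.
    if (!$xml->load($xmlPath)) {
        return 'invalid';
    }

    // schemaValidate() returns true when the document conforms to the schema.
    return $xml->schemaValidate($xsdPath) ? 'validated' : 'invalid';
}
```

Called from testxml.php as print validateLures('./lures.xml', './lures.xsd'); this should print "invalid" for the document in Listing 2.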
Now, deploy the PHP code from Listing 4 to your own PHP server. Call it testxml.php because that is the name given in Listings 3 and 4. Ensure that the XML document (from Listing 2) and XML schema (from Listing 1) are both in the same directory. Once again, PHP reports an error if this is not the case.
Point your browser to testxml.php. You should see one simple word on the screen: "invalid."
The good news is that the schema validation is working. It should return an error, and it did.
The bad news is that you have no idea where the error is located within the XML document. Okay, you might know because I mentioned the source of the error earlier in the article. But pretend that didn't happen, okay?
To repeat: The bad news is that you have no idea where the error is located within the XML document. Just play along. It would be nice if the PHP code actually reported the location of the error, as well as the nature of the error, so that you can take corrective action. Something along the lines of "Hey! I can't accept a string for lureQuantity" would be nice.
By default, libxml reports the validation failure as a PHP warning. Unfortunately, that warning doesn't specifically identify where in the XML document the error occurred. Instead, it identifies where in the PHP code the error was encountered. The libxml_get_errors() function can return the details you need, but only if you handle the errors yourself, so you look at another option.
There is another PHP function called libxml_use_internal_errors(). This function accepts a Boolean as its only parameter. If you set it to true, you disable the default libxml error reporting and fetch the errors on your own. That's what you do here.
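To see what fetching the errors on your own looks like, here is a small sketch. It validates a one-element document held in a string against an inline schema (both invented for this illustration, using loadXML() and schemaValidateSource() instead of files) and prints the raw fields of each LibXMLError object.

```php
<?php
// Collect libxml errors in an internal buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Inline schema and document are assumptions that keep the sketch
// self-contained; lures.xsd and lures.xml would be used with
// schemaValidate() and load() in the same way.
$xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
     . '<xs:element name="lureQuantity" type="xs:integer"/>'
     . '</xs:schema>';

$xml = new DOMDocument();
$xml->loadXML('<lureQuantity>Seven</lureQuantity>');

$isValid = $xml->schemaValidateSource($xsd);

// Each buffered error is a LibXMLError object with public properties.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf(
        "level=%d code=%d line=%d message=%s\n",
        $error->level,
        $error->code,
        $error->line,
        trim($error->message)
    );
}

// Empty the buffer so later validations start fresh.
libxml_clear_errors();
```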
Of course, that means that you have to write a bit more code. But the trade-off is more specific error reporting. In the long run, this saves a lot of time.
Listing 5 shows the finished product.
First, notice the function at the top of the code listing. It's called libxml_display_error() and accepts a LibXMLError object as its only parameter. Then it uses the all-too-familiar switch statement to determine the error level and craft an error message appropriate to that level. When the level is determined, the code produces a string that reports the appropriate level.
Then, two more things happen. First, the error object is examined to determine whether or not its file property contains a value. If so, then that file value is appended to the error message so the location of the file is reported. Next, the line property is appended to the error message so the user can see exactly where in the XML file the error occurred. Needless to say, this is extremely important for debugging purposes.
It should also be noted that libxml_display_error() simply produces a string that describes the error. The actual printing of the error to the screen is left up to the caller, in this case libxml_display_errors().
The function below that is the previously mentioned libxml_display_errors(), which takes no parameters. The first thing this function does is call libxml_get_errors(). This returns an array of LibXMLError objects that represent all of the errors encountered when the schemaValidate() method was invoked on the XML document.
Next, you step through each of the errors you encountered and invoke the libxml_display_error() function for each error object. Whatever string is returned by that function is then printed to the screen. One great benefit of handling errors this way is that all of the errors are printed at once. This means that you only need to execute the code once to view all of the errors specific to that particular XML document.
Finally, libxml_clear_errors() clears out the errors recently encountered by the schemaValidate() method. This means that if schemaValidate() is executed again within the same code sequence, you will start with a clean slate, and only new errors will be reported. If you don't do this and you execute schemaValidate() again, then all of the errors from the first invocation of schemaValidate() remain in the array returned by libxml_get_errors(). Obviously, that presents problems if you're looking for a fresh set of errors.
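The effect of skipping libxml_clear_errors() can be observed with a short experiment. This sketch uses an inline one-element schema and document (assumptions made to keep it self-contained) and counts the buffered errors after each validation.

```php
<?php
libxml_use_internal_errors(true);

// Inline schema and document, invented for this illustration.
$xsd = '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">'
     . '<xs:element name="lureQuantity" type="xs:integer"/>'
     . '</xs:schema>';

$xml = new DOMDocument();
$xml->loadXML('<lureQuantity>Seven</lureQuantity>');

// First validation: the error lands in the buffer.
$xml->schemaValidateSource($xsd);
$afterFirst = count(libxml_get_errors());

// Second validation without clearing: the earlier error is still buffered.
$xml->schemaValidateSource($xsd);
$afterSecond = count(libxml_get_errors());

// Clear the buffer, then validate again: only the fresh error remains.
libxml_clear_errors();
$xml->schemaValidateSource($xsd);
$afterClear = count(libxml_get_errors());
libxml_clear_errors();

printf("%d, %d, %d\n", $afterFirst, $afterSecond, $afterClear);
```

The second count should be larger than the first because the old error is still in the buffer, while the count after clearing should match the first.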
It's also important to note that I made a slight change to the if-then statement at the bottom of the code in Listing 5. If an error is encountered, it prints "Errors Found!" in bold and then invokes the aforementioned libxml_display_errors() function, which displays all of the errors encountered before clearing out the error array. I opted for this solution instead of just printing out "invalid" as I did in Listing 4.
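Putting the pieces together, the code described above might look roughly like the following sketch. The function names libxml_display_error() and libxml_display_errors() come from the text; the validateAndReport() wrapper, its parameters, and the exact HTML markup are assumptions, so treat this as an illustration and not the original Listing 5.

```php
<?php
// Builds a one-line HTML description of a single LibXMLError.
function libxml_display_error(LibXMLError $error): string
{
    switch ($error->level) {
        case LIBXML_ERR_WARNING:
            $message = "<b>Warning {$error->code}</b>: ";
            break;
        case LIBXML_ERR_ERROR:
            $message = "<b>Error {$error->code}</b>: ";
            break;
        case LIBXML_ERR_FATAL:
            $message = "<b>Fatal Error {$error->code}</b>: ";
            break;
        default:
            $message = '';
    }

    $message .= trim($error->message);
    if ($error->file) {
        // Report the file only when libxml recorded one.
        $message .= " in <b>{$error->file}</b>";
    }
    $message .= " on line <b>{$error->line}</b><br/>\n";

    return $message;
}

// Prints every buffered libxml error, then empties the buffer.
function libxml_display_errors(): void
{
    foreach (libxml_get_errors() as $error) {
        print libxml_display_error($error);
    }
    libxml_clear_errors();
}

// Hypothetical wrapper around the if-then statement described in the text.
function validateAndReport(string $xmlPath, string $xsdPath): bool
{
    // Buffer libxml errors instead of emitting PHP warnings.
    libxml_use_internal_errors(true);

    $xml = new DOMDocument();
    if (!$xml->load($xmlPath) || !$xml->schemaValidate($xsdPath)) {
        print "<b>Errors Found!</b><br/>\n";
        libxml_display_errors();
        return false;
    }

    print 'validated';
    return true;
}
```

Calling validateAndReport('./lures.xml', './lures.xsd') from testxml.php with the document from Listing 2 should print the bold heading followed by one line per error.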
Now, it's time to test again. Move the PHP file from Listing 5 to your PHP server. Keep the file name the same (testxml.php). As before, ensure that both the XML Schema Definition (XSD) file and the XML files are in the same directory as the PHP file. Point your browser to testxml.php once again, and now you should see something like this:
Errors Found!
Error 1824: Element 'lureQuantity': 'Seven' is not a valid value of the atomic type 'xs:integer'. in /home/thehope1/public_html/example.xml on line 5
Well, that's fairly descriptive, isn't it? The error message tells you on what line the error occurred. It also tells you where the file is (as if you didn't know). And it tells you exactly why the error occurred. That's information you can use.
You can now leave the PHP file alone and work on fixing the problem in your XML document.
Because the error reportedly occurred on line 5 of the XML document, it's a good idea to look at line 5 and see what's there. Unsurprisingly, line 5 is the location of the lureQuantity element. And, as you look at it carefully, you suddenly have an epiphany that Seven is a string, not a number. So you change the string Seven to the numeral 7. The final copy of the XML document should look like Listing 6.
<lures>
<lure>
<lureName>Silver Spoon</lureName>
<lureCompany>Clark</lureCompany>
<lureQuantity>7</lureQuantity>
</lure>
</lures>
Now, copy this new XML file to your PHP server. And, once again, point your browser to testxml.php. You should see just one word: "validated." This is excellent news for two reasons. First, it means that the validation code is working properly because the XML document is, in fact, valid. Second, you have probably just validated your first XML document in PHP. Congratulations!
As I always advise, now it is time to tinker. Modify lures.xsd to make it a more complex schema. Modify lures.xml to make it a more complex instance of that schema. Copy those files to the PHP server and, once again, execute testxml.php. See what happens. Then intentionally make the document invalid in several different ways and see which errors are reported.
Also, note that when you tinker, you don't need to change the PHP code at all. Just make sure that the file names (lures.xml and lures.xsd) are the same and you can modify them to your heart's content.
PHP makes it easy for developers to validate XML documents. Using the DOMDocument class in conjunction with the schemaValidate() method, you can ensure that your XML documents comply with the specifications in their respective schemas. This is important to ensure data integrity in your software applications.
Copyright © 2011 - All Rights Reserved - Softron.in