How to parse HTML in PHP?

I know we can use PHP's DOM extension to parse HTML in PHP, and I found a lot of questions about it here on Stack Overflow. But I have a specific requirement. I have HTML content like the one below:


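<span class="Heading1-H">Chapter 1</span>
<span class="Normal-H">This is chapter 1</span>
<span class="Heading1-H">Chapter 2</span>
<span class="Normal-H">This is chapter 2</span>
<span class="Heading1-H">Chapter 3</span>
<span class="Normal-H">This is chapter 3</span>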


I want to parse the above HTML and save the content into two different arrays, $heading and $content, like this:
$heading = array('Chapter 1','Chapter 2','Chapter 3');
$content = array('This is chapter 1','This is chapter 2','This is chapter 3');
I can achieve this easily using jQuery, but I am not sure whether that is the right way. It would be great if someone could point me in the right direction. Thanks in advance.
I used DOMDocument and DOMXPath to get the solution:
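$dom = new DOMDocument();
// Sample markup. The span tags and class names are reconstructed from the
// XPath queries used below; the original tags were lost.
$test='
<span class="Heading1-H">Chapter 1</span>
<span class="Normal-H">This is chapter 1</span>
<span class="Heading1-H">Chapter 2</span>
<span class="Normal-H">This is chapter 2</span>
<span class="Heading1-H">Chapter 3</span>
<span class="Normal-H">This is chapter 3</span>
';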

$dom->loadHTML($test);
$xpath = new DOMXpath($dom);
  $heading=parseToArray($xpath,'Heading1-H');
  $content=parseToArray($xpath,'Normal-H');

var_dump($heading);
echo "\n";
var_dump($content);
echo "\n";

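// Collects the text of the child nodes of every <span> with the given class.
function parseToArray($xpath,$class)
{
  $resultarray=array();
  $xpathquery="//span[@class='".$class."']";
  $elements = $xpath->query($xpathquery);

  // query() returns false (not null) for a malformed expression,
  // so return the empty array in that case instead of nothing.
  if ($elements === false) {
    return $resultarray;
  }
  foreach ($elements as $element) {
    foreach ($element->childNodes as $node) {
      $resultarray[] = $node->nodeValue;
    }
  }
  return $resultarray;
}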
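One thing to watch in the loop over childNodes: a span that contains nested markup, such as a bold word, yields one array entry per child node rather than one per span. Below is a minimal sketch of a variant that reads textContent instead. The helper name spanTextsByClass and the sample markup are assumptions made for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: returns one array entry per matching <span>.
function spanTextsByClass(DOMXPath $xpath, string $class): array
{
    $texts = [];
    $nodes = $xpath->query("//span[@class='" . $class . "']");
    if ($nodes === false) {
        return $texts; // malformed XPath expression
    }
    foreach ($nodes as $node) {
        // textContent concatenates the text of all descendants of the span.
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Assumed sample markup; the <b> element is there only to show nested content.
$html = '<span class="Heading1-H">Chapter <b>1</b></span>'
      . '<span class="Normal-H">This is chapter 1</span>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$heading = spanTextsByClass($xpath, 'Heading1-H');
$content = spanTextsByClass($xpath, 'Normal-H');

print_r($heading);
print_r($content);
```

With the childNodes loop, the first span in this sample would contribute two entries ("Chapter " and "1"); reading textContent keeps it as a single heading.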
One option is to use DOMDocument and DOMXPath. They have a bit of a learning curve, but once you get past it, you will be pretty happy with what you can achieve.
Read the following on php.net:
http://php.net/manual/en/class.domdocument.php
http://php.net/manual/en/class.domxpath.php
Hope this helps.
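As a starting point for that learning curve, here is a rough sketch of two things that tend to come up early: keeping libxml's parser warnings quiet when the markup is not well-formed, and matching a class name when the class attribute holds several names. The sample markup (including the extra class "first" and the unclosed div) is an assumption made for illustration.

```php
<?php
// Assumed input: the <div> is left unclosed on purpose, and the first span
// carries two class names.
$html = '<div><span class="Heading1-H first">Chapter 1</span>'
      . '<span class="Normal-H">This is chapter 1</span>';

// Collect parser warnings internally instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// Match "Heading1-H" as one whitespace-separated token of the class attribute.
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' Heading1-H ')]";

$heading = [];
foreach ($xpath->query($query) as $node) {
    $heading[] = trim($node->textContent);
}

print_r($heading);
```

A plain @class='Heading1-H' comparison would miss the first span here, because its attribute value is "Heading1-H first".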
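// file_get_html() and find() come from the PHP Simple HTML DOM Parser
// library (simple_html_dom.php), not from PHP itself.
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
  echo $element->src . "\n";
// Find all links
foreach($html->find('a') as $element)
  echo $element->href . "\n";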
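For comparison, a similar listing of image sources and link targets can be sketched with PHP's built-in DOM extension, without an extra library. This is only an illustration: the sample page is an assumed inline string, used so the code runs without network access.

```php
<?php
// Assumed sample page; a string is used so the sketch runs without network access.
$page = '<html><body>'
      . '<img src="logo.png" alt="Logo">'
      . '<a href="https://example.com/">Example</a>'
      . '<a href="/about">About</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($page);

// Collect the src attribute of every <img> element.
$images = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
}

// Collect the href attribute of every <a> element.
$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($images);
print_r($links);
```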