Import Frickin’ Huge XML Files With XMLReader and SimpleXML

October 25th, 2015 - Posted by Steve Marks to PHP, Web Development.

We work with a lot of estate agent sites in our day-to-day operations. One task we normally need to build into these sites is importing the agent’s properties from a third-party source. Depending on the software the estate agent uses, the data can arrive in a variety of formats, but it is normally XML.

We’ve recently been having problems when the XML data provided has been quite large, so below I explain how we used to import the data, plus the simple changes we made to cater for larger files.

Using only SimpleXML

Normally we would use SimpleXML to import data like so:

$xml = simplexml_load_file( '/path/to/data.xml' );

if ($xml !== FALSE) {

    foreach ($xml->property as $property)
    {
        //.. Do stuff here
    }

}

Now, this works fine for smaller XML documents. The issue comes when you start working with larger XML files, megabytes or even gigabytes in size, that contain thousands of nodes.

SimpleXML parses the whole document and holds every node in memory at once, so memory use grows with the size of the file and you’re likely to hit memory limit issues quite easily.
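
For reference, here is a minimal sketch of the kind of feed the loop above expects and how fields are read from each node. The feed structure and the field names (reference, price) are made up for illustration; a real feed will differ depending on the estate agent's software.

```php
<?php
// Hypothetical feed: the element and field names are illustrative only.
$feed = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<properties>
    <property>
        <reference>ABC123</reference>
        <price>250000</price>
    </property>
    <property>
        <reference>DEF456</reference>
        <price>325000</price>
    </property>
</properties>
XML;

// Like simplexml_load_file(), this builds the entire tree in memory first.
$xml = simplexml_load_string($feed);

$references = [];

if ($xml !== false) {
    foreach ($xml->property as $property) {
        // Child elements are SimpleXMLElement objects, so cast before use.
        $references[] = (string) $property->reference;
    }
}

print_r($references);
```

With two properties this is harmless, but the same tree-building step happens for every node in a multi-gigabyte file.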

The Solution – Using XMLReader and SimpleXML

Fortunately, we can easily get around the issue with just a few extra lines of code, and with no need to install third-party libraries.

The code above can be modified to use a combination of XMLReader and SimpleXML like so:

$reader = new XMLReader();

// Open the XML file as a stream; nothing is parsed until read() is called
if (!$reader->open( '/path/to/data.xml' )) {
    exit( 'Unable to open the XML file' );
}

while ($reader->read()) {

    // 'property' in the line below is the name of our main XML node
    if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'property') {

        // Only this one property node is expanded into a SimpleXML object
        $property = new SimpleXMLElement($reader->readOuterXML());

        //.. at this point we can use $property exactly the same as we
        //.. would in the SimpleXML example above.

    }

}

// Release the file handle once we're done
$reader->close();

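By using the above, you can now import much larger XML files without fear of hitting memory limits, because XMLReader streams through the file and only one property node is expanded into a SimpleXML object at a time.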
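
If you need the same pattern in several importers, one possible refinement is to wrap it in a generator and use XMLReader's next() method to jump from one property to the next. This is a sketch rather than what we run in production: streamElements() is a hypothetical helper name, the demo feed is made up, and it assumes all the matching elements are siblings under the same parent.

```php
<?php
/**
 * Yield each matching element of an XML file as a SimpleXMLElement.
 * streamElements() is a hypothetical helper, not a built-in function.
 * Assumes the matching elements are siblings under one parent.
 */
function streamElements(string $path, string $elementName): Generator
{
    $reader = new XMLReader();

    if (!$reader->open($path)) {
        throw new RuntimeException("Unable to open {$path}");
    }

    try {
        // Walk forward until the first matching element is found.
        $found = false;
        while ($reader->read()) {
            if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $elementName) {
                $found = true;
                break;
            }
        }

        while ($found) {
            // Expand only the current element into a SimpleXML object.
            yield new SimpleXMLElement($reader->readOuterXml());

            // next() skips the current element's children and moves on to
            // the following node with the given name, if there is one.
            $found = $reader->next($elementName)
                && $reader->nodeType === XMLReader::ELEMENT;
        }
    } finally {
        $reader->close();
    }
}

// Demo with a small temporary file; the feed structure is made up.
$path = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($path, '<properties>'
    . '<property><reference>ABC123</reference></property>'
    . '<property><reference>DEF456</reference></property>'
    . '</properties>');

$references = [];
foreach (streamElements($path, 'property') as $property) {
    $references[] = (string) $property->reference;
}
unlink($path);

print_r($references);
```

Because next() skips subtrees, property nodes that sit under different parent elements would be missed. In that case, stick with the plain read() loop shown earlier.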

This entry was posted on Sunday, October 25th, 2015 at 10:01 am by Steve Marks and is filed under PHP, Web Development.