Friday, 15 January 2010

Scraping website with DOM and XML in PHP




I'm trying to get a list of links from a webpage with PHP. I've tried:

    $webpage = file_get_contents('http://cl1.php.net/manual/en/function.call-user-func-array.php');
    $dom = new DOMDocument();
    $dom->loadHTML($webpage);
    $xpath = new DOMXPath($dom);
    $links = $xpath->query('aside/ul/li/ul/li/a'); // returns nothing
    foreach ($links as $link) {
        echo $link->getAttribute('href');
    }

The code works until it has to perform the query, which returns an empty object.

I've also tried this to solve the problem:

    $dom->getElementsByTagName('aside')->childNodes->item(0)->childNodes->item(0)->childNodes->item(1)->childNodes->item(0)->childNodes->item(0)->childNodes;

I know this last piece of code doesn't return the elements themselves, but even so, it doesn't work.

Edit:

This is part of the HTML:

    <aside class='layout-menu'>
      <ul class='parent-menu-list'>
        <li>
          <a href="ref.funchand.php">function handling functions</a>
          <ul class='child-menu-list'>
            <li class="current">
              <a href="function.call-user-func-array.php" title="call_&#8203;user_&#8203;func_&#8203;array">call_&#8203;user_&#8203;func_&#8203;array</a>
            </li>

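I don't see how your query could match. You are using a relative query against the entire document, so in essence you are doing a relative query from the document root, and the aside element does not sit directly under it.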
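To make that concrete, here is a small sketch on made-up markup (the contents of $html are an assumption, loosely modelled on the fragment above, not the real php.net page). It runs the original relative path with no context node, then the same path behind the descendant axis (//), which is a different technique from the two suggested in this answer and is shown only for contrast.

```php
<?php
// Hypothetical markup for illustration; the real page is much larger.
$html = '<html><body><div><aside><ul><li><a href="ref.funchand.php">parent</a>'
      . '<ul><li><a href="function.call-user-func-array.php">child</a></li></ul>'
      . '</li></ul></aside></div></body></html>';

$dom = new DOMDocument();
// <aside> is an HTML5 tag, so libxml may report warnings; collect them quietly.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Relative path with no context node: evaluated from the document root,
// where there is no <aside> child to start from.
$relative = $xpath->query('aside/ul/li/ul/li/a');

// Same path behind the descendant axis: <aside> may sit at any depth.
$anywhere = $xpath->query('//aside/ul/li/ul/li/a');

echo $relative->length, "\n"; // 0
echo $anywhere->length, "\n"; // 1
```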

Try either specifying the query from the root node, like:

    // Instantiate DOMXPath
    $xpath = new DOMXPath($dom);
    // Use the full path hierarchy in the query
    $links = $xpath->query('/html/body/.../aside/ul/li/ul/li/a');
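If you are not sure what the full path to the aside element is, one option is to ask the node itself with DOMNode::getNodePath(). The following is only a sketch on assumed markup (the $html string and the div wrapper are made up for illustration), so the path it prints will not be the one on the real page.

```php
<?php
// Hypothetical markup; on the real page the path to <aside> will differ.
$html = '<html><body><div id="layout"><aside><ul><li><a href="ref.funchand.php">parent</a>'
      . '<ul><li><a href="function.call-user-func-array.php">child</a></li></ul>'
      . '</li></ul></aside></div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep HTML5-tag warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$aside = $dom->getElementsByTagName('aside')->item(0);

// getNodePath() returns an absolute XPath location for this node.
$asidePath = $aside->getNodePath();
echo $asidePath, "\n";

// Append the rest of the hierarchy to build the full, root-anchored query.
$xpath = new DOMXPath($dom);
$links = $xpath->query($asidePath . '/ul/li/ul/li/a');
foreach ($links as $link) {
    echo $link->getAttribute('href'), "\n";
}
```

A path discovered this way is tied to the page layout at the time it was read, so it can stop matching when the site changes its markup.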

Or pass the aside node as the context to the XPath query and use a relative query.

    // Get the DOMNode object for the aside element
    $aside_tag = $dom->getElementsByTagName('aside')->item(0);
    // Instantiate DOMXPath
    $xpath = new DOMXPath($dom);
    // Pass the DOMNode as context to DOMXPath::query()
    $links = $xpath->query('ul/li/ul/li/a', $aside_tag);
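Put together, the context-node approach might look like the sketch below. The function name asideMenuLinks and the $sample markup are assumptions made for illustration (the sample is the fragment from the question, closed off so it parses), and the null check is an addition for pages that have no aside element at all.

```php
<?php
/**
 * Hypothetical helper: returns the href of every second-level menu link
 * inside the first <aside> element of the given HTML.
 */
function asideMenuLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings (HTML5 tags, sloppy markup) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) returns null when the page has no <aside>.
    $aside = $dom->getElementsByTagName('aside')->item(0);
    if ($aside === null) {
        return [];
    }

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    // Relative query, evaluated with <aside> as the context node.
    foreach ($xpath->query('ul/li/ul/li/a', $aside) as $link) {
        $hrefs[] = $link->getAttribute('href');
    }
    return $hrefs;
}

// Sample markup based on the fragment in the question.
$sample = <<<HTML
<html><body><div>
<aside class="layout-menu">
  <ul class="parent-menu-list">
    <li>
      <a href="ref.funchand.php">function handling functions</a>
      <ul class="child-menu-list">
        <li class="current"><a href="function.call-user-func-array.php">call_user_func_array</a></li>
      </ul>
    </li>
  </ul>
</aside>
</div></body></html>
HTML;

print_r(asideMenuLinks($sample));
```

On the real page you would pass the result of file_get_contents() instead of $sample.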

php xml dom
