Grabbing content (aka. page scraping) from a website - Webmaster Forum

04-18-2011, 12:44 AM

360

Grabbing content (aka. page scraping) from a website

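As an alternative to using an XML feed (for example, when a website doesn't offer one), you can use the following method to load a page into PHP and then grab specific content. The file_get_html() function comes from the PHP Simple HTML DOM Parser library, so simple_html_dom.php has to be included first: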

Code:

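// file_get_html() is provided by the PHP Simple HTML DOM Parser library
// (adjust the path to wherever simple_html_dom.php lives)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images and print each image source
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links and print each link target
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}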

Note that this is not my original code; I found it through Google searches. It's definitely handy, though, so I wanted to share it with you all.

Cheers,

06-19-2011, 05:33 AM

scriptbazaar

Thanks for sharing, this is really very handy.

06-19-2011, 10:12 PM

Wildhoney

Thank you. To add to this, I see many people use regular expressions to scrape content from websites, when really XPath is the way to go, especially since PHP's DOM extension provides a DOMXPath class that works directly with DOMDocument.

As always, W3Schools covers XPath nicely.
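
For anyone who wants to try the XPath route, here is a minimal sketch of the same "list all images and links" task using only PHP's built-in DOMDocument and DOMXPath classes. The sample markup and the variable names are made up for illustration; in practice you would load the HTML you fetched from the target site instead of the hard-coded string.

```php
<?php
// Sample markup standing in for a fetched page (hypothetical content).
$markup = '<html><body>'
    . '<img src="logo.png" alt="Logo">'
    . '<a href="/about">About</a>'
    . '<a href="http://example.com/">Example</a>'
    . '</body></html>';

$doc = new DOMDocument();

// Real-world HTML is rarely valid, so collect parser warnings
// internally instead of printing them to the page.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Collect the src attribute of every image.
$images = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $images[] = $attr->nodeValue;
}

// Collect the href attribute of every link.
$links = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $links[] = $attr->nodeValue;
}

// Print one URL per line.
echo implode("\n", array_merge($images, $links)), "\n";
```

The two queries do the same job as the find('img') and find('a') calls in the first post, just without an extra library. XPath also lets you narrow things down in the query itself, for example '//a[starts-with(@href, "http")]/@href' to keep only absolute links.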