
Grabbing content (aka. page scraping) from a website

04-18-2011, 12:44 AM
#1
360

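As an alternative to using an XML feed (for example, when a website doesn't offer one), you can use the following method to load a page into PHP and then grab specific content from it. Note that file_get_html() is not built into PHP; it comes from the PHP Simple HTML DOM Parser library (simple_html_dom.php), so that file needs to be included first: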

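Code:
<?php
// file_get_html() is provided by the PHP Simple HTML DOM Parser library
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
if (!$html) {
    die('Could not load the page');
}

// Find all images and print each src attribute
foreach ($html->find('img') as $element) {
    echo $element->src . '<br>';
}

// Find all links and print each href attribute
foreach ($html->find('a') as $element) {
    echo $element->href . '<br>';
}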

Note that this is not my original code; I found it through Google searches. It's definitely handy, though, so I wanted to share it with you all.

Cheers,

06-19-2011, 05:33 AM
#2
scriptbazaar

Thanks for sharing, this is really very handy.

06-19-2011, 10:12 PM
#3
Wildhoney

Thank you. To extend this: I see many people use regular expressions to scrape content from websites, when really XPath is the way to go, especially since PHP's DOM extension provides a DOMXPath class that works directly on a DOMDocument.

As always, W3Schools covers XPath nicely.
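
For anyone wanting to try the XPath route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath classes, pulling the same image and link attributes as the first post. The sample markup and the collectAttribute() helper are made up for illustration; in practice the markup would come from the page you downloaded.

```php
<?php
// Hypothetical sample markup, standing in for a downloaded page.
$markup = <<<HTML
<html><body>
  <a href="/about">About</a>
  <a href="http://example.com/">Example</a>
  <img src="/logo.png" alt="Logo">
</body></html>
HTML;

// Illustrative helper (not part of PHP): returns the values of all
// attribute nodes matched by an XPath expression.
function collectAttribute(DOMXPath $xpath, string $expression): array
{
    $values = [];
    foreach ($xpath->query($expression) as $attribute) {
        $values[] = $attribute->nodeValue;
    }
    return $values;
}

// Real-world HTML is rarely valid, so keep parser warnings internal
// instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// "//a/@href" selects the href attribute of every <a> element;
// "//img/@src" selects the src attribute of every <img> element.
$links  = collectAttribute($xpath, '//a/@href');
$images = collectAttribute($xpath, '//img/@src');

echo implode("\n", $links), "\n";
echo implode("\n", $images), "\n";
```

The appeal over regular expressions is that the expression describes the document structure (elements and attributes) rather than the raw text, so it tends to keep working when whitespace, attribute order, or quoting in the page changes.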
