Archived Forum Post

Question:

Scrape data from an HTML website

May 06 '14 at 13:13

Hi, I am writing an application in which I need to crawl an HTML website and scrape data from it. I know I can write a spider using the CkSpider class to crawl the website, but how can I scrape the data itself? Which class can be used to extract data from the pages? Any help would be highly appreciated. Thank you!

Answer

It depends. If the site being scraped serves XHTML, then each web page should be well-formed XML, and you can use any XML parser to pick out the pieces of information you want. (Chilkat XML is one such XML API that could be used.)
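As a rough illustration of the XHTML case, here is a minimal sketch using Python's standard-library xml.etree.ElementTree rather than Chilkat XML. It assumes the page has already been downloaded as a string and is well-formed XHTML in the standard XHTML namespace; the sample page and the scrape_xhtml function are hypothetical.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so queries must be namespace-qualified.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}


def scrape_xhtml(xhtml_text):
    """Hypothetical helper: return the page title and (href, text) for every link."""
    root = ET.fromstring(xhtml_text)  # raises ET.ParseError if not well-formed XML
    title_el = root.find("x:head/x:title", XHTML_NS)
    title = title_el.text.strip() if title_el is not None and title_el.text else ""
    links = [
        (a.get("href"), "".join(a.itertext()).strip())
        for a in root.iterfind(".//x:a", XHTML_NS)
    ]
    return title, links


# Hypothetical sample page; a real one would come from the crawler.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Products</title></head>
  <body>
    <ul>
      <li><a href="/p/1">Widget</a></li>
      <li><a href="/p/2">Gadget</a></li>
    </ul>
  </body>
</html>"""

print(scrape_xhtml(page))
```

The same pattern (parse once, then query with paths) carries over to any XML API; only the query syntax changes.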

If the site returns HTML, which is typically not valid XML, then you could use the Chilkat HTML-to-XML API to convert the HTML to well-formed XML for programmatic digestion...
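The Chilkat HTML-to-XML calls are not shown here. As a stand-in sketch that uses only Python's standard library, the example below swaps in a different technique: instead of converting HTML to XML, it uses the lenient event-based html.parser, which tolerates markup that makes an XML parser fail (unclosed li tags, a void br tag, an HTML entity such as &nbsp;). The LinkCollector class, the helper functions, and the sample page are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical parser: collect (href, text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links_html(html_text):
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def is_well_formed_xml(text):
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Typical real-world HTML that is not well-formed XML.
page = """<html><body>
<p>Catalog&nbsp;page</p>
<ul>
<li><a href="/p/1">Widget</a>
<li><a href="/p/2">Gadget</a><br>
</ul>
</body></html>"""

print(is_well_formed_xml(page))  # the XML parser rejects this page
print(scrape_links_html(page))
```

Converting to well-formed XML first (as suggested above) has the advantage that you can then query the whole tree with path expressions; an event-based parser like this one is simpler but requires tracking state by hand.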