
Hi, I am writing an application in which I need to crawl an HTML website and scrape data from it. I know I can write a spider using the CkSpider class to crawl the website, but how can I scrape the data from the pages? Which class can be used for that? Any help would be highly appreciated. Thank you!

asked May 06 '14 at 02:37


jo_himan


It depends. If the site being scraped uses XHTML, then each web page is technically XML and you can use any XML parser to help pick out the pieces of information you want. (Chilkat XML is one such XML API that could be used.)
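To illustrate the XHTML case, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree` as a stand-in for "any XML parser"; it is not the Chilkat XML API. The sample page, the `extract_links` helper, and the choice of pulling out links are all assumptions made for the example. Real XHTML pages that use a DOCTYPE with named entities such as `&nbsp;` may need extra handling.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace, so queries must be namespace-aware.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical page content; in practice this would be the body fetched by the spider.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Product list</title></head>
  <body>
    <ul>
      <li><a href="/item/1">Widget</a></li>
      <li><a href="/item/2">Gadget</a></li>
    </ul>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return (href, link text) pairs for every <a> element in a well-formed XHTML page."""
    root = ET.fromstring(xhtml_text)
    return [
        (a.get("href"), (a.text or "").strip())
        for a in root.iterfind(".//x:a", XHTML_NS)
    ]


links = extract_links(SAMPLE_PAGE)
for href, text in links:
    print(href, text)
```

The same pattern applies to any other element or attribute: parse once, then query the tree for the pieces you want.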

If the site returns HTML, which is typically not valid XML, then you could use the Chilkat HTML-to-XML API to convert the HTML to well-formed XML for programmatic digestion...
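The idea of converting loose HTML into well-formed XML before querying it can be sketched with Python's standard library alone. This is not the Chilkat HTML-to-XML API and makes no claim about how that component works internally; it is a simplified stand-in. The `HtmlToTree` class, the `html_to_tree` helper, the `root` wrapper element, and the sample markup are all hypothetical. The sketch closes unclosed tags when an enclosing end tag arrives, but it does not implement HTML's full implied-end-tag rules, and it assumes attribute names are valid XML names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HtmlToTree(HTMLParser):
    """Build a well-formed element tree from possibly sloppy HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        # Hypothetical wrapper element so the result always has a single root.
        self.builder.start("root", {})

    def handle_starttag(self, tag, attrs):
        # Bare attributes (e.g. "disabled") arrive with value None.
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close any unclosed children first, then the tag itself.
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("root")
        return self.builder.close()


def html_to_tree(html_text):
    parser = HtmlToTree()
    parser.feed(html_text)
    return parser.finish()


# Hypothetical page with unclosed <p>, <b>, <img> and <html> tags.
SAMPLE_HTML = """<html><body>
<p>First paragraph
<p>Second <b>bold</p>
<img src="logo.png">
<a href="/next">Next page</a>
</body>
"""

tree = html_to_tree(SAMPLE_HTML)
print(ET.tostring(tree, encoding="unicode"))
```

Once the tree exists, the scraping step is the same as for XHTML, for example `[(a.get("href"), a.text) for a in tree.iter("a")]` to collect links.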


answered May 06 '14 at 13:13


chilkat ♦♦


Asked: May 06 '14 at 02:37

Seen: 1,105 times

Last updated: May 06 '14 at 13:13
