Archived Forum Post


Question:

How to crawl a website, scrape data, and convert it to XML?

May 09 '14 at 07:12

I have written a spider using CkSpider. This is my code.

#include <CkSpider.h>
#include <CkStringArray.h>
#include <CkString.h>
#include <stdio.h>

int main(void) {
    CkSpider spider;
    CkStringArray seenDomains;
    CkStringArray seedUrls;
    const char * baseUrl = "http://www.xyz.com/mobile.aspx?article=yes&pageid=1&sectid=edid=&edlabel=TOIPU&mydateHid=09-05-2014&pubname=Times+of+India+-+Pune&edname=&articleid=Ar00102&publabel=TOI";
    const char * domain;

    // Start the spider from the article URL.
    spider.Initialize(baseUrl);

    // Fetch the first page and print the domain and the raw HTML.
    if (spider.CrawlNext())
    {
        printf("\n %s", spider.domain());
        printf("\n %s", spider.lastHtml());
    }
    else
    {
        printf("\n not crawled.");
    }

    return 0;
}

I need to scrape data from a website and also convert it to XML. Any suggestions?
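
For the XML half of the question, here is a minimal sketch of one possible starting point. It assumes the interesting pieces of a page (URL, title, body text) have already been pulled out into plain strings, and it uses only the C++ standard library. The ScrapedPage struct and the xmlEscape, xmlElement, and pageToXml helpers are hypothetical names made up for illustration; they are not part of CkSpider or any other library.

```cpp
#include <cassert>
#include <string>

// Hypothetical container for one scraped page; the field names are
// illustrative and not part of any library.
struct ScrapedPage {
    std::string url;
    std::string title;
    std::string body;
};

// Escape the five characters that are special in XML text and attributes.
std::string xmlEscape(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Wrap escaped text in <tag>...</tag>. The tag is assumed to be a valid
// XML name chosen by the programmer, so it is not escaped.
std::string xmlElement(const std::string &tag, const std::string &text) {
    assert(!tag.empty());
    return "<" + tag + ">" + xmlEscape(text) + "</" + tag + ">";
}

// Serialize one page as a small XML fragment.
std::string pageToXml(const ScrapedPage &page) {
    return "<page url=\"" + xmlEscape(page.url) + "\">"
         + xmlElement("title", page.title)
         + xmlElement("body", page.body)
         + "</page>";
}
```

In the spider code above, the body field could be filled from the string returned by spider.lastHtml() once the markup has been stripped or parsed. Escaping matters here because the article URL itself contains '&' characters, which would otherwise produce malformed XML.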