Archived Forum Post

Question:

Unable to get links of the source domain - Spider

Mar 12 '14 at 13:46

Hi, I'm new to Chilkat as well as C#.

I'm trying to crawl my own website, http://www.sh3lls.net, to collect every link on it: outbound links to other domains as well as links to other pages on the same site. This is for a project, and I'm using my own website to avoid any terms-of-service issues. When I crawl, I get all the links to other domains, but none from my own domain, www.sh3lls.net. The source code is below:

    public string url;

    Chilkat.StringArray extractURL(string url)
    {
        bool success;
        int i;
        string url2;
        url = txtURL.Text;

        Chilkat.StringArray urlList = new Chilkat.StringArray();

        Chilkat.Spider crawl = new Chilkat.Spider();
        crawl.Initialize(url);
        crawl.AddUnspidered(url);
        success = crawl.CrawlNext();

        urlList.Unique = true;
        urlList.Clear();
        for (i = 0; i < crawl.NumOutboundLinks; i++)
        {
            url2 = crawl.GetOutboundLink(i);
            //MessageBox.Show(crawl.CanonicalizeUrl(url));
            //MessageBox.Show(crawl.CanonicalizeUrl(url2));

            MessageBox.Show(url);
            MessageBox.Show(url2);

            if (url2.Contains(url))
            {
                MessageBox.Show("works");
                urlList.Append(url2);
                txtLog.Text += "Found New Page to Save " + url2 + "\r\n";
                if (crawl.LastFromCache != true)
                {
                    crawl.SleepMs(1000);
                }

            }
        }

        return urlList;

    }

Could someone please explain how to get the links that point to the same domain?

Thanks

Answer

Hi, sorry, but I think I found the problem. The cause is the way the links are added on my site; it is not an issue with Chilkat.

Thanks
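
The code in the question decides whether a link belongs to the site with url2.Contains(url), which depends on exactly how each link is written on the page (with or without "www.", relative or absolute). A minimal sketch of a host-based comparison follows, using only System.Uri from .NET and no Chilkat calls. The names LinkClassifier, NormalizeHost and IsSameSite are hypothetical, and treating "www.sh3lls.net" and "sh3lls.net" as the same site is an assumption, not something the original post states.

```csharp
using System;

static class LinkClassifier
{
    // Assumption: a leading "www." is not significant, so
    // www.sh3lls.net and sh3lls.net count as the same site.
    static string NormalizeHost(string host)
    {
        host = host.ToLowerInvariant();
        return host.StartsWith("www.", StringComparison.Ordinal)
            ? host.Substring(4)
            : host;
    }

    // Returns true when `link` (absolute or relative) resolves to the
    // same host as `pageUrl`. Returns false if either cannot be parsed.
    public static bool IsSameSite(string pageUrl, string link)
    {
        Uri page;
        Uri resolved;

        if (!Uri.TryCreate(pageUrl, UriKind.Absolute, out page))
            return false;

        // Resolves relative links such as "/contact" against the page URL;
        // absolute links are kept as they are.
        if (!Uri.TryCreate(page, link, out resolved))
            return false;

        return NormalizeHost(page.Host) == NormalizeHost(resolved.Host);
    }
}
```

With this sketch, IsSameSite("http://www.sh3lls.net", "/contact") treats the relative link as part of the site, while a link to another domain that merely carries www.sh3lls.net in its query string is not counted, which a plain substring check would get wrong.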