
I’m currently using your VC++ Version 6 library. What I want is to be able to convert a web page into a plain text file.

Is it possible to download the html file with the http.Download(http://www.mydomain/xyz.html) call?

I noticed that you have a CkHtmlToText class I can use. How would you suggest feeding it a complete web page? Would I download the HTML file, read it in as one big string, and then feed that string to h2t.toText(bigstring)?

Ideally, there would be a way for me to pass in a URL instead of a big string.

Let me know what calls you would suggest that I look at.

Thank you!

asked Mar 22 '13 at 10:29


BlueGramicci


The Download method sends an HTTP GET request to fetch the content, whatever it may be, at a specified URL and saves it to a file. The content at the URL could be anything -- a JPG, a .zip archive, an HTML page, a Perl script that emits something, a dynamic web page using ASP, JSP, etc. Therefore, downloading HTML is no different from downloading any other type of content. The bytes sent in the body of the HTTP response are what is saved to the output file.

To "download" directly into a byte array, call http.QuickGet.

To "download" directly into a string variable, call http.QuickGetStr.

(Of course, it would only make sense to download text using QuickGetStr.)
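
To make the "pass in a URL instead of a big string" idea concrete, here is a minimal sketch of a wrapper that fetches and converts in one call. The names urlToText, FetchFn and ConvertFn are hypothetical and are not part of Chilkat. The sketch assumes you supply one small function that wraps http.QuickGetStr and another that wraps h2t.toText, so the wrapper itself has no library dependency and uses only plain function pointers.

```cpp
#include <string>

// Hypothetical seams, not Chilkat types. In a real program, the fetch
// function would wrap http.QuickGetStr and the convert function would
// wrap h2t.toText, the two calls named in this thread.
typedef bool (*FetchFn)(const std::string& url, std::string& htmlOut);
typedef std::string (*ConvertFn)(const std::string& html);

// Fetch the page at `url` and convert it to plain text in one call.
// Returns false and leaves textOut untouched if the fetch fails.
bool urlToText(const std::string& url, FetchFn fetchHtml,
               ConvertFn htmlToText, std::string& textOut)
{
    std::string html;
    if (!fetchHtml(url, html)) {
        return false;  // nothing was downloaded, so nothing to convert
    }
    textOut = htmlToText(html);
    return true;
}
```

Keeping the fetch and the conversion as separate steps also lets you inspect or clean up the HTML string before it is converted.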


answered Mar 22 '13 at 16:33


chilkat ♦♦

Thanks! I tried using the HtmlToText function, but the result includes items like href URLs instead of the alt text. I was hoping it would return the same thing you get when you copy a whole web page to the clipboard and paste it into a Notepad file. Do you have any suggestions for getting that copy/paste-equivalent text?
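
One possible workaround, offered only as a sketch and not as a Chilkat feature, is to preprocess the HTML string before converting it: replace each img tag with its alt text, and strip the a tags so their href URLs never reach the converter. The function name preferAltText is hypothetical. The code assumes a C++11 compiler for std::regex (so not VC++ 6 itself) and double-quoted alt attributes. A regex pass like this is not a real HTML parser and will miss unusual markup.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite HTML so that images contribute their alt
// text and links contribute only their visible content, not the href URL.
std::string preferAltText(const std::string& html)
{
    // <img ... alt="text" ...>  becomes  text
    static const std::regex imgWithAlt(
        R"re(<img\b[^>]*\balt\s*=\s*"([^"]*)"[^>]*>)re", std::regex::icase);
    // Opening and closing <a> tags are dropped; their inner content stays.
    static const std::regex anchorTag(
        R"re(</?a\b[^>]*>)re", std::regex::icase);

    std::string withAlt = std::regex_replace(html, imgWithAlt, "$1");
    return std::regex_replace(withAlt, anchorTag, "");
}
```

The returned string can then be passed to the HTML-to-text step in place of the raw page.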


answered Mar 22 '13 at 20:02


BlueGramicci