
Hello, I'm using the HtmlToText component to convert HTML to plain text. However, the component treats angle brackets used as mathematical symbols as HTML. For example, "before less than < after less than" becomes "before less than". Changing the value of the SuppressLinks property had no effect.

I did turn on verbose logging, but it didn't give me anything I could use:

ToText:
  DllDate: Aug 15 2013
  ChilkatVersion:
  Username: IUSR
  Architecture: Little Endian; 32-bit
  Language: .NET 2.0
  VerboseLogging: 1
  decodeHtmlEntities: 1
  HtmlCodePage: 65001
  charset3: utf-8
  toXmlTime: Elapsed time: 0 millisec
  xmlToText:
    recursiveToText:
    (leaveContext)
  (leaveContext)
  toTextTime: Elapsed time: 16 millisec
  Success.
(leaveContext)

Any suggestions? Is this a known issue?

Thanks for your help.

asked Aug 28 '13 at 17:48

kfitting


edited Aug 28 '13 at 17:48

When parsing HTML, the "<" character is interpreted as the opening character of an HTML tag. Therefore, when an unencoded "<" exists, such as in "before less than < after less than", the HTML parser thinks that the HTML tag is "<afterlessthan...."

As humans, we can look at it and immediately see that the "<" character in that case is a mistake: a literal less-than sign in HTML should be written as the entity "&lt;". Programmatically, however, it is not so easy. There is no reliable general way for a parser to encounter a "<" and decide NOT to interpret it as the start character of an HTML tag.
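If the input cannot be fixed at the source, one possible workaround is to pre-process the HTML and encode any "<" that does not look like the start of a tag before handing the text to the converter. The following is only a rough sketch of that idea in Python, not a Chilkat feature: the function name escape_bare_less_than is hypothetical, and the rule it applies (a "<" counts as a tag opener only when it is immediately followed by a letter, "/", "!" or "?") is an assumption chosen for illustration.

```python
import re

# Assumed heuristic: a "<" opens a tag only if the very next character is a
# letter, "/", "!" or "?". Any other "<" is treated as literal text.
_BARE_LT = re.compile(r"<(?![A-Za-z/!?])")


def escape_bare_less_than(html_text: str) -> str:
    """Replace "<" characters that do not look like tag openers with "&lt;"."""
    return _BARE_LT.sub("&lt;", html_text)


sample = "<p>before less than < after less than</p>"
# The real tags are left alone; only the stray "<" is encoded.
print(escape_bare_less_than(sample))
```

This only helps when the stray "<" is followed by a space, a digit, or similar. Input such as "a<b" is still ambiguous, because "<b" looks exactly like the start of a tag, which is the limitation described above.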


answered Aug 29 '13 at 09:30


chilkat ♦♦

