Archived Forum Post


Question:

Bullet symbol not properly being converted from HTML to XML

Mar 08 '15 at 04:24

Hi,

I am trying to convert a simple HTML file that contains just a centered bullet symbol. I am using the CkHtmlToXmlW class for this.

When I use the ConvertFile method, I get the Unicode "Replacement Character" (U+FFFD) instead of the bullet. When I use the ToXml method, I get a quotation mark instead.

What should I do to convert the bullet correctly?

I am using VS2010 with version 9.5.0.47 (x86).

Thanks in advance.


Answer

The HTML file contains this:

<html>
<body>
<center>
•
</center>
</body>
</html>

The HTML needs a META tag that declares the utf-8 charset. If you examine the file in a hex editor, you will see that the bullet character is stored as 3 bytes in the utf-8 encoding (E2 80 A2). Because the HTML does not declare any charset in a META tag, the default is ANSI, so those 3 bytes are interpreted according to the 1-byte-per-char ANSI encoding of whatever computer you happen to be running on, rather than as a single bullet character.