
Hi,

I am trying to convert this simple HTML file. It contains just a centered bullet symbol. I am using the CkHtmlToXmlW class for this.

When I use the ConvertFile method, I get the "Replacement Character" instead of the bullet. When I use the ToXml method, I get a quotation mark instead.

What should I do to convert the bullet correctly?

I use VS2010 with 9.5.0.47 x86.

Thanks in advance.

asked Mar 03 '15 at 03:33


odavidi


The HTML file contains this:

<html>
<body>
<center>
•
</center>
</body>
</html>

The file needs a META tag declaring the utf-8 charset (for example, <meta http-equiv="Content-Type" content="text/html; charset=utf-8">). If you examine the HTML in a hex editor, the bullet char is composed of 3 bytes in the utf-8 encoding. When no charset is specified in an HTML META tag, the default choice is ANSI, so the bytes that make up the bullet are interpreted according to the 1-byte-per-char ANSI code page of whatever computer you happen to be running on.
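To check the 3-byte claim without a hex editor, here is a minimal sketch in plain standard C++ (no Chilkat calls). The helper names `encodeUtf8` and `toHex` are made up for this illustration. It encodes a code point as UTF-8 and dumps the bytes as hex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // largest valid Unicode code point
    std::string out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes (the bullet is in this range)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: render bytes as space-separated uppercase hex.
std::string toHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Calling `toHex(encodeUtf8(0x2022))` for the bullet (U+2022) gives `E2 80 A2`. A reader that assumes a 1-byte-per-char code page treats those as three separate characters, which is why the bullet does not survive without a charset declaration.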


answered Mar 03 '15 at 12:50


chilkat ♦♦

I understand.

Earlier, I wanted to make things simpler, so I built that file myself. The real file I am trying to convert is a different one, and it is encoded in UTF-16. I need the output XML in UTF-8, but I can't find any way to do that with ConvertFile. I tried calling put_XmlCharset(L"utf-8"), but it still doesn't work.

Thanks in advance.

(Mar 08 '15 at 04:24) odavidi
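
One possible workaround, offered only as a sketch, is to transcode the file's bytes from UTF-16 to UTF-8 before handing the text to the converter. The code below is plain standard C++, not part of the Chilkat API, and the function names `appendUtf8` and `utf16leToUtf8` are made up for this illustration. It assumes the input is little-endian UTF-16 (the usual case on Windows), optionally starting with the FF FE byte order mark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one code point to a UTF-8 string.
static void appendUtf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: transcode UTF-16LE bytes to UTF-8.
// Unpaired surrogates and a trailing odd byte become U+FFFD.
std::string utf16leToUtf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    // Skip the little-endian byte order mark (FF FE) if present.
    if (in.size() >= 2 &&
        static_cast<unsigned char>(in[0]) == 0xFF &&
        static_cast<unsigned char>(in[1]) == 0xFE) {
        i = 2;
    }
    // Read one 16-bit code unit, low byte first.
    auto unit = [&in](std::size_t pos) -> std::uint32_t {
        std::uint32_t lo = static_cast<unsigned char>(in[pos]);
        std::uint32_t hi = static_cast<unsigned char>(in[pos + 1]);
        return lo | (hi << 8);
    };
    while (i + 1 < in.size()) {
        std::uint32_t u = unit(i);
        i += 2;
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            std::uint32_t low = (i + 1 < in.size()) ? unit(i) : 0;
            if (low >= 0xDC00 && low <= 0xDFFF) {
                u = 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            } else {
                u = 0xFFFD;  // unpaired high surrogate
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            u = 0xFFFD;      // unpaired low surrogate
        }
        appendUtf8(out, u);
    }
    if (i < in.size()) appendUtf8(out, 0xFFFD);  // odd trailing byte
    return out;
}
```

The bytes returned by `utf16leToUtf8` could then be saved as a UTF-8 file or passed on as a string. Any charset declaration inside the HTML would presumably also need to say utf-8 at that point, for the reason given in the answer above.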