
Is there a Chilkat API anywhere that can look at the BOM header of a string and determine the encoding method? I can easily write this but didn't want to reinvent the wheel.

asked Jul 25 at 16:22


chilkat ♦♦


This question only makes sense if you have bytes. For example, if you have a string in C#, the question makes no sense: a C# string is already decoded text, so its original byte encoding is no longer in question. It would have to be a byte[] in C#.

In C++, if you have a "char *", then you have a pointer to bytes. However, if you have a "wchar_t *", then you have a pointer to characters in what should be utf-16 (or perhaps utf-32) encoding, depending on how the compiler defines "wchar_t". In that case, you had better not be pointing to utf-8 bytes.

You're probably only interested in the BOMs for utf-16 (LE and BE), utf-8, and utf-32 (LE and BE). Assuming you have a "char *" (or "const char *"), it's just a matter of looking at the first few bytes.

The utf-8 BOM is 3 bytes: EF BB BF

The other BOMs are listed here: https://en.wikipedia.org/wiki/Byte_order_mark
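Looking at the first few bytes could be sketched roughly as follows. This is not a Chilkat API: the names BomEncoding, BomInfo, and detectBom are made up for illustration, and the sketch assumes you have a "const char *" plus a byte count. The one subtlety is ordering: the utf-32 LE BOM (FF FE 00 00) begins with the utf-16 LE BOM (FF FE), so the 4-byte patterns are tested first.

```cpp
#include <cstddef>

// Hypothetical types and function for illustration only (not a Chilkat API).
enum class BomEncoding { None, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

struct BomInfo {
    BomEncoding encoding;  // encoding implied by the BOM, or None
    std::size_t length;    // number of BOM bytes to skip before decoding
};

inline BomInfo detectBom(const char *data, std::size_t size) {
    // Compare as unsigned bytes so values >= 0x80 are not sign-extended.
    const unsigned char *p = reinterpret_cast<const unsigned char *>(data);

    // 4-byte BOMs first: FF FE 00 00 would otherwise match utf-16 LE.
    if (size >= 4) {
        if (p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
            return {BomEncoding::Utf32LE, 4};
        if (p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
            return {BomEncoding::Utf32BE, 4};
    }
    // utf-8: EF BB BF
    if (size >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return {BomEncoding::Utf8, 3};
    // utf-16: FF FE (little-endian) or FE FF (big-endian)
    if (size >= 2) {
        if (p[0] == 0xFF && p[1] == 0xFE) return {BomEncoding::Utf16LE, 2};
        if (p[0] == 0xFE && p[1] == 0xFF) return {BomEncoding::Utf16BE, 2};
    }
    return {BomEncoding::None, 0};
}
```

A caller would skip the returned length and decode the rest with the reported encoding. Two caveats: the absence of a BOM says nothing about the encoding, and the bytes FF FE 00 00 could in principle also be utf-16 LE text that starts with a NUL character, which this sketch reports as utf-32 LE.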


answered Jul 25 at 16:34


chilkat ♦♦

