What is the difference between getSizeAnsi, getSizeUnicode, and getSizeUtf8 in CkString? They all seem to return size of the string in bytes, so why have 3 different functions that seem to return the same value? The documentation isn’t very clear on the differences from what I’ve found.
asked Mar 01 '13 at 10:27
These methods return the size, in bytes, of the string when the characters are represented in each of the respective encodings: ANSI, Unicode (utf-16), and utf-8.
Consider this character: É
In the iso-8859-1 or Windows-1252 character encoding, it is represented by a single byte: 0xC9
In the utf-8 character encoding, it is represented by two bytes: 0xC3 0x89
In the utf-16 character encoding, it is also represented by two bytes: 0x00 0xC9 (big-endian order; little-endian utf-16 stores them as 0xC9 0x00)
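The byte sequences above can be reproduced with a small standalone sketch. It does not use Chilkat at all: the functions `encodeLatin1`, `encodeUtf8`, and `encodeUtf16BE` are hypothetical helpers written for illustration, and they only show how one code point (here É, U+00C9) turns into bytes in each encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Narrow a value that is already known to fit in 8 bits.
static std::uint8_t byte(char32_t v) { return static_cast<std::uint8_t>(v & 0xFF); }

// iso-8859-1 covers only U+0000..U+00FF, one byte per character.
Bytes encodeLatin1(char32_t cp) {
    assert(cp <= 0xFF);
    return Bytes{byte(cp)};
}

// utf-8 uses 1 to 4 bytes depending on the code point.
Bytes encodeUtf8(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    Bytes out;
    if (cp < 0x80) {
        out.push_back(byte(cp));
    } else if (cp < 0x800) {
        out.push_back(byte(0xC0 | (cp >> 6)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(byte(0xE0 | (cp >> 12)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(byte(0xF0 | (cp >> 18)));
        out.push_back(byte(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(byte(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(byte(0x80 | (cp & 0x3F)));
    }
    return out;
}

// utf-16 (big-endian) uses 2 bytes, or 4 bytes for a surrogate pair.
Bytes encodeUtf16BE(char32_t cp) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    Bytes out;
    if (cp < 0x10000) {
        out.push_back(byte(cp >> 8));
        out.push_back(byte(cp));
    } else {
        const char32_t v = cp - 0x10000;
        const char32_t hi = 0xD800 + (v >> 10);    // high surrogate
        const char32_t lo = 0xDC00 + (v & 0x3FF);  // low surrogate
        out.push_back(byte(hi >> 8));
        out.push_back(byte(hi));
        out.push_back(byte(lo >> 8));
        out.push_back(byte(lo));
    }
    return out;
}
```

Calling `encodeLatin1(0xC9)`, `encodeUtf8(0xC9)`, and `encodeUtf16BE(0xC9)` gives vectors of 1, 2, and 2 bytes respectively, which is the kind of difference the three getSize methods report.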
What is the ANSI Charset?
The ANSI charset is the default multibyte charset for a given computer. The ANSI charset (or code page) depends on the locale of the computer. For German computers it might be Windows-1252; for Japanese computers it may be Shift_JIS.
What is a MultiByte Charset?
Generally, all charsets except Unicode (also known as utf-16, which uses 2 bytes for most characters) are called "multibyte". This includes us-ascii. Some multibyte charsets represent every character in a single byte; others use a variable number of bytes per character. One example is utf-8, a multibyte encoding of Unicode that uses 1 to 4 bytes per character. (Google's search result pages use utf-8.)
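To make the "variable lengths of bytes" point concrete, here is a minimal sketch that totals the bytes a whole string would need in utf-8 and in utf-16. The names `sizeUtf8` and `sizeUtf16` are made up for this example and are not Chilkat's implementation; the input is a `std::u32string` so that each element is exactly one code point, and it is assumed to contain only valid code points.

```cpp
#include <cstddef>
#include <string>

// Bytes needed for one code point in utf-8 (1 to 4).
std::size_t utf8Length(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// Bytes needed for one code point in utf-16 (2, or 4 for a surrogate pair).
std::size_t utf16Length(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}

// Total size in bytes of the string when stored as utf-8.
std::size_t sizeUtf8(const std::u32string& s) {
    std::size_t total = 0;
    for (char32_t cp : s) total += utf8Length(cp);
    return total;
}

// Total size in bytes of the string when stored as utf-16.
std::size_t sizeUtf16(const std::u32string& s) {
    std::size_t total = 0;
    for (char32_t cp : s) total += utf16Length(cp);
    return total;
}
```

For the four-character string `U"caf\u00C9"`, `sizeUtf8` returns 5 (three 1-byte letters plus the 2-byte É) and `sizeUtf16` returns 8, so the same text has a different byte size in each encoding even though the character count is the same.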
answered Mar 01 '13 at 10:30