Archived Forum Post

Question:

What is the encoding of the paths in CkZip and CkFileAccess?

Oct 06 '16 at 11:45

Whenever I use CkZip and CkFileAccess, I wonder what encoding is expected for the const char * parameters of methods like:

bool OpenForRead(const char *filePath);
bool OpenZip(const char *ZipFileName);

I want to make sure that on Windows that translates into a Unicode path, but I am not sure how Chilkat handles these.

What is the expected encoding of the input, and does Chilkat convert it to Unicode (UTF-16) internally on Windows? Does it use the W versions of the APIs (e.g. CreateFileW) when opening files on Windows?


Answer

Answering my own question: I just remembered that CkZip and CkFileAccess have Unicode versions, CkZipW and CkFileAccessW.

A reminder on the doc page of CkZip and CkFileAccess about their Unicode versions would probably be a good addition to the documentation.


Answer

Bogdan,

For the multibyte Chilkat C/C++ API (i.e. "CkZip" instead of "CkZipW"), strings are passed as "const char *", and the lower case alternative methods (those that return strings) return "const char *".

By default, "const char *" is assumed to point to ANSI bytes (i.e. chars that use the ANSI encoding, which is typically a 1-byte per char encoding).

All Chilkat classes, except for CkString, have a Utf8 property (get_Utf8(), put_Utf8(bool value)). This controls how Chilkat interprets the bytes of a "const char *" -- as utf-8 or ANSI. By default, the Utf8 property is false. If Utf8 is true, then "const char *" inputs are interpreted as utf-8, and the "const char *" values returned by the lowercase string method alternatives will be in utf-8.
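To see why the Utf8 setting matters, here is a minimal sketch of how the same file name differs at the byte level in the two encodings. The helper latin1ToUtf8 is hypothetical and not part of Chilkat. It also assumes the ANSI code page can be treated as ISO-8859-1, which matches Windows-1252 everywhere except the 0x80-0x9F range.

```cpp
#include <string>

// Hypothetical helper (not a Chilkat function): re-encode single-byte
// ISO-8859-1 text as UTF-8.
// Assumption: "ANSI" is approximated by ISO-8859-1; Windows-1252 differs
// from it in the 0x80-0x9F range, which this sketch does not handle.
std::string latin1ToUtf8(const std::string& ansi) {
    std::string utf8;
    for (unsigned char c : ansi) {
        if (c < 0x80) {
            // 7-bit US-ASCII bytes are identical in both encodings.
            utf8.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80-0xFF become a two-byte UTF-8 sequence.
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

For example, the 8-byte ANSI string "caf\xE9.zip" becomes the 9-byte utf-8 string "caf\xC3\xA9.zip". If the bytes are handed over with the wrong Utf8 setting, the non-ASCII character is misread, while a pure us-ascii path is unaffected.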

See http://cknotes.com/utf8-c-property-allows-for-utf-8-or-ansi-const-char/

Also, an application can set the ANSI/utf-8 behavior globally by setting the CkGlobal's DefaultUtf8 property.

Finally: Internally, Chilkat will use Unicode (such as CreateFileW) to open files where the path contains any non-us-ascii chars. If the path is entirely 7bit us-ascii, Chilkat is free to use either CreateFileA or CreateFileW, it really doesn't matter. This applies to not just CreateFileW, but any Microsoft Platform SDK method that has ANSI(A) or Unicode(W) alternatives.
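The "entirely 7bit us-ascii" condition can be sketched as a simple byte scan. This is only an illustration of the rule as described; isSevenBitAscii is a hypothetical name, and nothing here is taken from Chilkat's actual implementation.

```cpp
// Hypothetical helper (not a Chilkat function): returns true when every
// byte of the path is 7-bit US-ASCII, i.e. below 0x80.
// Assumption: a null pointer is treated like an empty string.
bool isSevenBitAscii(const char* path) {
    if (path == nullptr) {
        return true;
    }
    const unsigned char* p = reinterpret_cast<const unsigned char*>(path);
    for (; *p != 0; ++p) {
        if (*p >= 0x80) {
            return false;  // found a byte whose meaning depends on the encoding
        }
    }
    return true;
}
```

A path for which this returns true has the same bytes in ANSI and in utf-8, which is why the choice between the A and W functions makes no difference for it.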


Answer

Thank you for the quick reply, Matt!

"Finally: Internally, Chilkat will use Unicode (such as CreateFileW) to open files where the path contains any non-us-ascii chars. If the path is entirely 7bit us-ascii, Chilkat is free to use either CreateFileA or CreateFileW, it really doesn't matter. This applies to not just CreateFileW, but any Microsoft Platform SDK method that has ANSI(A) or Unicode(W) alternatives."

Does this mean that on Windows Chilkat decides whether to use CreateFileA or CreateFileW instead of always using CreateFileW?

I ask because all ANSI(A) alternatives of the Microsoft Platform SDK internally convert strings from ANSI to Unicode and then call the Unicode(W) version of the API, so I wonder why it makes sense for Chilkat to ever call the ANSI(A) versions.

For instance, here is how CreateFileA is implemented:

uf /c KERNELBASE!CreateFileA
KERNELBASE!CreateFileA
  KERNELBASE!CreateFileA+0x1f:
    call to KERNELBASE!Basep8BitStringToDynamicUnicodeString
  KERNELBASE!CreateFileA+0x58:
    call to KERNELBASE!CreateFileW
  KERNELBASE!CreateFileA+0x65:
    call to ntdll!RtlFreeAnsiString

Answer

The older Windows Mobile (CE) builds don't have the Unicode Platform SDK functions available, so, like I said, Chilkat can choose one or the other. The choice depends on what Chilkat already has. If it has utf-8 or utf-16 in hand, then it might choose CreateFileW. If it has ANSI in hand, then it makes no difference whether Chilkat first converts to Unicode and then calls CreateFileW, or whether it just calls CreateFileA and lets Microsoft convert.
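As an illustration of the "convert first, then call the W function" path, here is a portable sketch of a utf-8 to utf-16 conversion. It is not Chilkat's code, and utf8ToUtf16 is a hypothetical name. On Windows this step would normally be done with MultiByteToWideChar(CP_UTF8, ...) and the result passed to CreateFileW.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a Chilkat function): decode UTF-8 bytes and
// re-encode them as UTF-16 code units.
// Simplification: overlong encodings and encoded surrogates are not rejected.
std::u16string utf8ToUtf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= in.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (cont & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF) {
            throw std::invalid_argument("code point out of range");
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above the BMP are written as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Starting from ANSI bytes instead, an equivalent conversion would run either in the caller before CreateFileW or inside CreateFileA itself, which is the sense in which the two routes come out the same.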

I don't like to discuss internals because I don't have the time to defend the internal implementation. It is a mature implementation that has evolved over the last 16 years...


Answer

Thank you for the details, Matt. No need to defend the internal implementation; I trust Chilkat to do the right thing here. I was just curious and wanted to learn something new about the reasoning behind it. On several occasions I have found that understanding how Chilkat works helped me use it the way it was intended, thus making the best use of these excellent components.