Archived Forum Post

Question:

How to Encrypt Unicode Strings?

Nov 14 '12 at 17:29

I tried to run your AES encryption examples with Delphi XE2.

encStr := CkCrypt2__encryptStringENC(crypt,
    'The quick brown fox jumps over the lazy dog.');

As long as the string to encrypt contains only normal ASCII characters, everything is OK.
But the input string is of type PWideChar, so it can hold Unicode characters.

If I change your example to

encStr := CkCrypt2__encryptStringENC(crypt,
    'The quick brown fox jumps over the lazy dog.'
    + widechar(1602));

The result is empty. WideChar(1602) is an Arabic letter (U+0642, qaf).

How can I encrypt and decrypt strings containing Unicode characters?


Answer:

See the online reference documentation for EncryptStringENC:

Encrypts a string and returns the encrypted data as an encoded (printable) string. The minimal set of properties that should be set before encrypting is: CryptAlgorithm, SecretKey, Charset, and EncodingMode. Other properties that control encryption are: CipherMode, PaddingScheme, KeyLength, IV. When decrypting (with DecryptStringENC), all property settings must match; otherwise, garbled data is returned. The Charset property controls the exact bytes that get encrypted. Languages such as VB.NET, C#, and Visual Basic work with Unicode strings, so the input string is Unicode. If Unicode is to be encrypted (i.e. 2 bytes per character), set the Charset property to "Unicode". To implicitly convert the string to another charset before the encryption is applied, set the Charset property to something else, such as "iso-8859-1", "Shift_JIS", "big5", "windows-1252", etc. (Refer to EncryptString for the complete list of charsets.)

The EncodingMode property controls the encoding of the string that is returned. It can be set to "Base64", "QP", or "Hex".
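To make the role of the encoding mode concrete, here is a small sketch in plain Python (used only because it makes bytes easy to inspect; it does not call Chilkat). The bytes in cipher_bytes are an arbitrary stand-in for encryption output, not real ciphertext, and the variable names are made up for illustration. It shows the same bytes rendered as Base64, hex, and quoted-printable, which is roughly what choosing between "Base64", "Hex", and "QP" amounts to.

```python
import base64
import binascii
import quopri

# Arbitrary stand-in for encrypted output (NOT real ciphertext).
cipher_bytes = bytes([0x8F, 0x02, 0xC4, 0x7E, 0x10, 0xFF])

# Three printable renderings of the same underlying bytes.
as_base64 = base64.b64encode(cipher_bytes).decode("ascii")
as_hex = binascii.hexlify(cipher_bytes).decode("ascii")
as_qp = quopri.encodestring(cipher_bytes).decode("ascii")

print("Base64:", as_base64)
print("Hex:   ", as_hex)
print("QP:    ", as_qp)

# Every rendering decodes back to exactly the same bytes.
from_base64 = base64.b64decode(as_base64)
from_hex = binascii.unhexlify(as_hex)
from_qp = quopri.decodestring(as_qp.encode("ascii"))
```

The encoding only changes how the encrypted bytes are written out as text; the decrypting side has to use the same encoding mode to get the original bytes back.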

Encryption algorithms work on bytes. The question is: which byte representation of the characters do you want to encrypt? The ANSI character encoding? utf-8? Or 2-byte-per-character Unicode? The Charset property controls this behavior. The default value of the Charset property is the ANSI charset, whatever it may be on the computer where your code is running. Assuming your computer is not configured with an Arabic locale, its ANSI charset cannot represent an Arabic character. The internal conversion from the Unicode string (passed from Delphi) to the ANSI byte representation therefore fails, and the return value is empty. You can check the contents of the LastErrorText property to confirm.
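The conversion failure can be reproduced outside of Chilkat. The sketch below is plain Python, not the Chilkat API, and it assumes the ANSI charset on the machine is Windows-1252 (a common Western European default; yours may differ). The helper try_encode is a made-up name for this illustration. Python signals the failure with an exception, where EncryptStringENC reportedly returns an empty string, but the underlying cause is the same: the charset has no byte for the character.

```python
def try_encode(text, charset):
    """Return the bytes for text in charset, or None if it cannot be represented."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None

plain = "The quick brown fox jumps over the lazy dog."
with_arabic = plain + chr(1602)  # chr(1602) is U+0642, an Arabic letter

# Assumption: the ANSI charset is Windows-1252 ("cp1252" in Python).
ascii_result = try_encode(plain, "cp1252")
arabic_result = try_encode(with_arabic, "cp1252")

print("ASCII-only text converts:", ascii_result is not None)
print("Text with Arabic letter converts:", arabic_result is not None)
```

The ASCII-only string converts without trouble, which matches the behavior described in the question: the problem appears only once a character outside the ANSI charset is added.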

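The solution is to set the Charset property to a charset that can represent the characters. "utf-8" is one possible choice. ("utf-8" is a variable-length multibyte encoding of Unicode.) You can also set the Charset property to "Unicode" (2 bytes per character), or perhaps "Windows-1256" to handle single-byte/char us-ascii + Arabic. Whichever charset you choose, set the same Charset before calling DecryptStringENC, since all property settings must match when decrypting.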
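As a rough illustration of what each choice means for the bytes that get encrypted, the following plain Python sketch (again, not Chilkat code) encodes a short string ending in the Arabic letter with three charsets. It assumes that Chilkat's "Unicode" charset corresponds to little-endian UTF-16 ("utf-16-le" in Python); that mapping is an assumption here, and the names text and encoded are made up for the example.

```python
text = "fox" + chr(1602)  # three ASCII letters plus U+0642

# Assumption: "utf-16-le" stands in for Chilkat's "Unicode" charset.
charsets = ("utf-8", "utf-16-le", "cp1256")

encoded = {}
for charset in charsets:
    data = text.encode(charset)
    encoded[charset] = data
    print(f"{charset:10} {len(data)} bytes  {data.hex()}")

# Decoding with the same charset restores the original string.
round_trips = {cs: encoded[cs].decode(cs) == text for cs in charsets}
```

All three charsets can represent the string, but they produce different byte sequences of different lengths, so the encrypted output differs too. That is why the decrypting side must use the same Charset setting as the encrypting side.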

Also, see this for more information: http://www.chilkatforum.com/questions/316/utf-8-characters-not-encrypting