Archived Forum Post


Question:

Send Ansi (utf-8) email?

Feb 20 '13 at 14:54

For sending email with the Chilkat email libraries, what are the minimum settings needed to send email "as is"? This is for a C++ app that is NOT Unicode, so all strings (body, subject) are passed as Ansi strings.

Converting the app to Unicode will be a major task (for a future release).

For the moment, I want to send emails from this app as Ansi, but the Chilkat library should convert them to utf-8 emails (so all receivers see the message properly). I guess 99% of email clients understand utf-8 emails (is this true?).

I hope it is clear what I need. How can I make sure that
email.put_Subject("söme stránge non-ascii chàrs here");
will work as expected? What are the minimum "email.put_Xxxx" steps needed to support all (most) languages?


Answer

There are two separate issues that must be understood:

1) Passing a null-terminated string correctly to a Chilkat method or property.
and
2) Deciding what charset is to be used for the email when sent.

Passing a null-terminated string correctly to a Chilkat method or property.

The Chilkat C++ API is a class library where the strings are passed as "const char *". (There is also a Unicode version of each class. For example, CkEmail is the multibyte class that uses "const char *", and CkEmailW is the Unicode version that uses "const wchar_t *".)

A "const char *" is a pointer to a multibyte character string: a null-terminated string in a multibyte character encoding. A multibyte character encoding, such as iso-8859-1 or utf-8, represents each char by one or more bytes and contains no NULL (0) bytes except for the null terminator. ANSI is a term used for the default multibyte character encoding for the locale of the computer. For computers in the USA it would typically be Windows-1252 (closely related to iso-8859-1, also known as Latin1).
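To make the difference between two multibyte encodings concrete, here is a minimal sketch (plain C++, not Chilkat code) that converts an iso-8859-1 string to utf-8. The function name latin1_to_utf8 is hypothetical. It relies on the fact that every iso-8859-1 byte value equals its Unicode code point, and it does not handle the extra Windows-1252 characters in the 0x80..0x9F range.

```cpp
#include <string>

// Convert an iso-8859-1 (Latin1) byte string to utf-8.
// Bytes below 0x80 are plain us-ascii and are copied unchanged.
// Bytes 0x80..0xFF are code points U+0080..U+00FF, which utf-8
// encodes as two bytes: 110xxxxx 10xxxxxx.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte doubles
    for (unsigned char c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return out;
}
```

For example, "ö" is the single byte 0xF6 in iso-8859-1 and the two bytes 0xC3 0xB6 in utf-8, which is why the receiving code has to know which encoding a "const char *" is in.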

By default, Chilkat interprets the bytes of a "const char *" argument as characters using the ANSI character encoding. If, however, you are going to pass a utf-8 string to a Chilkat method, you must first tell the object instance that it should interpret the bytes as utf-8. Do this by setting the "Utf8" property to true for the object instance:

emailObject.put_Utf8(true);
The Utf8 boolean property is common to all Chilkat C++ classes.

So that takes care of #1 -- given that utf-8 is the multibyte encoding for Unicode, you can pass strings containing characters of any language to a Chilkat method. (Or you could switch to the Chilkat Unicode classes, where the class names end in a "W", and use "const wchar_t *".)

Deciding what charset is to be used for the email when sent.

The Email object's Charset property determines the character encoding that will be used for an email. This is a completely separate issue from passing ANSI or utf-8 multibyte strings to Chilkat methods.

If the Charset property is left unset, Chilkat will automatically determine the most appropriate charset. If, for example, all chars are 7bit us-ascii, Chilkat will use us-ascii and (I think) no charset is explicitly specified in the MIME of the email, although there is no harm in specifying us-ascii explicitly. If non-us-ascii characters are present, Chilkat will examine the characters to determine what languages are represented and will automatically choose the most appropriate charset, choosing the typical ANSI charset when only one language (in addition to us-ascii) is present. For example, if the email only contains us-ascii and Japanese chars, Chilkat will automatically choose Shift-JIS. If the email contains a mix of languages such that no one charset can be chosen, or if the language cannot be determined without ambiguity, then the "utf-8" charset will be chosen.
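The general idea of choosing the narrowest charset that can represent the text can be sketched as follows. This is a simplified, hypothetical illustration and not Chilkat's actual algorithm: the function name choose_charset is made up, the input is assumed to be valid utf-8, and only three outcomes are modeled (us-ascii, iso-8859-1, utf-8). Per-language charsets such as Shift-JIS are not covered, so Japanese text falls through to utf-8 here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: pick the narrowest of three charsets for valid utf-8 text.
// Not Chilkat's algorithm, only an illustration of the selection idea.
std::string choose_charset(const std::string& utf8) {
    bool ascii_only = true;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) continue;        // 7-bit us-ascii byte
        ascii_only = false;
        if (c == 0xC2 || c == 0xC3) {  // lead byte of U+0080..U+00FF (Latin1 range)
            ++i;                       // skip its continuation byte
            continue;
        }
        return "utf-8";                // code point above U+00FF
    }
    return ascii_only ? "us-ascii" : "iso-8859-1";
}
```

With this sketch, "hello" maps to us-ascii, utf-8 encoded "söme" maps to iso-8859-1, and any text with characters above U+00FF maps to utf-8.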

If your app explicitly sets the CkEmail's "Charset" property (i.e. get_Charset/put_Charset), then that charset will be used when the email is "rendered" and sent via the SendEmail method.


Answer

Do I understand correctly...

I can leave my program unchanged where it is building the email (building it with "const char*" multibyte ansi strings) and Chilkat will use the locale (character set) of the computer to determine how to interpret those non-us-ascii chars? Using the locale of the current computer is important, because (I make this up, but you get the idea) the character ü for the German locale has the same ascii code as ä for the Danish locale... Multi-language emails are not an issue for this program.
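To make the concern concrete, here is a minimal sketch (plain C++, not Chilkat code) of one byte value decoding to different characters under different ANSI code pages. German and Danish Windows locales normally share Windows-1252, so the sketch contrasts Windows-1252 with the Cyrillic Windows-1251 instead. The enum and function names are hypothetical, and only bytes 0xC0..0xFF are handled.

```cpp
// Hypothetical illustration: decode one byte in the range 0xC0..0xFF
// to a Unicode code point under two different ANSI code pages.
enum class CodePage { Windows1252, Windows1251 };

char32_t decode_high_byte(unsigned char b, CodePage cp) {
    if (cp == CodePage::Windows1251) {
        // Windows-1251 maps 0xC0..0xFF onto the Cyrillic block U+0410..U+044F.
        return static_cast<char32_t>(0x0410 + (b - 0xC0));
    }
    // Windows-1252 maps 0xC0..0xFF directly onto U+00C0..U+00FF.
    return static_cast<char32_t>(b);
}
```

The byte 0xE4 decodes to U+00E4 ("ä") under Windows-1252 and to U+0434 ("д") under Windows-1251, so the bytes alone are not enough; the code page they were written in has to be known.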

So I can leave my program unchanged (where it builds the email) and just use put_Charset("utf-8") when sending it, thereby making sure that both German and Danish receivers will see "ü".

ps. Yes, I know.... I should make the program 100% Unicode (and utf-8). For the moment I am looking for a quick fix that helps the majority of my customers.