Archived Forum Post

Question:

How to Send utf-8 Strings with SendString?

Feb 20 '13 at 14:01

I'm getting an error on the server side (it doesn't recognize the message) when SendString is used. I notice that there is no size limit for the string, so any size should be OK. My guess is that the only other thing that could be causing the problem is that the string's default ANSI encoding is not recognized.


Answer

There are two separate issues:

1) Passing a null-terminated string correctly to a Chilkat method or property.
2) Deciding what charset is to be used for sending on the socket.

Passing a null-terminated string correctly to a Chilkat method or property.

The Chilkat C++ API is a class library where the strings are passed as "const char *". (There is also a Unicode version of each class. For example, CkSocket is the multibyte class that uses "const char *", and CkSocketW is the Unicode version that uses "const wchar_t *".)

A "const char *" is a pointer to a multibyte character string. This means it's a string that uses a multibyte character encoding and is null-terminated. A multibyte character encoding, such as iso-8859-1 or utf-8, is a character encoding in which each character is represented by one or more bytes, and there are no null (0) bytes except for the null terminator. ANSI is a term used for the default multibyte character encoding for the locale of the computer. For computers in the USA it would be either iso-8859-1 (also known as Latin1) or Windows-1252.
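As a small illustration of "one or more bytes per character", here is a minimal sketch that counts the characters in a null-terminated utf-8 string. The function name utf8CharCount is hypothetical (it is not part of Chilkat), and the sketch assumes the input is well-formed utf-8.

```cpp
#include <cstddef>

// Hypothetical helper, not part of the Chilkat API.
// Counts characters (code points) in a null-terminated utf-8 string.
// In utf-8, every byte of a character after the first is a continuation
// byte with the bit pattern 10xxxxxx, so counting the bytes that are NOT
// continuation bytes gives the number of characters.
// Assumes the input is well-formed utf-8.
std::size_t utf8CharCount(const char *s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the utf-8 bytes "caf\xC3\xA9" (the word "café"), strlen reports 5 bytes while utf8CharCount reports 4 characters. The same word in iso-8859-1 is the 4 bytes "caf\xE9", which is why both ends of a connection have to agree on the encoding.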

By default, Chilkat interprets the bytes of a "const char *" argument as characters using the ANSI character encoding. If, however, you are going to pass a utf-8 string to a Chilkat method, you must first tell the object instance that it should interpret the bytes as utf-8. Do this by setting the "Utf8" property to true for the object instance:

socket.put_Utf8(true);

The Utf8 boolean property is common to all Chilkat C++ classes.

So that takes care of #1. Given that utf-8 is a multibyte encoding of Unicode, you can pass strings containing characters of any language to a Chilkat method. (Alternatively, you could switch to the Chilkat Unicode classes, where the class names end in a "W", and use "const wchar_t *".)

How to Send a String over a Socket using the Correct Charset

To send a string over a socket (not including the null terminator), an application would call the socket.SendString method, passing the string to be sent. One thing to realize is that the Chilkat API is offered in many different programming languages. In some languages strings are passed as Unicode, in some they are passed as string objects (also essentially Unicode), and in some they are passed as null-terminated byte sequences. Given that the receiving end expects bytes representing characters according to a specific character encoding, the sender must be able to control the byte representation of the characters in the string being sent. This is done by setting the "StringCharset" property. If the StringCharset property is set to "utf-8", then the characters are sent using the utf-8 encoding. For example:

socket.put_StringCharset("utf-8");
success = socket.SendString(myString);

It makes no difference whether ANSI or utf-8 is used to pass a string to SendString. As long as it's passed correctly (that is, the Utf8 property matches the encoding of the bytes you pass), Chilkat will automatically do the conversion (if necessary) to send the string using the character encoding specified by StringCharset.
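To make the two steps concrete, here is a rough sketch of the idea in plain C++. It is only an illustration of the concept, not Chilkat's actual implementation: the names Charset, decode, encode and bytesToSend are all hypothetical, and iso-8859-1 stands in for the ANSI encoding. The input charset plays the role of the Utf8 property, and the wire charset plays the role of StringCharset.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical names throughout; Latin1 stands in for an ANSI code page.
enum class Charset { Latin1, Utf8 };

// Step 1: interpret the caller's bytes as characters (Unicode code points).
// Simplified: overlong utf-8 forms and surrogates are not rejected.
std::vector<std::uint32_t> decode(const std::string &bytes, Charset cs) {
    std::vector<std::uint32_t> cps;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (cs == Charset::Latin1 || b < 0x80) {
            cps.push_back(b);  // single byte whose value is the code point
            ++i;
            continue;
        }
        std::size_t extra = 0;  // number of continuation bytes expected
        if ((b >> 5) == 0x06) extra = 1;
        else if ((b >> 4) == 0x0E) extra = 2;
        else if ((b >> 3) == 0x1E) extra = 3;
        if (extra == 0 || i + extra >= bytes.size())
            throw std::invalid_argument("malformed utf-8");
        std::uint32_t cp = b & (0x3Fu >> extra);  // payload bits of lead byte
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) throw std::invalid_argument("malformed utf-8");
            cp = (cp << 6) | (c & 0x3Fu);
        }
        cps.push_back(cp);
        i += extra + 1;
    }
    return cps;
}

// Step 2: produce the bytes that go on the wire in the chosen charset.
std::string encode(const std::vector<std::uint32_t> &cps, Charset cs) {
    std::string out;
    for (std::uint32_t cp : cps) {
        if (cs == Charset::Latin1) {
            if (cp > 0xFF) throw std::range_error("not representable in iso-8859-1");
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// The input charset and the wire charset are chosen independently.
std::string bytesToSend(const std::string &input, Charset inputCharset,
                        Charset wireCharset) {
    return encode(decode(input, inputCharset), wireCharset);
}
```

In this sketch, passing "caf\xE9" as Latin1 and passing "caf\xC3\xA9" as utf-8 both yield the same wire bytes when the wire charset is utf-8. What breaks things is a mismatch in step 1, for example passing utf-8 bytes while they are being interpreted as ANSI.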