I have a test string, "ąćęłńśóźżĄĆĘŁŃŚÓŹŻ", that I want to send with Chilkat's SendString function. Sending it from C# to PHP works correctly. So why can't I send it from PHP to PHP? I have set up two machines with the same configuration (one is a clone of the other), and it almost works: the 'Ł' is lost every time.
What could be wrong? In PHP a string is not an object; might that be the cause? Do you have any clues about what I should do?
asked Sep 19 at 09:00
First, one must understand how PHP strings are stored: http://php.net/manual/en/language.types.string.php#language.types.string.details
There are two hurdles that need to be cleared in order to get things right:
(1) In PHP a string is just an array of bytes, so when the bytes are passed to Chilkat, Chilkat must know how to interpret them. Are they utf-8 bytes, where "á" is represented by "\xC3\xA1"? Or are they ANSI bytes, where the character encoding is defined by the machine's locale? If the computer is in Poland, that is likely iso-8859-2 (or Windows-1250 on Windows), and in that case "á" is represented by "\xE1".
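To see the two byte representations side by side, here is a minimal PHP sketch, assuming the mbstring extension is enabled. `bytesIn` is a hypothetical helper, and ISO-8859-2 stands in for the ANSI code page as in the paragraph above.

```php
<?php
// Hex-dump the bytes a character occupies in a given charset.
// "\u{00E1}" (PHP 7+) always yields UTF-8 bytes, regardless of
// how this source file is saved.
function bytesIn(string $utf8Char, string $charset): string
{
    // Convert from UTF-8 to the target charset, then show the raw bytes.
    return bin2hex(mb_convert_encoding($utf8Char, $charset, 'UTF-8'));
}

$a = "\u{00E1}"; // á
echo 'utf-8:      ', bytesIn($a, 'UTF-8'), PHP_EOL;      // c3a1
echo 'iso-8859-2: ', bytesIn($a, 'ISO-8859-2'), PHP_EOL; // e1
```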
For programming languages where strings are byte arrays, Chilkat provides a "Utf8" property that defaults to false/0. It defines how Chilkat interprets the bytes of passed-in string arguments: as ANSI bytes or as utf-8 bytes. This must be set correctly.
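The effect of that flag can be modeled in a few lines of PHP. This is a rough model, not Chilkat's implementation: `interpretAs` is a hypothetical helper, mbstring is assumed to be available, and ISO-8859-2 is assumed as the ANSI code page. Reading utf-8 bytes with the wrong flag turns "á" into the two ISO-8859-2 characters "ĂĄ".

```php
<?php
// Hypothetical model of what a Utf8 true/false choice decides.
// Utf8 = true:  the bytes are taken as UTF-8 text as-is.
// Utf8 = false: the bytes are taken as ANSI and converted to UTF-8.
function interpretAs(string $bytes, bool $utf8Flag, string $ansi = 'ISO-8859-2'): string
{
    return $utf8Flag ? $bytes : mb_convert_encoding($bytes, 'UTF-8', $ansi);
}

$utf8Bytes = "\xC3\xA1"; // "á" stored as UTF-8
echo interpretAs($utf8Bytes, true), PHP_EOL;  // á  (flag matches the bytes)
echo interpretAs($utf8Bytes, false), PHP_EOL; // ĂĄ (flag is wrong: mojibake)
echo interpretAs("\xE1", false), PHP_EOL;     // á  (ANSI byte read as ANSI)
```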
In your particular case, given that only the 'Ł' is lost, it must be that the string is passed in correctly, and the problem occurs in (2) described below.
(2) The second hurdle is that Chilkat must know exactly which bytes to send. Will it send the utf-8 representation of the string, the ANSI representation, or something else (perhaps utf-16, utf-32, or some seldom-used charset)? This is controlled by the Socket.StringCharset property, and it is likely the problem here: your program passed the string to Chilkat correctly, and now Chilkat must convert it to the actual bytes that will be sent over the socket. StringCharset determines that byte representation. If StringCharset is set to a charset (encoding) in which "Ł" has no byte representation, the character is lost. For example, you cannot send "Ł" with StringCharset = "iso-8859-1", because that charset uses one byte per character and no byte value represents "Ł". To send "Ł", StringCharset must name a charset that includes "Ł": any Unicode encoding (utf-8, utf-16, etc.) or one of the regional single-byte encodings (iso-8859-2, Windows-1250, etc.).
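As a rough illustration of that conversion step, the sketch below uses PHP's mbstring (assumed enabled) in place of Chilkat's internal conversion; `encodeFor` is a hypothetical helper. Note that mbstring substitutes "?" by default instead of dropping the character, but either way the "Ł" does not arrive.

```php
<?php
// Encode U+0141 (Ł) into several charsets, as a StringCharset-style
// conversion would before the bytes go onto the socket.
function encodeFor(string $utf8Text, string $charset): string
{
    // Unrepresentable characters become mbstring's substitute char ('?').
    return mb_convert_encoding($utf8Text, $charset, 'UTF-8');
}

$l = "\u{0141}"; // Ł
foreach (['ISO-8859-1', 'ISO-8859-2', 'UTF-8', 'UTF-16BE'] as $cs) {
    printf("%-10s %s\n", $cs, bin2hex(encodeFor($l, $cs)));
}
// Expected: ISO-8859-1 3f ('?'), ISO-8859-2 a3, UTF-8 c581, UTF-16BE 0141
```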
What a great answer, thank you very much.
Before asking the question, I had been testing various combinations of put_StringCharset and put_Utf8 on both the client and server sockets. After reading your response, I decided to record every test and its result, just to be sure I hadn't missed anything.
This may surprise you, but the case where only "Ł" was lost occurred with put_Utf8(false).
In my case, the solution was put_Utf8(true) and put_StringCharset('utf-8') on both the client and the server.
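Why the same charset on both ends matters can be shown without any sockets. The sketch below is a model, not Chilkat code: the hypothetical `roundTrip` helper encodes on the "client" with one charset and decodes on the "server" with another, using mbstring (assumed enabled). The file must be saved as UTF-8 for the literal to hold UTF-8 bytes.

```php
<?php
// Model of both ends of the connection: the client encodes the text with
// its charset, and the server decodes the received bytes with its own.
function roundTrip(string $utf8Text, string $clientCharset, string $serverCharset): string
{
    $wireBytes = mb_convert_encoding($utf8Text, $clientCharset, 'UTF-8');
    return mb_convert_encoding($wireBytes, 'UTF-8', $serverCharset);
}

$test = 'ąćęłńśóźżĄĆĘŁŃŚÓŹŻ'; // requires this file to be saved as UTF-8
var_dump(roundTrip($test, 'UTF-8', 'UTF-8') === $test);           // bool(true)
var_dump(roundTrip($test, 'ISO-8859-2', 'ISO-8859-2') === $test); // bool(true)
var_dump(roundTrip($test, 'ISO-8859-1', 'ISO-8859-1') === $test); // bool(false)
var_dump(roundTrip($test, 'UTF-8', 'ISO-8859-2') === $test);      // bool(false)
```

utf-8 on both sides has the advantage of covering every character, not only the Polish ones.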
answered Sep 20 at 02:59
Thanks! There's one more thing to know, and it may explain what happened in your case. If there is a literal string in your source file (i.e. a quoted string such as "ąćęłńśóźżĄĆĘŁŃŚÓŹŻ"), then it matters how that source file is saved. For example, if your IDE or text editor saves source files in utf-8, then those characters are stored as their utf-8 bytes. When PHP interprets the source file, the string is composed of exactly the bytes found in the file, so whether the Chilkat Utf8 property should be true or false depends on how the PHP source file was saved. (This applies to source files in any programming language where strings are simply byte arrays. In other languages, the compiler or interpreter may expect the source to be utf-8, and saving the source in the ANSI encoding would always be a mistake.)
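If you cannot be sure how every source file was saved, one option is to normalize strings to utf-8 before handing them over, so that a single Utf8 = true setting applies. This is a heuristic sketch, not something Chilkat provides: `toUtf8` is hypothetical, mbstring is assumed, and ISO-8859-2 is assumed as the ANSI code page. Some ANSI byte runs are also valid utf-8, so knowing how the file is saved remains more reliable.

```php
<?php
// Hypothetical helper: return UTF-8 bytes whether the literal came from a
// UTF-8-saved or an ISO-8859-2-saved source file.
function toUtf8(string $bytes, string $ansi = 'ISO-8859-2'): string
{
    // Valid UTF-8 (including plain ASCII) is returned unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Otherwise assume the bytes are in the ANSI code page and convert.
    return mb_convert_encoding($bytes, 'UTF-8', $ansi);
}

$fromUtf8File = "\xC5\x81"; // "Ł" as a UTF-8-saved file stores it
$fromAnsiFile = "\xA3";     // "Ł" as an ISO-8859-2-saved file stores it
var_dump(toUtf8($fromUtf8File) === toUtf8($fromAnsiFile)); // bool(true)
```

Writing non-ASCII literals as `\u{...}` escapes (PHP 7+) sidesteps the question entirely, since they always produce utf-8 bytes no matter how the file is saved.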