
Hello, I have started implementing a socket that I check for received data in a thread. It works as desired, but I cannot get the German umlauts (ä, ö, ü) to work. I tried socket.put_Utf8(); but it made no difference. I receive cryptic characters...

asked Jul 17 '13 at 07:35

RootTag

See this: Socket Programming "Must Know" Concepts

(from the web page at the above URL)

5. Sending and Receiving Strings

Let’s say you want to send the string “ABC” to the connected peer. Most programmers implicitly assume this means sending three bytes: 0x41, 0x42, and 0x43. This is usually correct, but it assumes that the communicating programs have agreed in advance to use an ANSI charset (such as Windows-1252), in which “A” is represented by a single byte with the value 0x41, “B” by 0x42, and so on. What if the sender sends ANSI but the receiver is expecting Unicode (utf-16)? The receiver would be expecting 6 bytes, because “A” is represented by 2 bytes (0x41, 0x00), “B” by 0x42, 0x00, and so on.

The StringCharset property controls how strings are sent and received. It defaults to “ANSI”. If, for example, it is set to “Unicode”, then a call to SendString(“ABC”) would result in the sending of 6 bytes: 0x41, 0x00, 0x42, 0x00, 0x43, 0x00. The ReceiveString method would know to interpret the incoming bytes as 2-byte/char Unicode chars and correctly return the string “ABC” to the caller.
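As a rough illustration of the byte counts described above, the following C++ sketch encodes the same text once as single-byte ANSI and once as 2-byte UTF-16LE. The helper names encodeAnsi and encodeUtf16le are made up for this example and are not part of the Chilkat API. The ANSI helper only covers the code points where Windows-1252 coincides with Latin-1, and the UTF-16 helper assumes characters in the Basic Multilingual Plane.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: one byte per character.
// Windows-1252 matches Latin-1 for U+0000..U+007F and U+00A0..U+00FF;
// anything else is replaced by '?' in this sketch.
Bytes encodeAnsi(const std::u32string& text) {
    Bytes out;
    for (char32_t cp : text) {
        const bool sameAsLatin1 = cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF);
        out.push_back(sameAsLatin1 ? static_cast<std::uint8_t>(cp)
                                   : static_cast<std::uint8_t>('?'));
    }
    return out;
}

// Hypothetical helper: two bytes per character, low byte first (UTF-16LE).
// Assumes every code point is below U+10000 (no surrogate pairs).
Bytes encodeUtf16le(const std::u32string& text) {
    Bytes out;
    for (char32_t cp : text) {
        out.push_back(static_cast<std::uint8_t>(cp & 0xFF));
        out.push_back(static_cast<std::uint8_t>((cp >> 8) & 0xFF));
    }
    return out;
}
```

With these helpers, encodeAnsi(U"ABC") yields the 3 bytes 0x41, 0x42, 0x43, while encodeUtf16le(U"ABC") yields the 6 bytes 0x41, 0x00, 0x42, 0x00, 0x43, 0x00. A receiver that assumes the wrong one of the two will miscount and misread the characters.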

This is even more crucial when sending non-English characters, where there are many more choices for character encodings. For example, Japanese strings might be sent in Shift_JIS, iso-2022-jp, utf-8, Unicode (ucs-2), euc-jp, etc. It is important that the communicating peers agree on the byte representation of the strings sent and received.

answered Jul 17 '13 at 08:48

chilkat ♦♦
The problem you encountered with accented characters would occur if one side were sending the utf-8 byte representation of those characters (2 bytes per char) while the other side interpreted the bytes as ANSI (1 byte per char). If the sender is sending utf-8, make sure to set the StringCharset property = "utf-8".
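To make the mismatch concrete, here is a minimal C++ sketch that encodes "ö" (U+00F6) as utf-8 and then misreads the resulting bytes one byte per character. The function names encodeUtf8 and misreadAsSingleByte are invented for this illustration and are not Chilkat calls. The single-byte reading uses Latin-1, which agrees with Windows-1252 for the bytes involved here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: encode Unicode code points as utf-8 bytes.
// Assumes valid code points (no surrogates, nothing above U+10FFFF).
Bytes encodeUtf8(const std::u32string& text) {
    Bytes out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out.push_back(static_cast<std::uint8_t>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// Hypothetical helper: what a receiver sees if it treats every byte
// as one character (Latin-1 mapping: byte value == code point).
std::u32string misreadAsSingleByte(const Bytes& bytes) {
    std::u32string out;
    for (std::uint8_t b : bytes) {
        out.push_back(static_cast<char32_t>(b));
    }
    return out;
}
```

Here encodeUtf8(U"\u00F6") produces the two bytes 0xC3, 0xB6, and misreadAsSingleByte turns them into the two characters U+00C3 and U+00B6, which display as "Ã¶" instead of "ö".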

(Jul 17 '13 at 08:50) chilkat ♦♦

See my post below. Something is still wrong, while sending utf-8 :-(

(Jul 18 '13 at 10:00) RootTag

EDIT:

I tested my winsock.exe, which waits for a connection, with Telnet and other clients (e.g. http://www.codeproject.com/Articles/21007/MFC-Telnet-Application). It works well with characters like äöü. I then tried to get the same to work with the Chilkat socket like this:

socket.put_Utf8(true);
socket.put_StringCharset("utf-8");

CString val = "cd ölf";

success = socket.SendString(val);

I sent chcp 1252 as the first command, so the folder name "ölf" is displayed correctly in my edit control.

But sending "cd ölf" gives:

C:> cd lf

Das System kann den angegebenen Pfad nicht finden.

The system is unable to find the path 'lf', which should have been 'ölf'.
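One possible explanation, offered only as an assumption and not confirmed anywhere in this thread, is that the CString holds the ANSI byte 0xF6 for "ö" while the socket has been told the string is utf-8. A lone 0xF6 byte is not valid utf-8, so a decoder could drop it, which would match the missing character. The standalone C++ check below illustrates that point; isValidUtf8 is a hypothetical helper, not a Chilkat function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns true if the byte string is well-formed utf-8.
bool isValidUtf8(const std::string& s) {
    // Smallest code point that legitimately needs a sequence of each length.
    static const std::uint32_t minCodePoint[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else return false;              // stray continuation or invalid lead byte
        if (i + len > n) return false;  // sequence is cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates, and values above U+10FFFF.
        if (cp < minCodePoint[len] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF)) {
            return false;
        }
        i += len;
    }
    return true;
}
```

With this check, isValidUtf8("cd \xF6lf") (the Windows-1252 bytes) returns false, while isValidUtf8("cd \xC3\xB6lf") (the utf-8 bytes) returns true. If this is what is happening, the string would need to be converted to utf-8 before sending, or the utf-8 settings would need to match the bytes actually stored in the CString.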

answered Jul 17 '13 at 10:47

RootTag
edited Jul 18 '13 at 04:55


Asked: Jul 17 '13 at 07:35

Seen: 1,739 times

Last updated: Jul 18 '13 at 10:00