After upgrading from 9.4.1.68 to 9.5.0.16, I am seeing a strange issue with CkString's getEnc() method. If the CkString contains a UTF-8 encoded string and I call getEnc("utf-8") to retrieve it, a trailing double-quote character is stripped. I know getEnc() should not be necessary in this case, but I have my reasons, and this is simply an example. If I instead call getUtf8(), which should be equivalent to getEnc("utf-8"), the returned string keeps the trailing double-quote character.

Here is an example:

CkString cks;
cks.appendUtf8("\"123\"");
const char *str = cks.getEnc("utf-8");
const char *str2 = cks.getUtf8();

The content of str will be "123, but the content of str2 will be "123". They should be identical.

This worked fine in 9.4.1.68. I could simply rewrite my code, but it looks like there is an error in getEnc() with UTF-8. I should mention that if I do the exact same test using ANSI only, the error does not happen.

asked Mar 16 '14 at 17:24

roan98dk

Forgot to add that this is C++ code using chilkat-9.5.0-x86-vc12_xp.

(Mar 17 '14 at 03:00) roan98dk

Thanks. The problem was found and fixed. It only occurs when "utf-8" is the argument to getEnc, because that is the one charset that needs no conversion internally (the string is stored internally as utf-8). The error was caused by an incorrect assumption that a buffer would contain the terminating null, when it did not.
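To make that class of bug concrete, here is a small standalone sketch. It is not Chilkat's internal code: the function names copyAssumingTerminator and copyExact are hypothetical, and the sketch only assumes that the faulty path treated the last byte of a length-counted buffer as a terminating null.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical illustration only, not Chilkat's implementation.
// Buggy variant: assumes the last of the `len` bytes is a terminating null
// and drops it. If the buffer holds no terminator, a real character is lost.
std::string copyAssumingTerminator(const char *buf, std::size_t len) {
    assert(buf != nullptr);
    return len == 0 ? std::string() : std::string(buf, len - 1);
}

// Correct variant: `len` counts payload bytes only, so all of them are copied.
std::string copyExact(const char *buf, std::size_t len) {
    assert(buf != nullptr);
    return std::string(buf, len);
}
```

With the five bytes of "123" including both quotes and no terminator, the first function drops the closing quote while the second keeps it, which matches the symptom reported in the question.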

The workaround, until the next "SP1" version release, is to call getUtf8() instead of getEnc("utf-8"). Any other charset passed to getEnc won't cause the problem.
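For code that has to stay generic over the charset, the workaround could be wrapped roughly as follows. This is a sketch under assumptions: isUtf8Name and getEncWorkaround are hypothetical helper names, not Chilkat functions, and the template only assumes that the string type offers getUtf8() and getEnc(const char *) as CkString does.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: true for "utf-8" or "utf8" in any letter case.
inline bool isUtf8Name(const char *charset) {
    assert(charset != nullptr);
    std::string name;
    for (const char *p = charset; *p != '\0'; ++p) {
        name += static_cast<char>(std::tolower(static_cast<unsigned char>(*p)));
    }
    return name == "utf-8" || name == "utf8";
}

// Hypothetical wrapper: routes utf-8 requests to getUtf8() and everything
// else to getEnc(). StringT is expected to provide both methods.
template <typename StringT>
const char *getEncWorkaround(StringT &s, const char *charset) {
    return isUtf8Name(charset) ? s.getUtf8() : s.getEnc(charset);
}
```

Calling getEncWorkaround(cks, "utf-8") on the CkString from the question would then go through getUtf8(), while any other charset still goes through getEnc().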

answered Mar 18 '14 at 09:30

chilkat ♦♦

Thanks. In fact, I integrated exactly that workaround. I could simply use getUtf8() everywhere since I only use utf-8 anyway, but I prefer to write generic code whenever possible, in case I need it at some point.

(Mar 18 '14 at 09:56) roan98dk

Unfortunately, it appears that CkHttpRequest's LoadBodyFromString(const char *bodyStr, const char *charset) method has the same issue when .put_Utf8(true) is used and the output charset is set to "utf-8" as well, e.g.:

CkHttpRequest req;
req.put_Utf8(true);
req.LoadBodyFromString("{ utf-8 encoded json string }", "utf-8");
...
CkHttpResponse *resp = http.SynchronousRequest(domain, port, ssl, req);

Monitoring the HTTP request with a tool such as Fiddler2 shows that the trailing closing brace in the example above gets stripped away. Again, this worked with 9.4.x, so I guess the issue is related to, or perhaps even caused by, the CkString issue?

If I do not specify put_Utf8(true) it works fine, but since the input string is utf-8 encoded, I need to specify it.

answered Mar 20 '14 at 18:41

roan98dk

Thanks! Chilkat will re-release the v9.5.0 C++ libs, adding a micro-version (or patch number) which will be something like "v9.5.0.21".

answered Mar 20 '14 at 20:21

chilkat ♦♦

Thanks. By the way, is there a method in any Chilkat component for checking whether a string is valid utf-8?

In another post (no. 4918) you mention round-tripping the utf-8 string through Unicode and back and then comparing the two for equality. Is that a reliable way to check for valid utf-8 encoding, and if so, why?

Finally, checking for valid utf-8 is one wish; another is to know whether a string contains any utf-8 encoded characters.

(Mar 21 '14 at 04:59) roan98dk

I noticed the addition of CkObject::objcUtf8(const char *sUtf8). That method appears to internally encode characters that are not already utf-8 encoded? I could of course use that method and simply compare the input and output strings, but it would be nice to have a simple boolean method, something like utf8::is_valid() from utfcpp.sourceforge.net. That library also has a method to find the character that is invalid (utf8::find_invalid()).

(Mar 22 '14 at 13:32) roan98dk
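For illustration, this is roughly what such a boolean check could look like as standalone C++, independent of Chilkat and utfcpp. It is only a sketch: isValidUtf8 and hasNonAscii are hypothetical names, not Chilkat methods. The check rejects truncated sequences, stray continuation bytes, overlong forms, surrogates and code points above U+10FFFF. Since every pure-ASCII string is also valid utf-8, "contains utf-8 encoded characters" is interpreted here as "contains at least one byte of 0x80 or above", which is an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const std::string &s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra;    // continuation bytes expected after b0
        unsigned long cp;     // code point being decoded
        unsigned long minCp;  // smallest code point allowed for this length
        if (b0 < 0x80)                { extra = 0; cp = b0;        minCp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte
        if (n - i - 1 < extra) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += extra + 1;
    }
    return true;
}

// Hypothetical helper: true if any byte is outside the 7-bit ASCII range.
bool hasNonAscii(const std::string &s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return true;
    }
    return false;
}
```

A string for which hasNonAscii() is true and isValidUtf8() is false is most likely in some other 8-bit encoding, although these two checks alone cannot tell which one.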

When will the new build be ready?

(Mar 26 '14 at 12:07) roan98dk

I see that the newest download is 9.5.0.21, but you have not posted that it was released (you only wrote that you would re-release, but not when).

Is 9.5.0.21 safe to use?

I noticed that code comments in some cases have character/encoding errors (e.g. CkHttp.h line 1776: Amazon"™s instead of Amazon's). I am quite frankly afraid to use the new version, since proper and trusted encoding is my main concern here.

answered Mar 31 '14 at 03:04

roan98dk

Asked: Mar 16 '14 at 17:24

Seen: 1,290 times

Last updated: Mar 31 '14 at 03:04
