I am attempting to send a POST to a .NET 4.5 Web API controller, but keep having problems with non-English characters. On the controller I use an async Task to receive the POST via MultipartFormDataStreamProvider. The POST may also include one or more files, but they do not affect the result.
I am using Fiddler to debug the POST, and the difference between English and non-English characters is that with non-English characters the final CRLF is not added to the POST. That CRLF appears to be required for MultipartFormDataStreamProvider to detect the end of the POST.
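To illustrate the symptom, here is a minimal sketch (in Python, with a hypothetical boundary value, not the one Chilkat generates) of what a well-formed multipart/form-data body looks like: it must terminate with the closing boundary line followed by a final CRLF, which is exactly what is missing in the failing requests.

```python
# Sketch of a multipart/form-data body (hypothetical boundary value).
# A well-formed body ends with the closing "--boundary--" line followed
# by CRLF; the symptom described above is that this final CRLF is
# missing when the caption contains non-English characters.
BOUNDARY = b"----exampleboundary"  # hypothetical boundary

def build_body(caption_bytes: bytes, final_crlf: bool) -> bytes:
    body = (
        b"--" + BOUNDARY + b"\r\n" +
        b'Content-Disposition: form-data; name="caption"\r\n' +
        b"\r\n" +
        caption_bytes + b"\r\n" +
        b"--" + BOUNDARY + b"--"
    )
    if final_crlf:
        body += b"\r\n"
    return body

good = build_body("Søren".encode("utf-8"), final_crlf=True)
bad = build_body("Søren".encode("utf-8"), final_crlf=False)
print(good.endswith(b"--\r\n"))  # True: terminator present
print(bad.endswith(b"--\r\n"))   # False: terminator missing
```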
Here is a crude example:
Web API controller code:
Am I going about this the wrong way?
I did some more testing with Chilkat.Upload, and using that method there does not seem to be any issue:
I forgot to add that I am using the latest version of Chilkat .NET for the 4.5 Framework, 9.3.2 (released August 29), in Visual Studio 2012. It is a 32-bit project.
Have the Chilkat.Http object create a session log file by setting the Chilkat.Http.SessionLogFilename property to the path of a log file you want created. Run your code to create the log file, and then examine the exact HTTP requests and responses in a text editor. Also, regarding special chars, you can view the log in a hex editor to see if the chars are ANSI or utf-8. (I use an editor called EmEditor that allows me to "reload" a text file in a specific encoding so I can then tell the exact encoding being used.)
answered Aug 29 '12 at 16:31
I ran your sample code, with slight modifications. Here's what I tested:
Chilkat.HttpRequest req = new Chilkat.HttpRequest();
Chilkat.Http http = new Chilkat.Http();
http.UnlockComponent("test");

req.ContentType = "multipart/form-data";
req.HttpVerb = "POST";
req.SetFromUrl("http://www.chilkatsoft.com/");
req.AddParam("caption", "Søren Sørensen");

string domain = "www.chilkatsoft.com";
int port = 80;
bool ssl = false;

Chilkat.HttpResponse resp = null;
http.SessionLogFilename = "c:/aaworkarea/sessionLog.txt";
resp = http.SynchronousRequest(domain, port, ssl, req);
The response for this test doesn't matter. The only thing that matters is examining the exact request that is sent. Loading the sessionLog.txt in a text editor (EmEditor), I see this:
---- Sending ----
POST / HTTP/1.1
Host: www.chilkatsoft.com
Content-Type: multipart/form-data; boundary=------------050808070303020900080604
Content-Length: 148

--------------050808070303020900080604
Content-Disposition: form-data; name="caption"

Søren Sørensen
--------------050808070303020900080604--
---- Received ----
The exact HTTP request is the set of lines between (and not including) "---- Sending ----" and "---- Received ----". You can see the non-English chars in "Søren Sørensen". The big question is: what character encoding is used? Is it 1 byte per char (ANSI, i.e. iso-8859-1) or 2 bytes for each non-ASCII char (utf-8)? You can't tell just by looking at the text in a text editor, which is why I suggested looking at it in a hex editor. However, given that I'm using EmEditor, it detects the most probable code page (i.e. charset, i.e. character encoding) and shows that it's utf-8. If I "re-load" the file, forcing EmEditor to interpret the bytes according to a different code page (iso-8859-1), I see this:

SÃ¸ren SÃ¸rensen

So the HTTP request was evidently sent using utf-8. When each individual byte is interpreted as a character, each "ø" char is misinterpreted as two chars.
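The same utf-8 vs. iso-8859-1 byte counts can be reproduced outside of any editor; a quick Python sketch:

```python
caption = "Søren Sørensen"

utf8_bytes = caption.encode("utf-8")
latin1_bytes = caption.encode("iso-8859-1")

print(len(caption))       # 14 characters
print(len(utf8_bytes))    # 16 bytes: each "ø" is 2 bytes (0xC3 0xB8) in utf-8
print(len(latin1_bytes))  # 14 bytes: 1 byte per char in iso-8859-1

# Interpreting the utf-8 bytes as iso-8859-1 reproduces the
# "two chars per ø" misreading seen in the editor:
print(utf8_bytes.decode("iso-8859-1"))  # SÃ¸ren SÃ¸rensen
```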
Given that HTTP requests and responses are MIME, they conform to all the standard rules for MIME, such as default charsets, content-transfer-encodings, continuation lines, etc. (Advice: MIME is a foundation of both email and HTTP, and understanding the basics of MIME and encoding is a HUGE benefit when working with anything related to these technologies.)
In this case, if the "charset" attribute of the "Content-Type" header is not specified, then the default charset is assumed to be us-ascii (though realistically MIME parsers will assume ANSI, i.e. iso-8859-1, for Western European languages). Here we have a "bug" in the MIME produced by Chilkat. In the MIME sub-part where "Søren Sørensen" is found, there is only one header field: Content-Disposition. There is no Content-Type header, therefore no charset attribute is specified, and therefore the bytes in that sub-part's body are expected to be iso-8859-1, but they are not. One solution is to add a Content-Type header specifying the charset, such as:
--------------050808070303020900080604
Content-Disposition: form-data; name="caption"
Content-Type: text/plain; charset="utf-8"

Søren Sørensen
--------------050808070303020900080604--
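A sketch of why this header helps, using Python's standard email parser (which handles generic MIME, and which I'm using here only as a stand-in for a conforming server-side parser): with the charset attribute present, the parser can decode the body bytes correctly.

```python
from email import message_from_bytes

# Hypothetical MIME sub-part, following the corrected form above:
part = (
    b'Content-Disposition: form-data; name="caption"\r\n'
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n" +
    "Søren Sørensen".encode("utf-8")
)

msg = message_from_bytes(part)
charset = msg.get_content_charset()  # read from the Content-Type header
print(charset)  # utf-8
print(msg.get_payload(decode=True).decode(charset))  # Søren Sørensen
```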
This would be fine if HTTP implementations weren't often rigid, incomplete implementations of MIME. Experience has shown that the server-side counterpart is often a custom implementation that is easily broken by anything outside the small subset of the MIME standard it expects, or, just as bad, one where information such as the charset goes unnoticed. Your server-side implementation is Microsoft's, so adding the Content-Type header should fix it. However, the better fix is to tell the Chilkat.Http object to use iso-8859-1 for the content by setting the HttpRequest.Charset property to "iso-8859-1". The result is an identical HTTP request, except that the chars in the body ("Søren Sørensen") are encoded using iso-8859-1 (1 byte per char). This matches the default MIME charset, and there are no extra MIME headers to trip up a shallow server-side HTTP implementation.
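A sketch of why the iso-8859-1 route is robust (this is an illustration of the encoding arithmetic, not Chilkat's actual code path): the body is 1 byte per char, so even a server that sees no charset attribute and falls back to the Western-European default decodes it intact.

```python
caption = "Søren Sørensen"

# Body bytes as they would be sent with a 1-byte-per-char encoding:
body = caption.encode("iso-8859-1")
print(len(body) == len(caption))  # True: one byte per character

# A server that sees no charset attribute and assumes the
# Western-European default (iso-8859-1) recovers the text intact:
print(body.decode("iso-8859-1"))  # Søren Sørensen
```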
answered Aug 30 '12 at 09:14
PS> The next version of Chilkat will automatically use ANSI instead of utf-8, and I'll also look into adding a way to specify the Content-Type header (if needed).
answered Aug 30 '12 at 09:17