I am attempting to send a POST to a .NET 4.5 Web API controller, but keep having problems with non-English characters. On the controller I use an async Task to receive the POST with MultipartFormDataStreamProvider. The POST may also include one or more files, but they do not affect the result.

I am using Fiddler to debug the POST, and the difference between English and non-English characters is that a POST containing non-English characters does not end with a final CRLF. This seems to be required for MultipartFormDataStreamProvider to see the end of the POST.

Here is a crude example:

Chilkat.HttpRequest req = new Chilkat.HttpRequest();
Chilkat.Http http = new Chilkat.Http();
http.UnlockComponent("...");
req.ContentType = "multipart/form-data";
req.HttpVerb = "POST";
req.SetFromUrl("https://...");
req.AddParam("caption", "Søren Sørensen");
string domain = "localhost";
int port = 44300;
bool ssl = true;
Chilkat.HttpResponse resp = null;
resp = http.SynchronousRequest(domain, port, ssl, req);

Web api controller code:

public async Task<HttpResponseMessage> Post()
    {
        // Check if the request contains multipart/form-data.
        if (!Request.Content.IsMimeMultipartContent())
        {
            throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);
        }

        string root = HttpContext.Current.Server.MapPath("~/App_Data");
        var provider = new MultipartFormDataStreamProvider(root);

        try
        {
            // Read the form data.
            var bodyparts = await Request.Content.ReadAsMultipartAsync(provider);
.....

Am I going about this the wrong way?

I did some more testing with Chilkat.Upload, and with that method there does not seem to be any issue:

Chilkat.Upload upload = new Chilkat.Upload();
upload.Hostname = "localhost";
upload.Path = "/.../";
upload.Ssl = true;
upload.Port = 44300;
upload.AddFileReference("file1", "...");
upload.AddParam("caption", "Søren Sørensen");
upload.BlockingUpload();

I forgot to add that I am using the latest version 9.3.2 of Chilkat .NET for the 4.5 Framework in Visual Studio 2012 (released August 29). It is a 32-bit project.

Thanks, Ronnie

asked Aug 29 '12 at 13:44

roan98dk

edited Aug 30 '12 at 08:05


Ronnie,

Have the Chilkat.Http object create a session log file by setting the Chilkat.Http.SessionLogFilename property to the path of a log file you want created. Run your code to create the log file, and then examine the exact HTTP requests and responses in a text editor. Also, regarding special chars, you can view the log in a hex editor to see if the chars are ANSI or utf-8. (I use an editor called EmEditor that allows me to "reload" a text file in a specific encoding so I can then tell the exact encoding being used.)

answered Aug 29 '12 at 16:31

chilkat

It does not seem to reveal much extra information. When I use a hex editor to compare the saved session log with the hex view in Fiddler, the content is virtually identical. The only thing that really stands out when comparing a successful POST with an unsuccessful POST is the CRLF (0D 0A) at the end of the request (in the session log that is of course in the middle, since the response is appended to the request).

(Aug 29 '12 at 17:40) roan98dk

OK, now it perhaps gets weird. I tried to force the request to send a charset and first specified unicode. That didn't change anything, but when I set the charset to iso-8859-1 it worked.

Now the big question is why this is happening?

(Aug 29 '12 at 17:58) roan98dk

Perhaps the problem isn't even on the client side. I tried to look at some RFCs and it does not seem to be required to end a multipart message with a CRLF. I am not certain though. I found another article on http://forums.asp.net/t/1777847.aspx/1, where somebody else had a similar problem. That however was with a beta of MVC4 and I am using the release version. Maybe the problem is with ReadAsMultipartAsync expecting a terminating CRLF that it should not look for?

(Aug 30 '12 at 07:22) roan98dk

I ran your sample code, with slight modifications. Here's what I tested:

            Chilkat.HttpRequest req = new Chilkat.HttpRequest();
            Chilkat.Http http = new Chilkat.Http();
            http.UnlockComponent("test");
            req.ContentType = "multipart/form-data";
            req.HttpVerb = "POST";
            req.SetFromUrl("http://www.chilkatsoft.com/");
            req.AddParam("caption", "Søren Sørensen");
            string domain = "www.chilkatsoft.com";
            int port = 80;
            bool ssl = false;
            Chilkat.HttpResponse resp = null;
            http.SessionLogFilename = "c:/aaworkarea/sessionLog.txt";
            resp = http.SynchronousRequest(domain, port, ssl, req);

The response for this test doesn't matter. The only thing that matters is examining the exact request that is sent. Loading the sessionLog.txt in a text editor (EmEditor), I see this:

---- Sending ----
POST / HTTP/1.1
Host: www.chilkatsoft.com
Content-Type: multipart/form-data; boundary=------------050808070303020900080604
Content-Length: 148

--------------050808070303020900080604
Content-Disposition: form-data; name="caption"

Søren Sørensen
--------------050808070303020900080604--

---- Received ----

The exact HTTP request is the set of lines between (and not including) "---- Sending ----" and "---- Received ----". You can see the non-English chars in "Søren Sørensen". The big question is: what character encoding is used? Is it 1 byte per char (ANSI, i.e. iso-8859-1), or is it utf-8, where "ø" takes 2 bytes? You can't tell just by looking at the text in a text editor, which is why I suggested looking at it in a hex editor. However, given that I'm using EmEditor, it detects the most probable code page (i.e. charset, i.e. character encoding) and shows that it's utf-8. If I "re-load" the file, forcing EmEditor to interpret the bytes according to a different code page (iso-8859-1), I see this:

SÃ¸ren SÃ¸rensen

So obviously the HTTP request was sent using utf-8. When each individual byte is interpreted as a character, the "ø" char is misinterpreted as two chars.
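The misreading described above can be reproduced without any HTTP traffic. The following is a minimal sketch using only System.Text; the class and method names (CharsetDemo, MisreadUtf8AsLatin1, ToHex) are made up for illustration and are not part of Chilkat or Web API.

```csharp
using System;
using System.Text;

public static class CharsetDemo
{
    // Encode the text as utf-8, then decode those same bytes as iso-8859-1.
    // Every utf-8 byte becomes one character, so "ø" (C3 B8) turns into two chars.
    public static string MisreadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.GetEncoding("iso-8859-1").GetString(utf8Bytes);
    }

    // Format bytes the way a hex editor would show them, e.g. "53 C3 B8".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        // \u00F8 is "ø"; the escape avoids depending on the source file's encoding.
        string caption = "S\u00F8ren S\u00F8rensen";
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine("utf-8:      " + ToHex(Encoding.UTF8.GetBytes(caption)));
        Console.WriteLine("iso-8859-1: " + ToHex(latin1.GetBytes(caption)));
        Console.WriteLine("misread:    " + MisreadUtf8AsLatin1(caption));
    }
}
```

In the hex output, "ø" shows up as the two bytes C3 B8 in the utf-8 line and as the single byte F8 in the iso-8859-1 line, which is the same check a hex editor gives you on the session log.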

Given that HTTP requests and responses are MIME, they conform to all the standard rules for MIME, such as default charsets, content-transfer-encodings, continuation lines, etc. (Advice: MIME is a foundation of both email and HTTP, and understanding the basics of MIME and encoding is a HUGE benefit when working with anything related to these technologies.)

In this case, if the "charset" attribute of the "Content-Type" header is not specified, then the default charset is assumed to be us-ascii (but realistically MIME parsers will assume ANSI, i.e. iso-8859-1 for Western European languages). In this case, we have a "bug" in the MIME produced by Chilkat. In the sub-part of the MIME where "Søren Sørensen" is found, there is only one MIME header field -- the Content-Disposition header. There is no Content-Type header, and therefore no charset attribute is specified, and therefore the bytes in that MIME sub-part's body are expected to be iso-8859-1, but they are not. The solution is to add a Content-Type header specifying the charset, such as:

--------------050808070303020900080604
Content-Disposition: form-data; name="caption"
Content-Type: text/plain; charset="utf-8"

Søren Sørensen
--------------050808070303020900080604--

This would be good if HTTP implementations weren't so often rigid, incomplete implementations of MIME. Experience has shown that the HTTP server-side counterpart is often a custom implementation that breaks easily when it receives anything other than the small subset of the MIME standard it expects, or, just as bad, does not even notice information such as the charset. Your server-side implementation is Microsoft, so adding the Content-Type header should fix it. However, the better fix is to tell the Chilkat.Http object to use "iso-8859-1" for the content by setting the HttpRequest.Charset property to "iso-8859-1". The result is the identical HTTP request, except that the chars in the body, i.e. "Søren Sørensen", are encoded using iso-8859-1 (1 byte per char). This matches the default MIME charset, and there are no extra MIME headers to trip up a shallow implementation of server-side HTTP.
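To make the two options concrete, here is a rough sketch that builds the single-field multipart body by hand, so the effect of each choice on the raw bytes is visible. It does not use Chilkat; MultipartSketch and BuildBody are hypothetical names, the boundary is copied from the session log above, and the CRLF after the closing delimiter is an assumption based on what the question reports the server expecting.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class MultipartSketch
{
    // Boundary value taken from the session log shown earlier.
    const string Boundary = "------------050808070303020900080604";

    // Builds a one-field multipart/form-data body. The field value is encoded
    // with 'encoding'; if declareCharset is true, the part also gets a
    // Content-Type header naming that charset.
    public static byte[] BuildBody(string name, string value, Encoding encoding, bool declareCharset)
    {
        var head = new StringBuilder();
        head.Append("--" + Boundary + "\r\n");
        head.Append("Content-Disposition: form-data; name=\"" + name + "\"\r\n");
        if (declareCharset)
        {
            head.Append("Content-Type: text/plain; charset=\"" + encoding.WebName + "\"\r\n");
        }
        head.Append("\r\n");

        // Assumption: a CRLF follows the closing delimiter.
        string tail = "\r\n--" + Boundary + "--\r\n";

        // Headers and delimiters are plain ASCII; only the value depends on the charset.
        var body = new List<byte>();
        body.AddRange(Encoding.ASCII.GetBytes(head.ToString()));
        body.AddRange(encoding.GetBytes(value));
        body.AddRange(Encoding.ASCII.GetBytes(tail));
        return body.ToArray();
    }

    public static void Main()
    {
        string caption = "S\u00F8ren S\u00F8rensen"; // "Søren Sørensen"
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        Console.WriteLine("utf-8, no charset header:      " + BuildBody("caption", caption, Encoding.UTF8, false).Length + " bytes");
        Console.WriteLine("utf-8, charset header:         " + BuildBody("caption", caption, Encoding.UTF8, true).Length + " bytes");
        Console.WriteLine("iso-8859-1, no charset header: " + BuildBody("caption", caption, latin1, false).Length + " bytes");
    }
}
```

Without the extra header, the utf-8 and iso-8859-1 bodies differ only in the value bytes: two bytes in total here, one per "ø". Comparing these byte counts with the Content-Length value in the session log is one way to check which encoding a client actually used.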

answered Aug 30 '12 at 09:14

chilkat

Thanks for the detailed description. I know that iso-8859-1 solves the problem, but why does it not work when I specify utf-8 as charset?

Both client and server use utf-8 as the default everywhere else.

(Aug 30 '12 at 09:25) roan98dk

PS: The next version of Chilkat will automatically use ANSI instead of utf-8, and I'll also look into adding a way to specify the Content-Type header (if needed).

answered Aug 30 '12 at 09:17

chilkat

I tried setting HttpRequest.Charset and HttpRequest.SendCharset, but they are not written to the request header.

So there is currently no way to specify the character set for each part in the multipart?

(Aug 30 '12 at 09:36) roan98dk