
I am using the Perl library. I've been running into an issue and hope you can shed some light on it.

I've been going back and forth for months with certain Japanese users who are unable to read our email.

When I asked the customer to send me an example of an email (from a different provider) that was readable, I noticed that email had a content type of iso-2022-jp with an encoding of base64.

Using your library, if I send an email with Japanese HTML content, you set the charset to "shift_jis" but still encode the email with quoted-printable.

According to your docs, I thought that if you select shift_jis, you would also select base64 encoding. I'm assuming this is the reason some Japanese users are unable to view our mail.

Attached is an example email that our system is creating. Is this right? Shouldn't it be encoded in base64?

asked Feb 20 '13 at 20:20

chilkat ♦♦

Character encoding and Content-Transfer-Encoding are two entirely different things.

A character encoding specifies the exact bytes used to represent each character in a language. For example:

Consider this character: É

In the iso-8859-1 character encoding, it is represented by a single byte: 0xC9
In the utf-8 character encoding, it is represented by two bytes: 0xC3 0x89
In the ucs-2 character encoding, it is represented by two bytes: 0x00 0xC9 (big-endian byte order)
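
To check those byte values yourself, here is a minimal sketch using only Python's built-in codecs (not the Chilkat API). Python has no codec named ucs-2, so utf-16-be stands in for it here; for this character the two produce identical bytes.

```python
# The same character, É (U+00C9), under three character encodings.
ch = "\u00c9"

latin1 = ch.encode("iso-8859-1")   # one byte
utf8 = ch.encode("utf-8")          # two bytes
ucs2_be = ch.encode("utf-16-be")   # two bytes; stand-in for big-endian ucs-2

for name, raw in [("iso-8859-1", latin1),
                  ("utf-8", utf8),
                  ("ucs-2 (big-endian)", ucs2_be)]:
    # bytes.hex(sep) needs Python 3.8 or later
    print(f"{name}: {raw.hex(' ')}")
```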

The Content-Transfer-Encoding of a MIME message indicates how the bytes comprising the body are encoded (if encoded at all). For historical reasons, it was important for many mail processors that MIME messages consist of 7bit printable (us-ascii) chars with limited line lengths. Non-text data, such as attached images, and non-English text in 8bit character encodings are most commonly encoded using either base64 or quoted-printable (see http://en.wikipedia.org/wiki/MIME).

The Content-Transfer-Encoding has nothing to do with the character encoding of the text. The character encoding determines the bytes used to represent each char. The Content-Transfer-Encoding determines how MIME body bytes are encoded. The MIME body bytes might happen to be text, image data, or anything else.
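
The two layers can be seen separately in a small sketch. It uses Python's standard base64 and quopri modules rather than the Chilkat library, and the sample text is just an illustration: the charset step turns text into bytes, then either transfer encoding wraps those bytes and unwraps to exactly the same bytes.

```python
import base64
import quopri

text = "日本語のメール"              # sample text, chosen only for illustration

# Layer 1: character encoding (text -> bytes)
body = text.encode("shift_jis")

# Layer 2: Content-Transfer-Encoding (bytes -> 7bit-safe bytes), two choices
as_base64 = base64.b64encode(body)
as_qp = quopri.encodestring(body)

# A MIME reader reverses layer 2 first and gets identical bytes either way...
decoded_b64 = base64.b64decode(as_base64)
decoded_qp = quopri.decodestring(as_qp)
print(decoded_b64 == body, decoded_qp == body)

# ...and only then applies the charset to recover the text.
print(decoded_qp.decode("shift_jis") == text)
```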

It's perfectly valid to choose either quoted-printable or base64 in any situation. Any MIME reader should decode according to the Content-Transfer-Encoding, and the result will be the same bytes either way. The main reason for choosing one over the other is size. Base64 is good for non-text data, or for character data where the vast majority of bytes are non-us-ascii. It always produces output about 4/3 the size of the input (roughly a 33% increase), because every 3 bytes are represented as 4 printable chars. Quoted-printable leaves printable us-ascii bytes as they are but expands every other byte to 3 chars (=XX), so it is more efficient for text data where a large majority of chars are 7bit us-ascii, such as European languages with accented chars, or perhaps HTML where the tags and markup account for a large portion of the text.
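
The size trade-off is easy to measure. The following is a rough sketch with Python's standard library; the helper name encoded_sizes and both sample strings are made up for the illustration, and real messages also carry line breaks and headers that are ignored here.

```python
import base64
import quopri

def encoded_sizes(body):
    """Return (base64 length, quoted-printable length) for raw body bytes."""
    return len(base64.b64encode(body)), len(quopri.encodestring(body))

# Mostly us-ascii: HTML markup with a couple of accented chars.
mostly_ascii = "<p>Caf\u00e9 cr\u00e8me</p>".encode("iso-8859-1")

# Mostly non-us-ascii: Japanese text in shift_jis.
mostly_8bit = "日本語のメール".encode("shift_jis")

for label, body in [("mostly ascii", mostly_ascii),
                    ("mostly 8bit", mostly_8bit)]:
    b64_len, qp_len = encoded_sizes(body)
    print(f"{label}: raw={len(body)} base64={b64_len} quoted-printable={qp_len}")
```

With these samples, quoted-printable comes out smaller for the HTML fragment and base64 comes out smaller for the Japanese text. Both remain valid encodings of the same bytes.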

answered Feb 20 '13 at 20:34

chilkat ♦♦