Archived Forum Post

Question:

Different sizes for the same CkEmail

Feb 08 '13 at 10:26

This is completely perplexing; perhaps you can shed some light on what is going on.

In my process I am downloading a list of message headers from IMAP using the following:

FetchSequenceHeaders…
CkEmail *email = 0;
email = bundle->GetEmail(i);

As the messages are downloaded I save off the message size using:

size = email->get_Size();

Later on in the process I want to pre-cache and download a number of those messages in bulk, so I connect to imap, open the mailbox, and use the following process:

  total_messages = l_imap.get_NumMessages();
  msgs = l_imap.GetAllUids();

  // Fetch all of the messages in bulk.
  bundle = l_imap.FetchBundle(*msgs);
  // GetEmail indexes are 0-based, so start at 0 to include the first message.
  for (int i = 0; i < bundle->get_MessageCount(); i++) {
      CkEmail *email = bundle->GetEmail(i);
      uid = email->GetImapUid();
      emlbin.clear();
      remove_ckx_header_fields(email);  // helper that strips the ckx-imap headers
      email->GetMimeBinary(emlbin);
      // ...
  }

To explain: I am removing the ckx-imap headers because it appeared to me they weren’t factored into the message size. When I left them in, the message was larger than it was “supposed” to be; in other words, the total size of emlbin was larger than the value get_Size() originally returned. So I set up that simple function to remove the headers. Now I have a new issue: after the headers are removed, the total size of some messages is smaller than the original get_Size() value.

All of this works perfectly with the less-than-optimal “FetchSingleAsMime” process; it is only when I pull down messages in bulk that I run into this issue.

What do you recommend?

Thanks! -hz


Answer

In summary, an email has no single exact "size". I'll explain. The size depends on many things: which encodings are used (base64, quoted-printable, 8bit), which charset is used for text (utf-8, windows-1252, etc.), how header fields are split into continuation lines, whether Q/B encoding is applied to header fields that contain 8bit chars, and so on. When an email is "rendered" to MIME, a decision has to be made for each of these, and the size of the MIME can differ depending on the choices made.
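To make that concrete, here is a minimal sketch of how one rendering choice (the Base64 line length) changes the byte count of the same attachment. The function names are made up for illustration and are not part of Chilkat; the 76-character default is the maximum encoded line length from RFC 2045, and a renderer is free to pick something shorter.

```cpp
#include <cassert>
#include <cstddef>

// Bytes of Base64 text needed to encode rawBytes of binary data,
// including '=' padding (every 3 input bytes become 4 output characters).
std::size_t base64EncodedSize(std::size_t rawBytes) {
    return ((rawBytes + 2) / 3) * 4;
}

// Size of a Base64 body after it is wrapped into lines of lineLen
// characters, each terminated by CRLF. lineLen is a rendering choice:
// 76 is the RFC 2045 maximum, but shorter lines are also valid.
std::size_t base64BodySize(std::size_t rawBytes, std::size_t lineLen = 76) {
    assert(lineLen > 0);
    const std::size_t encoded = base64EncodedSize(rawBytes);
    const std::size_t lines = (encoded + lineLen - 1) / lineLen;  // round up
    return encoded + lines * 2;                                   // 2 bytes per CRLF
}
```

With these definitions, a 30,000-byte attachment renders to 41,054 bytes when wrapped at 76 characters and 41,112 bytes when wrapped at 72, even though the attachment itself has not changed. Neither number matches the 30,000 bytes of decoded content.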

In addition, if only the header is downloaded, the "size" of the email can only come from the information provided by the server, whether it be POP3 or IMAP.

One more note: behind the scenes, Chilkat does a lot to give you good clean MIME. It automatically handles all sorts of unusual, outdated, or badly formatted email. For example, you'll never have to deal with UU-encoded attachments, because Chilkat automatically converts these to the typical, more widely accepted format where attachments are simply MIME parts encapsulated within a multipart/mixed using Base64 encoding. The same goes for other strange things, such as multipart/appledouble formats, star-encoded header fields, inconsistent CRLF line endings, or ambiguity about the exact charset of a text body, which must be surmised from other evidence found within the email or even guessed from characteristics of the data. This is the result of 10 years of evolution, and it's still evolving.

So the question is: What number is returned by the CkEmail's Size property? It is potentially an expensive computation, and given that the size is a non-exact number for the reasons described above, it returns the value that is most quickly obtained.

Here is the internal algorithm:

1) If the ckx-imap-totalsize header is present, its value is returned. This header will be present in headers downloaded from an IMAP server. (This would be the size as reported by the IMAP server.)

2) If the CKZ-HeaderOnly header is present, which happens when only the header is downloaded from a POP3 server, then the CKZ-Size header should also be present, and if so, its value is returned. (This would be the size as reported by the POP3 server.)

3) If neither header is present, the MIME tree is recursively descended and the size of each header is added, and if the MIME part is a leaf node, then the size of the body is added. For non-text bodies, this will be the binary size. For text bodies, this will be the size in bytes of the utf-8 representation. The size computed in this way will certainly be smaller than the rendered MIME -- especially if Base64 is used for binary attachments.
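
The three steps above can be sketched roughly as follows. This is only an illustration of the described order of checks, not Chilkat's actual source: MimePart, treeSize, and emailSize are hypothetical names, header lookup is simplified to an exact-match map, and the header and body byte counts are assumed to have been measured already.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed MIME part; not a Chilkat type.
struct MimePart {
    std::map<std::string, std::string> headers;  // header name -> value
    std::size_t headerBytes = 0;                 // size of this part's header block
    std::size_t bodyBytes = 0;                   // decoded body size (leaf parts only)
    std::vector<MimePart> children;              // empty for a leaf part
};

// Step 3: add the header size of every part, plus the body size of each leaf.
std::size_t treeSize(const MimePart& part) {
    std::size_t total = part.headerBytes;
    if (part.children.empty()) {
        return total + part.bodyBytes;
    }
    for (const MimePart& child : part.children) {
        total += treeSize(child);
    }
    return total;
}

std::size_t emailSize(const MimePart& root) {
    // Step 1: size reported by the IMAP server.
    auto imap = root.headers.find("ckx-imap-totalsize");
    if (imap != root.headers.end()) {
        return std::stoul(imap->second);
    }
    // Step 2: size reported by the POP3 server for a header-only download.
    if (root.headers.count("CKZ-HeaderOnly") != 0) {
        auto pop = root.headers.find("CKZ-Size");
        if (pop != root.headers.end()) {
            return std::stoul(pop->second);
        }
    }
    // Step 3: neither header applies, so compute the size from the MIME tree.
    return treeSize(root);
}
```

Seen this way, removing the ckx-imap headers before calling GetMimeBinary moves the email from step 1 to step 3, so the two numbers being compared come from different sources and should not be expected to match.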