There is a fundamental difference between binary bytes that do not represent text and bytes that represent text according to some character encoding (such as utf-8). Never try to treat binary bytes directly as text, and never try to store a byte array in a string variable. Doing so is a guaranteed way to lose bytes. The entire purpose of encodings such as hex, base64, quoted-printable, and URL encoding is to provide a means of storing non-text binary bytes in a string.

For example, consider this C++ code:

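    #include <cstdio>

    int main() {
        unsigned char x[256];
        // Fill x[0..254] with every byte value from 1 to 255.
        for (int i = 1; i < 256; i++) x[i-1] = (unsigned char)i;
        x[255] = '\0';  // NUL terminator so printf knows where the "string" ends
        // Print the raw bytes as if they were text.
        printf("%s\n", (const char *)x);
        return 0;
    }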

Just printing the byte values as text (rendered here with the Windows-1252 code page) produces this garbage:



 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

It is IMPOSSIBLE to convert this garbage back to the original binary bytes. Data was lost. How would one represent BACKSPACE, BELL, or any of the other non-printable control characters? It is not possible: they are collapsed into spaces or other indistinguishable junk characters.

For a firmer understanding of how characters are represented as bytes, see: https://www.example-code.com/charset101.asp

Many programmers, especially those using languages such as VB6, VBScript, FoxPro, and ASP, make this fundamental mistake.

asked Jul 01 at 09:27

chilkat ♦♦

edited Jul 01 at 09:29
