Archived Forum Post

Question:

Binary Bytes to US-ASCII

Jul 01 '16 at 09:29

There is a fundamental difference between binary bytes that do not represent text, and bytes that represent text according to some character encoding (such as utf-8). Never try to directly treat binary bytes as text. Never try to assign a byte array to a string variable. This is a guaranteed way to lose bytes. The entire purpose of encodings such as hex, base64, quoted-printable, URL encoding, etc. is to provide a means for non-text binary bytes to be stored in a string without loss.

For example, consider this C++ code:

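    #include <cstdio>

    int main() {
        unsigned char x[256];
        for (int i = 1; i < 256; i++) x[i-1] = (unsigned char)i;  // byte values 1..255
        x[255] = '\0';  // null terminator (the digit character '0' would leave the string unterminated)
        printf("%s\n", (const char *)x);
        return 0;
    }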

Just printing the byte values as us-ascii produces this garbage:



 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

It is IMPOSSIBLE to convert back from this garbage to the original binary bytes. Data was lost. How does one represent BACKSPACE, BELL, or any of the other non-printable control characters? It is not possible. They are coalesced into SPACE chars or other identical junk chars.
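By contrast with the lossy printout above, an encoding such as hex keeps every byte recoverable, because each byte maps to exactly two printable characters. Here is a minimal sketch of that round trip. The helper names toHex, fromHex, and hexValue are made up for this illustration and are not from any particular library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: encode arbitrary bytes as lowercase hex text.
// Every byte becomes exactly two printable characters, so nothing is lost.
std::string toHex(const std::vector<unsigned char>& bytes) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(bytes.size() * 2);
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);    // high nibble
        out.push_back(digits[b & 0x0F]);  // low nibble
    }
    return out;
}

// Hypothetical helper: numeric value of one hex digit; throws on anything else.
static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: decode hex text back to the original bytes.
std::vector<unsigned char> fromHex(const std::string& hex) {
    if (hex.size() % 2 != 0) {
        throw std::invalid_argument("odd-length hex string");
    }
    std::vector<unsigned char> out;
    out.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        int value = (hexValue(hex[i]) << 4) | hexValue(hex[i + 1]);
        out.push_back(static_cast<unsigned char>(value));
    }
    return out;
}
```

With these helpers, BELL (0x07) and BACKSPACE (0x08) encode as the text "0708", and fromHex(toHex(bytes)) gives back the same bytes for any input, including the zero byte that a C string cannot hold.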

See this to get a firmer understanding of the byte representation of characters: https://www.example-code.com/charset101.asp

Many programmers, especially in programming languages such as VB6, VBScript, FoxPro, ASP, etc., make this fundamental mistake.