1.
Quoted-printable and Base64 are two methods for converting 8-bit data, such as non-English characters, into 7-bit ASCII characters.
Their original purpose was to comply with the rule that email cannot contain non-ASCII characters directly, but they have other important uses as well:
a) any binary file can be converted into printable text and then edited with ordinary text tools;
b) text can be given a simple disguise (a very weak form of encryption, since anyone can decode it).
2.
First, a brief introduction to Quoted-printable encoding. It is mainly used when a small number of non-ASCII characters are mixed into ASCII text, and it is not suitable for converting pure binary files.
Each 8-bit byte that needs encoding is converted into 3 characters.
The first character is always the “=” sign.
The other two characters are hexadecimal digits giving the value of the byte's high 4 bits and low 4 bits, respectively.
For example, “form feed” has ASCII code 12, which is 00001100 in binary and 0C in hexadecimal, so it is encoded as “=0C”. The “=” sign has ASCII value 61, which is 00111101 in binary, so it is encoded as “=3D”. Every character other than printable ASCII must be converted this way.
All printable ASCII characters (decimal values from 33 to 126) are left unchanged, except “=” (decimal value 61).
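The rule above can be sketched in JavaScript as follows. This is a simplified illustration, not a full RFC 2045 implementation: the function name qpEncodeBytes is hypothetical, it takes an array of byte values, and it ignores the standard's extra rules for spaces, tabs, and maximum line length.

```javascript
// Simplified Quoted-printable encoder (hypothetical helper, not a library API).
// Input: an array (or Uint8Array) of byte values 0-255.
// Printable ASCII 33..126 except "=" (61) is copied as-is; every other byte
// becomes "=" followed by two uppercase hexadecimal digits.
// Not handled here: RFC 2045 rules for spaces/tabs and soft line breaks.
function qpEncodeBytes(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b >= 33 && b <= 126 && b !== 61) {
      out += String.fromCharCode(b);
    } else {
      // High 4 bits and low 4 bits as two hex digits, e.g. 12 -> "0C"
      out += "=" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(qpEncodeBytes([12]));         // =0C  (form feed)
console.log(qpEncodeBytes([97, 61, 98])); // a=3Db
```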
3.
Next, the Base64 encoding method is described in detail.
Base64 selects 64 characters as its base character set: the lowercase letters a-z, the uppercase letters A-Z, the digits 0-9, and the symbols “+” and “/” (plus “=” as a padding character, so 65 characters in all). All other data is then converted into characters from this set.
The conversion can be divided into four steps.
Step one: take the input three bytes at a time, 24 bits per group.
Step two: divide the 24 bits into four groups of 6 bits each.
Step three: add two 0 bits in front of each group, expanding the 24 bits to 32 bits, i.e., four bytes.
Step four: look up each expanded byte (a value from 0 to 63) in the following table; the resulting characters are the Base64 encoding.
Value  Char    Value  Char    Value  Char    Value  Char
  0     A       17     R       34     i       51     z
  1     B       18     S       35     j       52     0
  2     C       19     T       36     k       53     1
  3     D       20     U       37     l       54     2
  4     E       21     V       38     m       55     3
  5     F       22     W       39     n       56     4
  6     G       23     X       40     o       57     5
  7     H       24     Y       41     p       58     6
  8     I       25     Z       42     q       59     7
  9     J       26     a       43     r       60     8
 10     K       27     b       44     s       61     9
 11     L       28     c       45     t       62     +
 12     M       29     d       46     u       63     /
 13     N       30     e       47     v
 14     O       31     f       48     w
 15     P       32     g       49     x
 16     Q       33     h       50     y
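The four steps and the table can be expressed with bit shifts. The sketch below is illustrative, and encodeGroup is a hypothetical name. It packs three bytes into one 24-bit number, extracts four 6-bit values, and looks each one up in the table above, which is simply the string A-Z, a-z, 0-9, “+”, “/”. Step three's two leading 0 bits need no extra work, because each extracted value is already at most 63.

```javascript
// Base64 alphabet: index 0..63 maps to the character in the table above.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode one complete group of three bytes (hypothetical helper).
function encodeGroup(b1, b2, b3) {
  // Step 1: concatenate the three bytes into one 24-bit number.
  const n = (b1 << 16) | (b2 << 8) | b3;
  // Steps 2 and 3: split into four 6-bit values, each 0..63
  // (equivalently, a byte whose top two bits are 0).
  const indices = [(n >> 18) & 0x3f, (n >> 12) & 0x3f, (n >> 6) & 0x3f, n & 0x3f];
  // Step 4: look up each value in the table.
  return indices.map((i) => ALPHABET[i]).join("");
}

console.log(encodeGroup(77, 97, 110)); // TWFu
```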
Because Base64 converts three bytes into four bytes, the text encoded by Base64 will be about one-third larger than the original text.
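More precisely, n input bytes produce 4 × ⌈n / 3⌉ output characters once padding is included. The line breaks that some MIME encoders insert are not counted here. A small helper, with a hypothetical name, makes the arithmetic concrete:

```javascript
// Length of padded Base64 output for n input bytes (no line breaks).
// Every started group of 3 bytes becomes 4 characters.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

console.log(base64Length(3));   // 4
console.log(base64Length(4));   // 8  (the 4th byte starts a new, padded group)
console.log(base64Length(300)); // 400
```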
4.
Here is a concrete example showing how the English word “Man” is converted to Base64.
Text content      M           a           n
ASCII             77          97          110
Bit pattern       01001101    01100001    01101110
6-bit groups      010011   010110   000101   101110
Index             19       22       5        46
Base64-encoded    T        W        F        u
In the first step, the ASCII values of “M”, “a”, and “n” are 77, 97, and 110, respectively, and their binary values are 01001101, 01100001, and 01101110. Concatenated, they form the 24-bit string 010011010110000101101110.
In the second step, the 24-bit string is divided into four groups of 6 bits: 010011, 010110, 000101, 101110.
In the third step, two 0 bits are added in front of each group, expanding them to 32 bits, i.e., four bytes: 00010011, 00010110, 00000101, 00101110. Their decimal values are 19, 22, 5, and 46, respectively.
In the fourth step, looking up each value in the table above gives T, W, F, and u.
Therefore, the Base64 encoding of Man is TWFu.
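The walk-through above can be reproduced with string operations, which makes each intermediate value visible. This is a teaching sketch with illustrative variable names, not an efficient encoder:

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

const text = "Man";
// ASCII codes of each character: 77, 97, 110
const codes = Array.from(text, (ch) => ch.charCodeAt(0));
// Step 1: 8-bit binary strings joined into one 24-bit string
const bits = codes.map((c) => c.toString(2).padStart(8, "0")).join("");
// Step 2: four groups of 6 bits
const groups = bits.match(/.{6}/g);
// Step 3: read each 6-bit group as a number (same value as with "00" prefixed)
const indices = groups.map((g) => parseInt(g, 2));
// Step 4: table lookup
const encoded = indices.map((i) => ALPHABET[i]).join("");

console.log(bits);               // 010011010110000101101110
console.log(groups.join(" "));   // 010011 010110 000101 101110
console.log(indices.join(", ")); // 19, 22, 5, 46
console.log(encoded);            // TWFu
```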
5.
If fewer than three bytes remain at the end, they are handled as follows:
a) Two bytes: their 16 bits are split into three groups following the rules above. Besides the two 0 bits added in front, the last group also gets two 0 bits added at the end. This yields three Base64 characters, and one “=” sign is appended.
For example, the string “Ma” is two bytes, which become the three groups 00010011, 00010110, 00000100. The corresponding Base64 characters are T, W, and E; appending an “=” sign gives TWE= as the Base64 encoding of “Ma”.
b) One byte: its 8 bits are split into two groups following the rules above, and the last group gets two 0 bits in front and four 0 bits at the end. This yields two Base64 characters, and two “=” signs are appended.
For example, the letter “M” is one byte, which becomes the two groups 00010011 and 00010000. The corresponding Base64 characters are T and Q; appending two “=” signs gives TQ== as the Base64 encoding of “M”.
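Combining the full-group rule with the two padding cases gives a complete encoder for a byte array. The sketch below is one way to write it, and encodeBytes is a hypothetical name. It pads with 0 bits exactly as described, then appends one or two “=” signs:

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64-encode an array of byte values (0-255), with "=" padding.
function encodeBytes(bytes) {
  let out = "";
  let i = 0;
  // Full groups: three bytes -> four characters.
  for (; i + 2 < bytes.length; i += 3) {
    const n = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2];
    out += ALPHABET[(n >> 18) & 0x3f] + ALPHABET[(n >> 12) & 0x3f] +
           ALPHABET[(n >> 6) & 0x3f] + ALPHABET[n & 0x3f];
  }
  const rest = bytes.length - i;
  if (rest === 2) {
    // 16 bits + two 0 bits at the end = 18 bits -> three characters, then "="
    const n = (bytes[i] << 10) | (bytes[i + 1] << 2);
    out += ALPHABET[(n >> 12) & 0x3f] + ALPHABET[(n >> 6) & 0x3f] +
           ALPHABET[n & 0x3f] + "=";
  } else if (rest === 1) {
    // 8 bits + four 0 bits at the end = 12 bits -> two characters, then "=="
    const n = bytes[i] << 4;
    out += ALPHABET[(n >> 6) & 0x3f] + ALPHABET[n & 0x3f] + "==";
  }
  return out;
}

// ASCII-only helper for the examples below.
const toBytes = (s) => Array.from(s, (ch) => ch.charCodeAt(0));
console.log(encodeBytes(toBytes("Man"))); // TWFu
console.log(encodeBytes(toBytes("Ma")));  // TWE=
console.log(encodeBytes(toBytes("M")));   // TQ==
```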
6.
Now take a Chinese example: how is the Chinese character “严” (yán) converted to Base64?
Note that Chinese characters can be stored in several encodings, such as GB2312, UTF-8, and GBK, and each encoding produces a different Base64 result. The example below uses UTF-8.
First, the UTF-8 encoding of “严” is E4B8A5, or in binary the three bytes “11100100 10111000 10100101”. Following the rules in Section 3, this 24-bit string is split into four 6-bit groups, which become the four bytes (32 bits in total) “00111001 00001011 00100010 00100101” after two 0 bits are added in front of each. Their decimal values are 57, 11, 34, and 37, and the corresponding Base64 characters are 5, L, i, and l.
Therefore, the Base64 encoding of the Chinese character “严” (in UTF-8) is 5Lil.
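The calculation can be checked in JavaScript. The sketch assumes an environment that provides the standard TextEncoder (modern browsers and Node.js do), which always produces UTF-8 bytes:

```javascript
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// UTF-8 bytes of the character (TextEncoder always encodes as UTF-8)
const bytes = new TextEncoder().encode("严");
const hex = Array.from(bytes, (b) => b.toString(16).toUpperCase()).join("");

// Treat the three bytes as one 24-bit number and split it into 6-bit values
const n = (bytes[0] << 16) | (bytes[1] << 8) | bytes[2];
const indices = [18, 12, 6, 0].map((shift) => (n >> shift) & 0x3f);
const encoded = indices.map((i) => ALPHABET[i]).join("");

console.log(hex);                // E4B8A5
console.log(indices.join(", ")); // 57, 11, 34, 37
console.log(encoded);            // 5Lil
```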
7.
PHP has a dedicated pair of functions for Base64 conversion: base64_encode() for encoding and base64_decode() for decoding.
These functions work on the raw bytes of the input and apply the Base64 rules regardless of the text's character encoding. Therefore, to get the Base64 value of the UTF-8 encoding, you must make sure the input text is UTF-8 encoded.
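The same principle holds for any byte-oriented Base64 function: the result depends on the bytes passed in, not on the characters they represent. Here is a JavaScript illustration, assuming an environment with the standard TextEncoder and btoa() (browsers, Node.js 16+). It encodes “严” first as UTF-8 bytes and then as UTF-16LE bytes, which gives two different Base64 strings. UTF-16LE stands in for GB2312/GBK here because TextEncoder only produces UTF-8.

```javascript
// btoa() expects a "binary string": one character per byte, codes 0-255.
const bytesToBinary = (bytes) => String.fromCharCode(...bytes);

const ch = "严"; // U+4E25

// UTF-8: E4 B8 A5
const utf8 = new TextEncoder().encode(ch);

// UTF-16LE: each 16-bit code unit as low byte, then high byte -> 25 4E
const utf16le = [];
for (let i = 0; i < ch.length; i++) {
  const unit = ch.charCodeAt(i);
  utf16le.push(unit & 0xff, unit >> 8);
}

console.log(btoa(bytesToBinary(utf8)));    // 5Lil
console.log(btoa(bytesToBinary(utf16le))); // JU4=
```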
8.
This section describes how to perform Base64 encoding in JavaScript.
Assume the web page is encoded in UTF-8, and we want PHP and JavaScript to produce the same Base64 encoding for the same string.
This raises a problem. JavaScript stores all strings internally as UTF-16, so before encoding we must first convert the UTF-16 string to UTF-8. After decoding, we must convert the UTF-8 result back to UTF-16.
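The mismatch is easy to observe with the built-in btoa() (available in browsers and in Node.js 16+), which accepts only characters whose codes fit in one byte. Passing a JavaScript string that contains “严” directly throws an error. Passing its UTF-8 bytes as a one-byte-per-character string works. The sketch assumes TextEncoder is available:

```javascript
const s = "严";
// One UTF-16 code unit, U+4E25
console.log(s.length, s.charCodeAt(0).toString(16)); // 1 4e25

// btoa() throws for characters above U+00FF.
let failed = false;
try {
  btoa(s);
} catch (e) {
  failed = true;
}
console.log(failed); // true

// Convert to UTF-8 first (one character per byte), then encode.
const utf8 = String.fromCharCode(...new TextEncoder().encode(s));
console.log(utf8.length, btoa(utf8)); // 3 5Lil
```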
JavaScript functions:
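/*
 * Interfaces:
 * utf8 = utf16to8(utf16);
 * utf16 = utf8to16(utf8);
 *
 * A "utf8" string holds one UTF-8 byte (0-255) per character.
 */
function utf16to8(str) {
    var out, i, len, c, c2;
    out = "";
    len = str.length;
    for (i = 0; i < len; i++) {
        c = str.charCodeAt(i);
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len) {
            c2 = str.charCodeAt(i + 1);
            if (c2 >= 0xDC00 && c2 <= 0xDFFF) {
                // Surrogate pair: combine into one code point (U+10000 and up)
                // and emit 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                c = 0x10000 + ((c - 0xD800) << 10) + (c2 - 0xDC00);
                i++;
                out += String.fromCharCode(0xF0 | (c >> 18));
                out += String.fromCharCode(0x80 | ((c >> 12) & 0x3F));
                out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
                out += String.fromCharCode(0x80 | (c & 0x3F));
                continue;
            }
        }
        if (c <= 0x007F) {
            // 1 byte: 0xxxxxxx (U+0000 included, as in standard UTF-8)
            out += str.charAt(i);
        } else if (c > 0x07FF) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += String.fromCharCode(0xE0 | ((c >> 12) & 0x0F));
            out += String.fromCharCode(0x80 | ((c >> 6) & 0x3F));
            out += String.fromCharCode(0x80 | (c & 0x3F));
        } else {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += String.fromCharCode(0xC0 | ((c >> 6) & 0x1F));
            out += String.fromCharCode(0x80 | (c & 0x3F));
        }
    }
    return out;
}
function utf8to16(str) {
    var out, i, len, c;
    var char2, char3, char4;
    out = "";
    len = str.length;
    i = 0;
    while (i < len) {
        c = str.charCodeAt(i++);
        switch (c >> 4) {
            case 0: case 1: case 2: case 3: case 4: case 5: case 6: case 7:
                // 0xxxxxxx
                out += str.charAt(i - 1);
                break;
            case 12: case 13:
                // 110x xxxx 10xx xxxx
                char2 = str.charCodeAt(i++);
                out += String.fromCharCode(((c & 0x1F) << 6) | (char2 & 0x3F));
                break;
            case 14:
                // 1110 xxxx 10xx xxxx 10xx xxxx
                char2 = str.charCodeAt(i++);
                char3 = str.charCodeAt(i++);
                out += String.fromCharCode(((c & 0x0F) << 12) |
                                           ((char2 & 0x3F) << 6) |
                                           (char3 & 0x3F));
                break;
            case 15:
                // 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx: a code point above
                // U+FFFF, stored in JavaScript as a surrogate pair
                char2 = str.charCodeAt(i++);
                char3 = str.charCodeAt(i++);
                char4 = str.charCodeAt(i++);
                c = (((c & 0x07) << 18) | ((char2 & 0x3F) << 12) |
                     ((char3 & 0x3F) << 6) | (char4 & 0x3F)) - 0x10000;
                out += String.fromCharCode(0xD800 | (c >> 10), 0xDC00 | (c & 0x3FF));
                break;
            // Stray continuation bytes (10xxxxxx, cases 8-11) are skipped.
        }
    }
    return out;
}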
The code above defines two functions: utf16to8() converts UTF-16 to UTF-8, and utf8to16() converts UTF-8 back to UTF-16.
Below are the functions that actually perform Base64 encoding and decoding.
/*
 * Interfaces:
 * b64 = base64encode(data);
 * data = base64decode(b64);
 *
 * "data" is a string holding one byte (0-255) per character.
 */
var base64EncodeChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Maps an ASCII code to its 6-bit Base64 value; -1 marks non-Base64 characters
var base64DecodeChars = new Array(
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
    -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, 63,
    52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1,
    -1,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14,
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1,
    -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40,
    41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, -1, -1, -1, -1, -1);
// Look up a character code in base64DecodeChars, returning -1 for codes
// outside the table (including NaN when reading past the end of the string)
function base64DecodeValue(code) {
    return (code >= 0 && code < 128) ? base64DecodeChars[code] : -1;
}
function base64encode(str) {
    var out, i, len;
    var c1, c2, c3;
    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        c1 = str.charCodeAt(i++) & 0xff;
        if (i == len) {
            // One byte left: two characters plus "=="
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt((c1 & 0x3) << 4);
            out += "==";
            break;
        }
        c2 = str.charCodeAt(i++) & 0xff;
        if (i == len) {
            // Two bytes left: three characters plus "="
            out += base64EncodeChars.charAt(c1 >> 2);
            out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
            out += base64EncodeChars.charAt((c2 & 0xF) << 2);
            out += "=";
            break;
        }
        c3 = str.charCodeAt(i++) & 0xff;
        // Full group: three bytes become four characters
        out += base64EncodeChars.charAt(c1 >> 2);
        out += base64EncodeChars.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
        out += base64EncodeChars.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
        out += base64EncodeChars.charAt(c3 & 0x3F);
    }
    return out;
}
function base64decode(str) {
    var c1, c2, c3, c4;
    var i, len, out;
    len = str.length;
    i = 0;
    out = "";
    while (i < len) {
        /* c1: skip characters that are not in the alphabet (e.g. line breaks) */
        do {
            c1 = base64DecodeValue(str.charCodeAt(i++));
        } while (i < len && c1 == -1);
        if (c1 == -1)
            break;
        /* c2 */
        do {
            c2 = base64DecodeValue(str.charCodeAt(i++));
        } while (i < len && c2 == -1);
        if (c2 == -1)
            break;
        out += String.fromCharCode((c1 << 2) | ((c2 & 0x30) >> 4));
        /* c3: "=" (code 61) marks padding, so decoding stops */
        do {
            c3 = str.charCodeAt(i++);
            if (c3 == 61)
                return out;
            c3 = base64DecodeValue(c3);
        } while (i < len && c3 == -1);
        if (c3 == -1)
            break;
        out += String.fromCharCode(((c2 & 0x0F) << 4) | ((c3 & 0x3C) >> 2));
        /* c4 */
        do {
            c4 = str.charCodeAt(i++);
            if (c4 == 61)
                return out;
            c4 = base64DecodeValue(c4);
        } while (i < len && c4 == -1);
        if (c4 == -1)
            break;
        out += String.fromCharCode(((c3 & 0x03) << 6) | c4);
    }
    return out;
}
In the code above, base64encode() encodes and base64decode() decodes.
So, to encode a string as UTF-8 Base64, write:
sEncoded = base64encode(utf16to8(str));
To decode, reverse the steps:
sDecoded = utf8to16(base64decode(sEncoded));
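In current browsers and Node.js (16+), the same UTF-8-then-Base64 round trip can also be written with the standard TextEncoder, TextDecoder, btoa() and atob(), without hand-written conversions. This is an alternative sketch, not the code above, and the function names are hypothetical. TextEncoder also handles characters outside the Basic Multilingual Plane, such as emoji.

```javascript
// Encode a JavaScript string as Base64 of its UTF-8 bytes.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str); // UTF-8 bytes
  let binary = "";
  for (const b of bytes) binary += String.fromCharCode(b); // one char per byte
  return btoa(binary);
}

// Decode Base64 that holds UTF-8 bytes back into a JavaScript string.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // TextDecoder defaults to UTF-8
}

console.log(utf8ToBase64("严"));   // 5Lil
console.log(base64ToUtf8("5Lil")); // 严
```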