Arguably the most important new feature of Python 3 is a much clearer distinction between text and binary data. Text is always Unicode and is represented by the str type; binary data is represented by the bytes type. What makes the distinction particularly clear is that Python 3 never mixes str and bytes implicitly. You cannot concatenate a string with a bytes object, you cannot search for a string inside bytes (or vice versa), and in general you cannot pass a string to a function that expects bytes (or vice versa). This is a good thing.
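A minimal sketch of what this strictness looks like in practice. The variable names and the helper raises_type_error are illustrative only and do not come from any library.

```python
text = "abc"      # str: Unicode text
data = b"abc"     # bytes: raw binary data


def raises_type_error(operation):
    """Return True if calling operation() raises TypeError (hypothetical helper)."""
    try:
        operation()
    except TypeError:
        return True
    return False


# Concatenating str and bytes is refused rather than silently converted.
concat_fails = raises_type_error(lambda: text + data)

# Searching for a str inside bytes, or bytes inside a str, is refused too.
search_in_bytes_fails = raises_type_error(lambda: text in data)
search_in_str_fails = raises_type_error(lambda: data in text)

# Mixing works only after an explicit conversion with a named encoding.
combined = text + data.decode("utf-8")

print(concat_fails, search_in_bytes_fails, search_in_str_fails, combined)
```

All three mixed operations raise TypeError; only the explicit decode makes the two values compatible.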
A string can be encoded into bytes, and bytes can be decoded back into a string:
>>> '€20'.encode('utf-8')
b'\xe2\x82\xac20'
>>> b'\xe2\x82\xac20'.decode('utf-8')
'€20'
Think of it this way: a string is an abstract representation of text. A string consists of characters, and a character is an abstract entity that is not tied to any particular binary representation. While manipulating strings, we live in blissful ignorance. We can split and slice them, search them, and concatenate them, without caring how they are represented internally or how many bytes each character takes. We start paying attention to this only when encoding a string into bytes (for example, to send it over a communication channel) or, in the reverse direction, when decoding a string from bytes.
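The difference between counting characters and counting bytes can be sketched as follows. The sample text and variable names are assumptions chosen for illustration.

```python
text = "€20"  # the euro sign followed by two digits

# At the str level we work with characters and never see bytes.
char_count = len(text)        # number of characters
tail = text[1:]               # slicing by character position
position = text.find("20")    # searching by character position
joined = text + " only"       # concatenating text with text

# Only encoding, for example just before sending the text over a channel,
# exposes how many bytes each character needs.
payload = text.encode("utf-8")
byte_count = len(payload)     # the euro sign takes 3 bytes, each digit 1

# Decoding on the receiving side restores the same abstract text.
restored = payload.decode("utf-8")

print(char_count, byte_count, restored == text)
```

The string has three characters, while its UTF-8 form has five bytes; the slice and the search operate on characters and are unaffected by that.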
The argument passed to encode and decode is the encoding (or codec). An encoding is a way of representing abstract characters as binary data. There are many encodings. UTF-8, used above, is one; here is another:
>>> '€20'.encode('iso-8859-15')
b'\xa420'
>>> b'\xa420'.decode('iso-8859-15')
'€20'
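To see how much the byte form depends on the codec, the same text can be encoded under several of them. This is only a sketch; the particular list of codecs is an arbitrary choice among those shipped with Python.

```python
text = "€20"

# Each codec maps the same abstract characters to its own byte sequence.
codec_names = ("utf-8", "iso-8859-15", "cp1252", "utf-16-le")
encoded = {name: text.encode(name) for name in codec_names}

for name, data in encoded.items():
    print(f"{name:>11}: {data!r} ({len(data)} bytes)")

# A codec with no slot for a character refuses to encode it.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("fits in ASCII:", ascii_ok)
```

ASCII has no euro sign, so that last call raises UnicodeEncodeError instead of producing bytes.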
The encoding is a vital part of this conversion process. Without the encoding, the bytes object b'\xa420' is just a bunch of bits. It is the encoding that gives it meaning. Under a different encoding, the same bunch of bits can mean something quite different:
>>> b'\xa420'.decode('windows-1255')
'₪20'
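The same bytes can be read under several encodings side by side. This is a rough sketch; latin-1 and the UTF-8 attempt are additions for comparison and are not part of the examples above.

```python
raw = b"\xa420"

# The same three bytes read under different encodings.
codec_names = ("iso-8859-15", "windows-1255", "latin-1")
decoded = {name: raw.decode(name) for name in codec_names}

# Not every encoding accepts every byte sequence: 0xA4 cannot start a
# character in UTF-8, so decoding fails instead of guessing.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

for name, text in decoded.items():
    # ascii() shows non-ASCII characters as escapes, so the code point is explicit.
    print(f"{name:>13}: {ascii(text)}")
print("valid UTF-8:", utf8_ok)
```

The first byte comes out as the euro sign (U+20AC), the new shekel sign (U+20AA), or the generic currency sign (U+00A4), depending on the codec, while the two digits stay the same.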