This article briefly reviews the history of Python's default character set and explains how to configure the character set a Python program uses.
Background: when writing scripts, some variables inevitably end up holding Chinese text. For Python novices (myself included), configuring Python so that it correctly recognizes Chinese content in a program can be quite troublesome. The rest of this article covers how to configure Python's character set, along with some of the history behind it.
Python's default character set
Python's default character set has changed across several major versions:
Python 2.1 and earlier: latin1
Python 2.3 and 2.4: latin1 (but a warning is issued for non-ASCII characters in a file that declares no encoding)
Python 2.5 and later: ASCII (non-ASCII characters in a file that declares no encoding are a syntax error)
In addition, a PEP proposed making UTF-8 the default in later versions (this was adopted for Python 3 in PEP 3120).
How to configure the default character set (before Python 2.5)
Before Python 2.5, getting a script to use a particular default character set is quite difficult, because versions before 2.3 do not support a shebang-style coding declaration at all. Although these old versions are outdated, their configuration method is still worth mentioning. The setting is made with the sys.setdefaultencoding() function. Annoyingly, however, site.py (a script Python runs automatically at startup) deletes this function. As a result, several workarounds have circulated online:
Calling reload(sys), which restores sys.setdefaultencoding() so it can be called again
Modifying sitecustomize.py to configure a global default character set
Both methods work, but neither is elegant. For details on how they work, see the related discussions on Stack Overflow.
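The reload(sys) workaround can be sketched roughly as follows. This is a minimal, version-guarded illustration, not code taken from the Stack Overflow discussions: the function name force_default_encoding is hypothetical, and the Python 2 branch only runs on Python 2. On Python 3, sys.setdefaultencoding() no longer exists and the default string encoding is always UTF-8, so the call is skipped there.

```python
import sys


def force_default_encoding(encoding="utf-8"):
    """Hypothetical helper sketching the pre-2.5 'reload(sys)' workaround.

    On Python 2, site.py deletes sys.setdefaultencoding() at startup;
    reloading the sys module restores it so it can be called.
    On Python 3 the function does not exist and the default string
    encoding is fixed at UTF-8, so nothing is changed.
    """
    if sys.version_info[0] == 2:
        reload(sys)  # noqa: F821  (reload is a builtin on Python 2 only)
        sys.setdefaultencoding(encoding)
    return sys.getdefaultencoding()


# The sitecustomize.py alternative is just the call itself. It works on
# Python 2 because site.py imports sitecustomize *before* it deletes
# sys.setdefaultencoding():
#
#     import sys
#     sys.setdefaultencoding("utf-8")

if __name__ == "__main__":
    print(force_default_encoding())
```

Reloading sys may also reset other interpreter state (for example a redirected sys.stdout), which is one reason the approach is considered inelegant.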
How to configure the default character set (Python 2.5 and later)
From Python 2.5 on, configuring the character set is much simpler. Just add a character set declaration on the line right after the shebang (i.e. after the #!/usr/bin/python line). The declaration must match the regular expression coding[:=]\s*([-\w.]+). Any of the following forms works:
#!/usr/bin/python
# coding=utf8
or
#!/usr/bin/python
# -*- coding: utf8 -*-
or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
All of these work.
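To see why spacing and case matter in these declarations, here is a simplified sketch that checks the first two lines of a source file against the regular expression quoted above. The helper name declared_encoding and its default of "ascii" (the Python 2.5 default listed earlier) are assumptions for illustration; CPython's real tokenizer applies additional rules that are not reproduced here.

```python
import re

# The declaration pattern quoted in the article (original PEP 263 form).
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source, default="ascii"):
    """Hypothetical helper: return the encoding declared in `source`.

    Only the first two lines are examined, and only comment lines count,
    mirroring the rule that the declaration must be a comment on line 1
    or 2. If nothing matches, `default` is returned.
    """
    for line in source.splitlines()[:2]:
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue  # declarations must be comments
        match = CODING_RE.search(stripped)
        if match:
            return match.group(1)
    return default


if __name__ == "__main__":
    print(declared_encoding("#!/usr/bin/python\n# -*- coding: utf8 -*-\n"))  # utf8
    print(declared_encoding("#!/usr/bin/python\n# Coding = utf8\n"))  # ascii
```

The second call falls back to the default: the expression is case-sensitive and requires ":" or "=" immediately after "coding", so "# Coding = utf8" is not recognized as a declaration.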