ensure_ascii is True (the default)

ywjco_567

#1 Created at ...

Official documentation:

If ensure_ascii is True (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is False, these characters will be output as-is.

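In other words, with ensure_ascii=True every non-ASCII character in the input is written as a \uXXXX escape sequence, so the output contains only ASCII characters. With ensure_ascii=False those characters are written unchanged.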

Test results:

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import json
>>> obj = dict(name='小明', age=20)
>>> s = json.dumps(obj, ensure_ascii=True)
>>> print(s)
{"name": "\u5c0f\u660e", "age": 20}
>>> s = json.dumps(obj, ensure_ascii=False)
>>> print(s)
{"name": "小明", "age": 20}
>>> 
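
A minimal sketch of what the two settings mean in practice, reusing the same obj as in the test above. Both outputs are valid JSON and decode back to the same dictionary; the only difference is whether the text is pure ASCII. The variable names escaped and raw are just for illustration.

```python
import json

obj = dict(name='小明', age=20)

escaped = json.dumps(obj, ensure_ascii=True)   # non-ASCII written as \uXXXX
raw = json.dumps(obj, ensure_ascii=False)      # non-ASCII written as-is

# Both strings are valid JSON and decode to the same object.
assert json.loads(escaped) == json.loads(raw) == obj

# Only the escaped form consists entirely of ASCII characters.
# str.isascii() needs Python 3.7 or later.
print(escaped.isascii())  # True
print(raw.isascii())      # False
```

So ensure_ascii only affects how the JSON text is written, not the data it represents. If the output goes to a file or terminal that handles UTF-8, ensure_ascii=False is usually easier to read.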

ywjco_567

#2 Created at ...

Python has a built-in function ascii(): https://docs.python.org/zh-cn/3/library/functions.html#ascii

ascii(object)
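    As repr(), return a string containing a printable representation of an object, but escape the non-ASCII characters in the string returned by repr() using \x, \u, or \U escapes. This generates a string similar to that returned by repr() in Python 2.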

Many people are confused by the "\u" in the output. It appears because non-ASCII characters in the returned string are escaped using \x, \u, or \U.

>>> name='小明'
>>> print(name)
小明
>>> print(ascii(name))
'\u5c0f\u660e'
>>> 
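
To see where each of the three escape forms comes from, here is a small sketch comparing repr() and ascii(). The sample strings other than '小明' are my own picks for illustration, not from the documentation.

```python
name = '小明'

print(repr(name))    # '小明'
print(ascii(name))   # '\u5c0f\u660e'

# Which escape is used depends on the size of the code point.
print(ascii('é'))    # '\xe9'        (code point below 0x100)
print(ascii('小'))   # '\u5c0f'      (code point below 0x10000)
print(ascii('😀'))   # '\U0001f600'  (code point 0x10000 and above)

# ascii() returns a new display string; the original is unchanged.
shown = ascii(name)
print(len(name))     # 2
print(len(shown))    # 14 (two quotes plus two 6-character escapes)
```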

Chinese characters are "non-ASCII letters". The 2007 proposal "PEP 3131 -- Supporting Non-ASCII Identifiers" asked for exactly this: support for non-ASCII letters in Python identifiers.

Abstract

This PEP suggests to support non-ASCII letters (such as accented characters, Cyrillic, Greek, Kanji, etc.) in Python identifiers.

See:

https://www.python.org/dev/peps/pep-3131
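
As a rough illustration of what the PEP allows in Python 3, the sketch below uses Chinese characters as variable and function names. The identifiers are made up for this example, and it assumes the source file is saved as UTF-8 (the Python 3 default).

```python
# Non-ASCII identifiers are accepted in Python 3 (PEP 3131).
姓名 = '小明'
年龄 = 20


def 问候(名字):
    """Return a greeting for the given name."""
    return '你好, ' + 名字


print(问候(姓名))  # 你好, 小明
print(年龄 + 1)    # 21
```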

ywjco_567

#3 Created at ...

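Also, '\u' followed by four hexadecimal digits is an escape for a Unicode code point. A string written this way is the same string as one written with the Chinese characters directly. UTF-8 only comes into play when the string is encoded to bytes.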

To display the characters themselves, in Python 3.x you can simply use print:

Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> s = '\u5c0f\u660e'
>>> print(s)
小明
>>> 
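
A short check of the point above, as a sketch: the \u form and the literal form are the same str object value, the hex digits are the code points, and the UTF-8 bytes are something different again. The variable names are only for illustration.

```python
escaped_literal = '\u5c0f\u660e'  # written with \u escapes in the source
plain_literal = '小明'            # written with the characters directly

# Same string either way: two characters, identical contents.
print(escaped_literal == plain_literal)        # True
print(len(escaped_literal))                    # 2

# The hex digits after \u are the Unicode code points.
print([hex(ord(ch)) for ch in plain_literal])  # ['0x5c0f', '0x660e']

# The UTF-8 encoding is a separate byte sequence, three bytes per character here.
print(plain_literal.encode('utf-8'))           # b'\xe5\xb0\x8f\xe6\x98\x8e'
```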

ywjco_567

#4 Created at ...

Finally, to see the Unicode escape sequences for Chinese text, you can use:

>>> s1 = '小明'
>>> print(s1.encode('unicode_escape'))
b'\\u5c0f\\u660e'
>>> 
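
The doubled backslashes above are just how a bytes object is displayed. A small sketch of going both ways with the unicode_escape codec, plus a hand-built version using ord(); the manual variant is my own illustration and only matches the codec for non-ASCII characters in the Basic Multilingual Plane.

```python
s1 = '小明'

escaped_bytes = s1.encode('unicode_escape')   # b'\\u5c0f\\u660e'
escaped_text = escaped_bytes.decode('ascii')  # 12 characters of plain text
print(escaped_text)                           # \u5c0f\u660e

# Decoding with the same codec restores the original string.
restored = escaped_bytes.decode('unicode_escape')
print(restored == s1)                         # True

# The same text built by hand from the code points.
manual = ''.join('\\u{:04x}'.format(ord(ch)) for ch in s1)
print(manual == escaped_text)                 # True
```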
