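python3.x - How should maketrans be used in Python on a UTF-8 file?
Problem description
I wrote a script that processes text by replacing every punctuation mark with a space, using Python's maketrans and translate. It works on ASCII-encoded files, but on a UTF-8 file it raises an error saying the two maketrans arguments are not the same length, even though they look the same length to me: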
  File "/Users/lgq/Desktop/p3.py", line 10, in text_to_words
    "abcdefghijklmnopqrstuvwxyz ")
ValueError: the first two maketrans arguments must have equal length
I read that maketrans cannot be used with UTF-8. How should I replace characters in UTF-8 text, then? Any pointers would be appreciated.
def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz ")

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

def get_words_in_book(filename):
    """ Read a book from filename, and return a list of its words. """
    f = open(filename, "r", encoding="utf-8")
    content = f.read()
    f.close()
    wds = text_to_words(content)
    return wds

book_words = get_words_in_book("alice.txt")
print("There are {0} words in the book, the first 100 are\n{1}".format(
    len(book_words), book_words[:100]))
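Answers
Answer 1: First, the two strings are not the same length. \" is a single character and \\ is also a single character, which you can check with len(). Also, for any question about strings, it is best to state which Python version you are using.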
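One way to confirm the point about len() is to measure both arguments before building the table. The following is a minimal sketch, assuming Python 3; the names source_chars and target_chars are illustrative and do not come from the original post. It pads the replacement string from the measured length instead of typing the spaces by hand.

```python
# Characters to look for: 26 uppercase letters, 10 digits, 32 punctuation marks.
# \" and \\ are escape sequences, so each one counts as ONE character.
source_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\"

lowercase = "abcdefghijklmnopqrstuvwxyz"
# Pad with exactly as many spaces as there are non-letter characters.
target_chars = lowercase + " " * (len(source_chars) - len(lowercase))

# Both lengths must match, otherwise maketrans raises ValueError.
print(len(source_chars), len(target_chars))

table = str.maketrans(source_chars, target_chars)
print("Hello, World! 42".translate(table).split())
```

Computing the padding avoids the miscount that is easy to make when the spaces are typed manually.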
The maketrans arguments are not the same length:
my_substitutions = the_text.maketrans(
    # If you find any of these
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
    # Replace them by these
    "abcdefghijklmnopqrstuvwxyz ")
Test code:
# -*- coding: utf-8 -*-
from string import translate, maketrans

def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = maketrans(
        # If you find any of these (68 characters)
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these (26 letters followed by 42 spaces)
        "abcdefghijklmnopqrstuvwxyz" + " " * 42)

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

print(text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'测试'))
output
['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']
This is the result of running it under Python 2.
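For Python 3, str.maketrans operates on Unicode strings, so the encoding of the file on disk does not matter once it has been read with the right encoding argument. The sketch below is one possible approach, not the answerer's code. It assumes the goal is lowercase words with punctuation and digits turned into spaces; text_to_words_py3 is a hypothetical name. It passes a dict to str.maketrans, which avoids the equal-length requirement, and uses unicodedata.category to catch non-ASCII punctuation such as ’.

```python
import string
import unicodedata

def text_to_words_py3(the_text):
    """Return lowercase words, with ASCII punctuation, digits, and any
    Unicode punctuation found in the text replaced by spaces."""
    chars_to_blank = set(string.punctuation + string.digits)
    # Add non-ASCII punctuation present in the text (categories starting with "P").
    chars_to_blank.update(
        ch for ch in set(the_text)
        if unicodedata.category(ch).startswith("P"))
    # The one-argument dict form of maketrans has no length constraint.
    table = str.maketrans({ch: " " for ch in chars_to_blank})
    return the_text.lower().translate(table).split()

print(text_to_words_py3("Alice’s 7 cats said: “测试”!"))
```

With this input the call returns ['alice', 's', 'cats', 'said', '测试']: the curly quotes are blanked along with the ASCII punctuation, while the Chinese characters pass through unchanged.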