
python - Data fetched from a website by a crawler comes out garbled; how can this be fixed?

Views: 186 | Date: 2022-08-04 09:36:09

Problem description

#!/usr/bin/python
# -*- coding: utf-8 -*-
import urllib2
import re
import HTMLParser

class WALLSTREET:
    def __init__(self, baseUrl):
        self.url = baseUrl

    def get_html_content(self):
        url = self.url
        response = urllib2.urlopen(url)
        str = response.read()
        print str

baseUrl = 'https://wallstreetcn.com/live/global'  # Wallstreetcn URL
ws = WALLSTREET(baseUrl)
ws.get_html_content()

The code above is very simple, but what it prints is garbled. I tried print str.decode("utf-8"), but that raised an error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1: invalid start byte

Answers

Answer 1:

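There are two problems with the line str = response.read():
1. str is the name of a built-in type (not a reserved keyword, but the assignment shadows it), so the variable should be renamed.
2. Check which encoding the page source uses. If it is utf-8, add .decode('utf-8') after read(); if it is something else, decode with that encoding instead.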
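The byte 0x8b at position 1 in the error message happens to match the second byte of the gzip magic number (0x1f 0x8b), so one possibility worth checking is that the server sent a gzip-compressed body, in which case decompression has to come before decoding. The following is a minimal sketch under that assumption. The names decode_body and gzip_bytes and the charset parameter are illustrative and not part of the original code, and the charset should be whatever the page actually declares.

```python
# -*- coding: utf-8 -*-
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_body(raw, charset="utf-8"):
    """Turn raw response bytes into text, gunzipping first if needed.

    `raw` is what response.read() returns; `charset` is an assumption and
    should match the encoding the page declares.
    """
    if raw[:2] == GZIP_MAGIC:
        # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
        raw = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
    return raw.decode(charset)


def gzip_bytes(data):
    """Helper for the demo only: gzip-compress bytes in memory."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, 16 + zlib.MAX_WBITS)
    return compressor.compress(data) + compressor.flush()


# Made-up sample body, standing in for a compressed server response.
text = u"华尔街见闻 live"
compressed = gzip_bytes(text.encode("utf-8"))
```

With the code from the question this would be called as decode_body(response.read()). A body that is not compressed passes through the same function and is simply decoded.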

A small suggestion: for a program this small, a function is more convenient than a class, both to write and to use.
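A function-based version of the fetch step might look like the sketch below. The opener parameter is an assumption added here so the function can be tried without network access; it is not in the original code. The import fallback is only there so the same sketch runs on Python 2 (as in the question) and Python 3.

```python
try:
    from urllib2 import urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3


def get_html_content(url, opener=urlopen):
    """Fetch `url` and return the raw response body as bytes."""
    response = opener(url)
    try:
        return response.read()
    finally:
        # Release the connection even if read() raises.
        response.close()
```

Usage would then be a single call such as body = get_html_content('https://wallstreetcn.com/live/global'), with no object to construct first. The variable is named body rather than str so the built-in stays usable.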

Answer 2:

My guess is that you are using Sublime Text? See this.

Answer 3:

This should be encode, not decode. Also, your variable has the same name as a built-in.

Answer 4:

It should probably be encode.
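Which of the two calls applies depends on the direction of the conversion: decode goes from bytes to text, and encode goes from text to bytes. Since read() returns bytes, the direction can be checked with a short sketch. The sample string is made up for illustration.

```python
# -*- coding: utf-8 -*-
text = u"华尔街见闻"                # text (unicode) string, sample value only
raw = text.encode("utf-8")          # text -> bytes, the form read() returns
round_trip = raw.decode("utf-8")    # bytes -> text
```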

