
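Web crawler - Python crawler pagination: how can I make this code go through multiple pages? Also, prices are only visible after logging in. How should I handle that?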

Views: 379 | Date: 2022-08-06 14:43:40

Problem description

    import re
    import urllib.request

    siteURL = 'https://www.gpyh.com/pricebuy/index'
    # The other filter parameters from the original URL stay the same on every page
    filters = ('&hasStock=&goodsStandardId=1931&materialDictCode=&materialGroupCode=037001'
               '&diameter=&length=&brandId=&merchantId=')

    def getPage(pageIndex):
        # Only pageNum changes from one page to the next
        url = siteURL + '?pageNum=' + str(pageIndex) + filters
        with urllib.request.urlopen(url) as response:
            # The original code decoded the first page as UTF-8 (getPage used gbk); UTF-8 is kept here
            return response.read().decode('utf-8')

    html = getPage(1)
    # Non-greedy (.*?) so each match is one <td> cell, not everything from the first <td> to the last </td>
    cells = re.findall(r'<td>(.*?)</td>', html)
    for cell in cells:
        print(cell)

Answers

Answer 1:

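'?pageNum=' + str(pageIndex)

Isn't that line already your page-number control? Change pageIndex to move from page to page. If the prices are only shown after logging in, send your login cookie with each request, or simulate a login with your username and password and then fetch the page.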
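A minimal sketch of the pagination loop described above, in Python 3 using only the standard library. It assumes the filter parameters from the question's URL stay fixed and only pageNum changes; the cookie string is a placeholder you would copy from your browser after logging in, and the page range is up to you, since the total number of pages isn't known here.

```python
import urllib.parse
import urllib.request

# Assumption: base URL and filters copied from the question; only pageNum varies.
BASE_URL = 'https://www.gpyh.com/pricebuy/index'
FILTERS = {
    'hasStock': '', 'goodsStandardId': '1931', 'materialDictCode': '',
    'materialGroupCode': '037001', 'diameter': '', 'length': '',
    'brandId': '', 'merchantId': '',
}


def page_url(page_num):
    """Return the listing URL for one page number."""
    params = {'pageNum': page_num, **FILTERS}
    return BASE_URL + '?' + urllib.parse.urlencode(params)


def build_request(page_num, cookie=None):
    """Build a Request for one page, optionally carrying a logged-in Cookie header."""
    headers = {'User-Agent': 'Mozilla/5.0'}
    if cookie:
        # Hypothetical: the cookie string copied from the browser after logging in
        headers['Cookie'] = cookie
    return urllib.request.Request(page_url(page_num), headers=headers)


def fetch_pages(first, last, cookie=None):
    """Yield the decoded HTML of pages first..last (inclusive)."""
    for n in range(first, last + 1):
        with urllib.request.urlopen(build_request(n, cookie)) as resp:
            yield resp.read().decode('utf-8')
```

For example, `for html in fetch_pages(1, 5, cookie='...'):` would download pages 1 to 5, and each `html` string can be run through the same `<td>` regular expression as in the question.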

Answer 2:

httplib2 can handle pretty much any kind of HTTP request you need.

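    import urllib.parse

    import httplib2

    http = httplib2.Http()
    url = 'URL to fetch'
    header = {
        'Accept': 'text/html',
        'Accept-Encoding': 'gzip, deflate',  # httplib2 decompresses gzip/deflate responses itself
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        # If pages are only reachable while logged in, simulate the login first and put the cookie here
        'Cookie': 'your cookie here',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded',  # required when sending a form body
    }
    body_value = {'username': 'test', 'password': '123456'}  # every field of the form
    body_value = urllib.parse.urlencode(body_value)  # URL-encode the form fields
    # Use 'POST' to submit a form body; for a plain GET, leave out body=
    response, content = http.request(url, 'POST', headers=header, body=body_value)
    html = content.decode('utf-8')  # content is the returned page, as bytes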
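If you would rather log in from the script than copy the cookie by hand, the standard library's http.cookiejar can store the session cookie and resend it automatically. This is a sketch under assumptions: the login URL and the form field names username/password are hypothetical (they mirror the example form above), so check the site's real login form in your browser's developer tools before relying on it.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    return opener, jar


def encode_login_form(username, password):
    """URL-encode the login form body (field names are hypothetical)."""
    form = {'username': username, 'password': password}
    return urllib.parse.urlencode(form).encode('utf-8')


def login_and_fetch(login_url, page_url, username, password):
    """POST the login form, then fetch a page using the session cookies."""
    opener, jar = make_session()
    # Passing data= makes urllib send a POST with a form-encoded body
    with opener.open(login_url, data=encode_login_form(username, password)) as resp:
        resp.read()
    # The jar now holds any cookies the login response set
    with opener.open(page_url) as resp:
        return resp.read().decode('utf-8')
```

The same opener can then be reused for every pageNum, so all page requests share one logged-in session.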
