Web crawler - Python 3.6 crawler keeps re-scraping the content of the first page

Views: 241  Date: 2022-06-30 17:08:03

Problem description

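The problem is as stated in the title. I tried switching to a while loop and many other changes, but nothing had any effect. Any advice would be appreciated.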

# coding:utf-8
# from lxml import etree
import requests, lxml.html, os


class MyError(Exception):
    def __init__(self, value):
        self.value = value

    def __str__(self):
        return repr(self.value)


def get_lawyers_info(url):
    r = requests.get(url)
    html = lxml.html.fromstring(r.content)
    # phones = html.xpath("//span[@class='law-tel']")
    phones = html.xpath("//span[@class='phone pull-right']")
    # names = html.xpath("//p[@class='fl']/p/a")
    names = html.xpath("//h4[@class='text-center']")
    if(len(phones) == len(names)):
        list(zip(names,phones))
        phone_infos = [(names[i].text, phones[i].text_content()) for i in range(len(names))]
    else:
        error = 'Lawyers amount are not equal to the amount of phone_nums: '+url
        raise MyError(error)
    phone_infos_list = []
    for phone_info in phone_infos:
        if(phone_info[0] == ''):
            # '没留姓名' means "no name given"
            info = '没留姓名'+': '+phone_info[1]+'\r\n'
        else:
            info = phone_info[0]+': '+phone_info[1]+'\r\n'
        print (info)
        phone_infos_list.append(info)
    return phone_infos_list


dir_path = os.path.abspath(os.path.dirname(__file__))
print (dir_path)
file_path = os.path.join(dir_path,'lawyers_info.txt')
print (file_path)
if os.path.exists(file_path):
    os.remove(file_path)

with open('lawyers_info.txt','ab') as file:
    for i in range(1000):
        url = 'http://www.xxxx.com/cooperative_merchants?searchText=&industry=100&provinceId=19&cityId=0&areaId=0&page='+str(i+1)
        # r = requests.get(url)
        # html = lxml.html.fromstring(r.content)
        # phones = html.xpath("//span[@class='phone pull-right']")
        # names = html.xpath("//h4[@class='text-center']")
        # if phones or names:
        info = get_lawyers_info(url)
        for each in info:
            file.write(each.encode('gbk'))

Answers

Answer 1:

# coding: utf-8
import requests
from pyquery import PyQuery as Q

url = 'http://www.51myd.com/cooperative_merchants?industry=100&provinceId=19&cityId=0&areaId=0&page='

with open('lawyers_info.txt', 'ab') as f:
    for i in range(1, 5):
        r = requests.get('{}{}'.format(url, i))
        usernames = Q(r.text).find('.username').text().split()
        phones = Q(r.text).find('.phone').text().split()
        # zip() is lazy in Python 3, so wrap it in list() before printing
        print(list(zip(usernames, phones)))
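
Neither the question nor the answer says why every request came back with page 1. The answer's URL does drop the empty `searchText=` parameter that the question's URL carries, but the thread does not confirm that this is the cause. One way to check is to have the loop notice when a page repeats, rather than blindly requesting 1000 pages. The following is a minimal sketch under stated assumptions: `build_page_url` and `crawl_until_repeat` are hypothetical helper names, the host is the placeholder from the question, and `fetch` is any callable you supply that turns a URL into response bytes.

```python
import hashlib
from urllib.parse import urlencode

# Placeholder host copied from the question; substitute the real site.
BASE_URL = 'http://www.xxxx.com/cooperative_merchants'


def build_page_url(page, base_url=BASE_URL):
    """Build the listing URL for one page (hypothetical helper).

    The query parameters are the ones used in the answer; `page` goes last.
    """
    params = {'industry': 100, 'provinceId': 19, 'cityId': 0,
              'areaId': 0, 'page': page}
    return base_url + '?' + urlencode(params)


def crawl_until_repeat(fetch, max_pages=1000):
    """Fetch pages 1..max_pages and stop at the first repeated body.

    `fetch` maps a URL to the response body as bytes, for example
    `lambda url: requests.get(url).content`.
    Returns a list of (page_number, body) for the distinct pages seen.
    """
    seen = set()
    pages = []
    for page in range(1, max_pages + 1):
        body = fetch(build_page_url(page))
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            # Same bytes as an earlier page: the server is probably
            # ignoring the page parameter, or the listing has ended.
            print('page {} repeats an earlier page; stopping'.format(page))
            break
        seen.add(digest)
        pages.append((page, body))
    return pages
```

If this stops at page 2, the server is returning identical content regardless of `page`, which points at the request (parameters, cookies, headers) rather than the loop. Hashing the whole body is only a rough check: pages that embed timestamps or tokens will never compare equal, in which case comparing the extracted names and phone numbers may work better.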
