Scraping Jiaoguan 12123 (交管12123) traffic-violation data with Python and selenium simulated clicks

Date: 2022-06-14 17:00:56 | Author: 猪猪

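The previous article, 《Python教程—模拟网页点击爬虫定位系统》 (Python tutorial: scraping a vehicle-positioning system with simulated clicks), showed how to scrape vehicle location data by simulating clicks in a browser. This article uses the same approach to log in to Jiaoguan 12123 and scrape vehicle violation data. It goes straight through the procedure; the commands used are explained in the previous article. Like that article, this is a real scraping case from a company, so it may be useful background if you plan to work in the automotive industry.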

Tools: Spyder, the selenium library, Google Chrome, and the chromedriver.exe that matches the installed Chrome version

Result

[Figure: result of running the program (image not available)]

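Note: this case is shared to help people in the same line of work automate manual lookups and manage company assets better. The URL, account and password have been removed from the code in this article. One annoyance with this site is that other pop-ups may appear when the login clicks begin. They change the element paths, so the program fails because it cannot find the expected path. When that happens, simply run the program again. Besides simulating clicks to log in, you can also log in directly with a Cookie, which bypasses the tedious login steps.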
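The Cookie-based login mentioned in the note could look roughly like the sketch below. It is not part of the original program: `save_cookies`, `load_cookies` and the file name `cookies.json` are hypothetical, and it assumes the site accepts a restored session cookie, which may not hold if sessions expire quickly or are tied to other checks. `get_cookies()` and `add_cookie()` are standard selenium WebDriver methods.

```python
import json


def save_cookies(driver, path="cookies.json"):
    """Dump the current session's cookies to a JSON file (run once after a manual login)."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(driver.get_cookies(), fp, ensure_ascii=False)


def load_cookies(driver, path="cookies.json"):
    """Restore saved cookies; the driver must already be on a page of the same domain."""
    with open(path, encoding="utf-8") as fp:
        cookies = json.load(fp)
    for cookie in cookies:
        driver.add_cookie(cookie)
    return len(cookies)  # number of cookies restored
```

A plausible way to use it: call `driver.get(...)` with the site URL first, then `load_cookies(driver)`, then `driver.refresh()` so the page reloads with the restored session.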

Imports

from selenium import webdriver
import time
import csv
import datetime
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import math
import xlrd

Read the licence plates to query

# Note: xlrd 2.0+ reads only .xls files; opening .xlsx needs an older xlrd (<2.0)
data = xlrd.open_workbook('cheliang.xlsx')
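The rows read from the workbook still have to be turned into plate strings before they are typed into the query form. A minimal sketch of that step, assuming the plate sits in the first column of each row and that blank rows should be skipped (both are assumptions; `plates_from_rows` is a hypothetical helper, not part of the original program):

```python
def plates_from_rows(rows):
    """Return the non-empty first-column values of the given rows as stripped strings."""
    plates = []
    for row in rows:
        if not row:
            continue  # completely empty row
        plate = str(row[0]).strip()
        if plate:
            plates.append(plate)
    return plates
```

With xlrd, the input could be built as `[table.row_values(i) for i in range(table.nrows)]`.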

Create the browser and open the page

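opt = webdriver.ChromeOptions()  # browser options
# opt.add_argument('--headless')  # headless (no window) mode
driver = webdriver.Chrome(options=opt)  # create the browser object
driver.maximize_window()  # maximize the window

print('Opening the page')
driver.get('')  # open the page (URL removed)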

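In order: click organization login, enter the account, enter the password, click the captcha field to trigger the captcha image, tick the checkbox, enter the captcha, and click login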

time.sleep(3)  # wait for loading
print('Clicking organization login')
time.sleep(3)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[1]/div[2]/div/div[2]/div[2]/button').click()  # click organization login

time.sleep(3)  # wait for loading
print('Filling in the account')
elem = driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[1]/div/input')
elem.clear()  # clear existing content
elem.send_keys('')  # fill in the account (removed)

time.sleep(1)  # wait for loading
print('Filling in the password')
elem = driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[2]/div/input')
elem.clear()  # clear existing content
elem.send_keys('')  # fill in the password (removed)

time.sleep(1)  # wait for loading
print('Showing the captcha')
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input').click()  # click the captcha field to show the image
print('Please enter the captcha')
yanzhengma = input()

time.sleep(1)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[4]/div/label/input').click()  # tick the checkbox

time.sleep(1)  # wait for loading
# fill in the captcha
elem = driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input')
elem.clear()
elem.send_keys(str(yanzhengma))

time.sleep(1)  # wait for loading
print('Logging in')
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[5]/button').click()  # click login
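Because a pop-up can make a lookup fail partway through these login steps, one alternative to rerunning the whole program is to retry the individual step. The helper below is a hedged sketch, not part of the original program: `retry` and its parameters are hypothetical, and it only helps when the failure is temporary (for example the page had not finished loading), not when a pop-up permanently changes the element path.

```python
import time


def retry(action, attempts=3, delay=1.0, exceptions=(Exception,)):
    """Call action() until it succeeds or the attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)  # give the page time to settle before trying again
```

With selenium, a step could be wrapped as `retry(lambda: driver.find_element(By.XPATH, xpath).click(), exceptions=(NoSuchElementException,))`, where `NoSuchElementException` is imported from `selenium.common.exceptions` and `xpath` stands for the path of the step in question.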

Click the violation query and set the query period

time.sleep(3)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/ul/li[5]/a').click()  # click violation query

time.sleep(1)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[1]/div/div[1]/span/i').click()  # open the date picker

# click the arrow in the picker header three times (presumably stepping back in time)
for i in range(3):
    time.sleep(0.5)  # wait for loading
    driver.find_element(By.XPATH, '/html/body/div[6]/div[4]/table/thead/tr/th[1]/i').click()

time.sleep(0.5)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[6]/div[4]/table/tbody/tr/td/span[1]').click()  # pick the first entry in this view

time.sleep(0.5)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[6]/div[3]/table/tbody/tr[2]/td[1]').click()  # pick the day

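Loop over the plates and query the violations of each one. Each iteration clears the previous input, types the current plate, and reads how many records there are. From that count it works out the number of pages (each page shows at most 10 records) and how many records are on the last page.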

# table and nrows come from the workbook (see the full code)
for ii in range(0, nrows):
    rowValues = table.row_values(ii)  # one row of data
    print('Reading vehicle ' + str(ii + 1))
    # fill in the plate
    time.sleep(0.5)  # wait for loading
    elem = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[3]/div/input')
    elem.clear()
    elem.send_keys(rowValues)  # type the plate
    time.sleep(0.1)  # wait for loading
    driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[4]/button').click()  # click query
    time.sleep(0.5)  # wait for loading
    result = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[2]/div[1]/div/p/span').text  # total number of violations
    result = int(result)
    a = math.ceil(result / 10)  # total number of pages
    b = result % 10  # remainder: records on the last page (0 means the last page is full)
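The page arithmetic can be checked in isolation. For 23 records, `math.ceil(23 / 10)` gives 3 pages and `23 % 10` gives 3 records on the last page; for 20 records the remainder is 0, which means the last page is full. The function below is an illustrative sketch (`rows_per_page` is a hypothetical name, not part of the original program) that also covers the case of zero records:

```python
import math


def rows_per_page(total, page_size=10):
    """Return the number of rows shown on each result page, in page order."""
    pages = math.ceil(total / page_size)
    if pages == 0:
        return []  # nothing to read
    last = total % page_size or page_size  # remainder 0 means a full last page
    return [page_size] * (pages - 1) + [last]


print(rows_per_page(23))  # [10, 10, 3]
print(rows_per_page(20))  # [10, 10]
```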

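Read the data from the list. The penalty points and the fine are not in the list itself: click 'view details' (查看详情) and read them from the pop-up.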

# j is the 1-based row number on the current page
row_xpath = "//table[@id='my-msg-list']/tbody/tr[" + str(j) + "]"

# columns 1-7 are read straight from the list
row = []
for k in range(1, 8):
    cell = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, row_xpath + "/td[" + str(k) + "]")))
    row.append(cell.text)

# view details: open the pop-up
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, row_xpath + "/td[8]/a"))).click()
time.sleep(1)  # wait for loading

# penalty points and fine are read from the pop-up
for n in (7, 8):
    field = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//form[@class='form-horizontal']/div[" + str(n) + "]/span[2]")))
    row.append(field.text)
R.append(row)

# close the pop-up
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.XPATH, "//div[@class='modal-footer ui_modal']/button"))).click()
time.sleep(0.5)  # wait for loading
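The cell lookups differ only in the row and column index, so the XPath can be built from those two numbers. This is a small sketch under that observation; `cell_xpath` and `row_xpaths` are hypothetical helper names, and the table id `my-msg-list` is taken from the code above.

```python
def cell_xpath(row, col, table_id="my-msg-list"):
    """XPath of the cell at 1-based (row, col) in the result table."""
    return f"//table[@id='{table_id}']/tbody/tr[{row}]/td[{col}]"


def row_xpaths(row, n_cols=7):
    """XPaths of the first n_cols text cells of one table row."""
    return [cell_xpath(row, col) for col in range(1, n_cols + 1)]
```

Each returned string can be passed to `EC.element_to_be_clickable((By.XPATH, ...))` in the same way as the hand-written paths.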

Write the data to the output file each time one vehicle has been read

with open(wenjian, 'w', encoding='utf-8', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerows(R)  # write the data

Full code

from selenium import webdriver
import time
import csv
import datetime
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
import math
import xlrd

# Note: xlrd 2.0+ reads only .xls files; opening .xlsx needs an older xlrd (<2.0)
data = xlrd.open_workbook('cheliang.xlsx')
table = data.sheets()[0]
nrows = table.nrows  # number of rows
ncols = table.ncols  # number of columns

opt = webdriver.ChromeOptions()  # browser options
# opt.add_argument('--headless')  # headless (no window) mode
driver = webdriver.Chrome(options=opt)  # create the browser object
driver.maximize_window()  # maximize the window

print('Opening the page')
driver.get('')  # open the page (URL removed)

time.sleep(3)  # wait for loading
print('Clicking organization login')
time.sleep(3)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[1]/div[2]/div/div[2]/div[2]/button').click()  # click organization login

time.sleep(3)  # wait for loading
print('Filling in the account')
elem = driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[1]/div/input')
elem.clear()  # clear existing content
elem.send_keys('')  # fill in the account (removed)

time.sleep(1)  # wait for loading
print('Filling in the password')
elem = driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[2]/div/input')
elem.clear()  # clear existing content
elem.send_keys('')  # fill in the password (removed)

time.sleep(1)  # wait for loading
print('Showing the captcha')
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input').click()  # click the captcha field to show the image
print('Please enter the captcha')
yanzhengma = input()

time.sleep(1)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[4]/div/label/input').click()  # tick the checkbox

time.sleep(1)  # wait for loading
# fill in the captcha
elem = driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[3]/div/input')
elem.clear()
elem.send_keys(str(yanzhengma))

time.sleep(1)  # wait for loading
print('Logging in')
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/div/div[2]/form[1]/div[5]/button').click()  # click login

time.sleep(3)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[4]/div/div[1]/ul/li[5]/a').click()  # click violation query

time.sleep(1)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[1]/div/div[1]/span/i').click()  # open the date picker

# click the arrow in the picker header three times (presumably stepping back in time)
for i in range(3):
    time.sleep(0.5)  # wait for loading
    driver.find_element(By.XPATH, '/html/body/div[6]/div[4]/table/thead/tr/th[1]/i').click()

time.sleep(0.5)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[6]/div[4]/table/tbody/tr/td/span[1]').click()  # pick the first entry in this view

time.sleep(0.5)  # wait for loading
driver.find_element(By.XPATH, '/html/body/div[6]/div[3]/table/tbody/tr[2]/td[1]').click()  # pick the day

wenjian = datetime.datetime.now().strftime('%Y-%m-%d-%H%M%S')  # use the start time as the output file name
wenjian = wenjian + '.csv'

R = []


def read_row(j):
    """Read row j (1-based) of the current page, including the two fields in the details pop-up."""
    row_xpath = "//table[@id='my-msg-list']/tbody/tr[" + str(j) + "]"
    row = []
    # columns 1-7 are read straight from the list
    for k in range(1, 8):
        cell = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, row_xpath + "/td[" + str(k) + "]")))
        row.append(cell.text)
    # view details: open the pop-up
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, row_xpath + "/td[8]/a"))).click()
    time.sleep(1)  # wait for loading
    # penalty points and fine are read from the pop-up
    for n in (7, 8):
        field = WebDriverWait(driver, 10).until(
            EC.element_to_be_clickable((By.XPATH, "//form[@class='form-horizontal']/div[" + str(n) + "]/span[2]")))
        row.append(field.text)
    # close the pop-up
    WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.XPATH, "//div[@class='modal-footer ui_modal']/button"))).click()
    time.sleep(0.5)  # wait for loading
    return row


for ii in range(0, nrows):
    rowValues = table.row_values(ii)  # one row of data
    print('Reading vehicle ' + str(ii + 1))

    # fill in the plate
    time.sleep(0.5)  # wait for loading
    elem = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[3]/div/input')
    elem.clear()
    elem.send_keys(rowValues)  # type the plate
    time.sleep(0.1)  # wait for loading
    driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[1]/div[2]/form/div[4]/button').click()  # click query
    time.sleep(0.5)  # wait for loading
    result = driver.find_element(By.XPATH, '/html/body/div[3]/div/div[2]/div[2]/div[1]/div/p/span').text  # total number of violations
    result = int(result)
    a = math.ceil(result / 10)  # total number of pages
    b = result % 10  # remainder

    # every page except the last one is full
    for i in range(1, a):
        for j in range(1, 11):
            R.append(read_row(j))
        driver.find_element(By.LINK_TEXT, '下一页').click()  # next page
        time.sleep(0.5)  # wait for loading

    # last page: b rows, or a full page when the remainder is 0
    if b > 0:
        last = b
    elif result > 0:
        last = 10
    else:
        last = 0  # no violations for this plate, so there is nothing to read
    for j in range(1, last + 1):
        R.append(read_row(j))

    time.sleep(0.5)  # wait for loading
    with open(wenjian, 'w', encoding='utf-8', newline='') as fp:
        writer = csv.writer(fp)
        writer.writerows(R)  # write the data
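The write step above rewrites the whole file with everything collected so far after each vehicle. A possible variation, shown only as a sketch, appends just the rows of the current vehicle instead, so data already written is left alone. `append_rows` is a hypothetical helper and not part of the original program; it assumes the rows of one vehicle are collected in their own list rather than in the cumulative `R`.

```python
import csv


def append_rows(path, rows):
    """Append rows to a CSV file, creating the file if it does not exist yet."""
    with open(path, "a", encoding="utf-8", newline="") as fp:
        csv.writer(fp).writerows(rows)
```

The trade-off is that a crash partway through a vehicle leaves earlier vehicles intact in either approach, but the appending version never holds more than one vehicle's rows in memory.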
