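Getting the HTTP status code (200, 404, etc.) of a request in Python
Problem description
How can I get the HTTP status code (200, 404, etc.) of a request in Python without fetching the entire page source? Downloading the whole page wastes resources: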
Input: segmentfault.com Output: 200
Input: segmentfault.com/nonexistant Output: 404
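Answers
Answer 1: Reference article: "Python实用脚本清单" (a list of practical Python scripts)
HTTP has more than the GET method, which returns the headers plus the body. There is also the HEAD method, which requests only the headers.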
import http.client  # this module is named httplib in Python 2

def get_status_code(host, path='/'):
    '''
    This function retrieves the status code of a website by requesting
    HEAD data from the host. This means that it only requests the headers.
    If the host cannot be reached or something else goes wrong, it returns
    None instead.
    '''
    try:
        conn = http.client.HTTPConnection(host)
        conn.request('HEAD', path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None

print(get_status_code('segmentfault.com'))  # prints 200
print(get_status_code('segmentfault.com', '/nonexistant'))  # prints 404

Answer 2:
A GET request fetches both the headers and the body. Try the HEAD method instead, which fetches only the headers.
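import requests

response = requests.head('http://segmentfault.com')  # request only the headers with HEAD
print(response.status_code)  # status code
response = requests.head('http://segmentfault.com/nonexistant')  # request only the headers with HEAD
print(response.status_code)  # status code

# Output:
# 200
# 404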
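Both answers take either a bare host or a full http:// URL, while the question's input looks like segmentfault.com/nonexistant. Below is a minimal sketch, assuming Python 3 and only the standard library, that accepts such input directly and also handles https:// URLs. The names split_target and head_status and the timeout parameter are hypothetical helpers introduced for illustration; they are not part of either answer.

```python
import http.client
from urllib.parse import urlsplit


def split_target(url):
    """Split a URL into (scheme, netloc, path); assume http:// if no scheme is given."""
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.scheme, parts.netloc, path


def head_status(url, timeout=5.0):
    """Send a HEAD request and return the status code, or None on failure."""
    scheme, netloc, path = split_target(url)
    if scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = None
    try:
        conn = conn_cls(netloc, timeout=timeout)
        conn.request("HEAD", path)  # only the status line and headers come back
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Unreachable host, timeout, malformed URL or malformed response.
        return None
    finally:
        if conn is not None:
            conn.close()


# Example calls (they need network access, so they are left commented out):
# print(head_status("segmentfault.com"))
# print(head_status("segmentfault.com/nonexistant"))
```

Note that http.client does not follow redirects, so a site that redirects plain HTTP to HTTPS will report a 3xx code here rather than the code of the final page. Passing the https:// URL directly avoids that.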