Data Mining: Handy Requests Tips
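1. requests.utils.dict_from_cookiejar
Converts a cookie jar object into a dictionary.
2. requests.utils.unquote(url_encoded_address)
Decodes a percent-encoded URL.
3. Skipping SSL certificate verification
response = requests.get('https://www.12306.cn/mormhweb/', verify=False)
4. Setting a timeout
response = requests.get(url, timeout=10)
5. Checking the status code to confirm that the request succeeded
assert response.status_code == 200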
Short usage examples for the points above:
import requests
response = requests.get('http://www.baidu.com')
print(response.cookies) # prints the cookie jar object
print(requests.utils.dict_from_cookiejar(response.cookies)) # cookie jar -> dict
print(requests.utils.cookiejar_from_dict({'xxx':'xxx'})) # dict -> cookie jar
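The two helpers are inverses of each other, which can be checked without touching the network. A minimal sketch, where the cookie names and values are made up for illustration:

```python
import requests

# Hypothetical cookie values, used only for illustration.
cookies = {'sessionid': 'abc123', 'lang': 'zh-CN'}

jar = requests.utils.cookiejar_from_dict(cookies)    # dict -> RequestsCookieJar
restored = requests.utils.dict_from_cookiejar(jar)   # RequestsCookieJar -> dict

print(type(jar).__name__)
print(restored == cookies)
```

A jar built this way can be passed to a request through the cookies= parameter, just like a plain dict.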
import requests
print(requests.utils.unquote('https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%E6%A3%AE%E4%B8%83')) # https://www.baidu.com/s?wd=森七
print(requests.utils.quote('https://www.baidu.com/s?wd=森七', safe='')) # https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%E6%A3%AE%E4%B8%83 (safe='' is needed so that '/' is encoded too)
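requests.utils.quote and requests.utils.unquote are the standard library's urllib.parse functions re-exported by Requests, so the safe parameter decides which characters are left untouched. A small sketch of the difference, using the same URL and calling urllib.parse directly:

```python
from urllib.parse import quote, unquote

url = 'https://www.baidu.com/s?wd=森七'

default_encoded = quote(url)          # safe='/' by default, so slashes survive
fully_encoded = quote(url, safe='')   # nothing is exempt, slashes become %2F

print(default_encoded)
print(fully_encoded)
print(unquote(fully_encoded) == url)  # decoding restores the original URL
```

The fully encoded form is the one to use when the whole URL is itself passed as a query-string value.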
import requests
print(requests.get('https://www.12306.cn/mormhweb/')) # raises an SSLError when the site's certificate cannot be verified
print(requests.get('https://www.12306.cn/mormhweb/', verify=False)) # <Response [200]>
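With verify=False the request goes through, but urllib3 emits an InsecureRequestWarning for every such call. A hedged sketch of keeping that choice in one place with a Session: make_unverified_session is a hypothetical helper name, and skipping verification is only reasonable for hosts you already trust.

```python
import requests
import urllib3


def make_unverified_session():
    """Return a Session that skips certificate checks (hypothetical helper)."""
    # Silence the warning urllib3 prints for every unverified HTTPS request.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    session = requests.Session()
    session.verify = False  # session-wide default for certificate verification
    return session


session = make_unverified_session()
print(session.verify)
```

Calling session.get('https://www.12306.cn/mormhweb/') would then behave like the verify=False call above.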
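Programmers are lazy! For the fourth point, the example is written as a reusable file so that later requests can simply go through it. It needs the retrying package, which re-runs a function that raises an exception and only lets the exception propagate once the configured number of attempts has been used up: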
import requests
from retrying import retry

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'}


@retry(stop_max_attempt_number=3)  # try at most 3 times, then re-raise the last error
def _parse_url(url, method, data, proxies):
    if method == 'POST':
        response = requests.post(url, data=data, headers=headers, timeout=3, proxies=proxies)
    else:
        response = requests.get(url, headers=headers, timeout=3, proxies=proxies)
    assert response.status_code == 200  # a non-200 status counts as a failed attempt
    return response.content.decode()


def parse_url(url, method='GET', data=None, proxies=None):
    try:
        html_str = _parse_url(url, method, data, proxies)
    except Exception:
        html_str = None  # every attempt failed
    return html_str


if __name__ == '__main__':
    url = 'http://www.baidu.com'  # requests requires the scheme in the URL
    print(parse_url(url))
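To make the stop condition of @retry(stop_max_attempt_number=3) concrete, here is a standard-library sketch of a decorator that behaves the same way for this case. It is not how retrying is implemented; retry_up_to and flaky are hypothetical names, and the failure is simulated rather than coming from a real request.

```python
import functools


def retry_up_to(max_attempts):
    """Hypothetical stand-in for retrying's @retry(stop_max_attempt_number=...)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the last error propagate
        return wrapper
    return decorator


calls = []


@retry_up_to(3)
def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError('simulated failure')  # fails on the first two calls
    return 'ok'


result = flaky()
print(result, len(calls))  # ok 3
```

The first two failures are swallowed and the third call succeeds; had the third call failed as well, its exception would have reached the caller, which is the case parse_url turns into None.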
The fifth point can simply be written straight into your program, so no separate example is given here.
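For the status check itself, assert works but is stripped when Python runs with -O, so response.ok or response.raise_for_status() is a common alternative. A minimal offline sketch that builds a Response by hand; setting status_code manually is only for illustration, since real responses come from requests.get or requests.post.

```python
import requests

# Build a Response by hand so the check can be shown without a network call.
response = requests.Response()
response.status_code = 404

print(response.ok)  # False for 4xx and 5xx status codes

try:
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx
except requests.exceptions.HTTPError as err:
    error_message = str(err)
    print(error_message)
```

Unlike a bare assert, the HTTPError carries the status code and URL, which makes failed requests easier to log.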