Requests Tips for Data Mining

  1. requests.utils.dict_from_cookiejar converts a CookieJar object into a dict

  2. requests.utils.unquote(encoded_url) decodes a percent-encoded URL

  3. Disabling SSL certificate verification for a request

    response = requests.get('https://www.12306.cn/mormhweb/', verify=False)

  4. Setting a timeout

    response = requests.get(url, timeout=10)

  5. Asserting on the status code to check whether the request succeeded

    assert response.status_code == 200

Below are short examples of how to use the points above:

import requests

response = requests.get('http://www.baidu.com')
print(response.cookies)  # print the CookieJar object

print(requests.utils.dict_from_cookiejar(response.cookies))  # convert a CookieJar to a dict

print(requests.utils.cookiejar_from_dict({'xxx': 'xxx'}))  # convert a dict to a CookieJar
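
Once the cookies are in a dict, they can be sent back on a later request through the cookies= parameter of requests.get; a minimal sketch (the follow-up request here is purely illustrative):

import requests

response = requests.get('http://www.baidu.com')
cookies_dict = requests.utils.dict_from_cookiejar(response.cookies)  # CookieJar -> dict

# requests accepts a plain dict for cookies=, so the converted
# cookies can be replayed on the next request
response2 = requests.get('http://www.baidu.com', cookies=cookies_dict)
print(response2.status_code)
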
import requests

print(requests.utils.unquote('https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%E6%A3%AE%E4%B8%83'))  # https://www.baidu.com/s?wd=森七

print(requests.utils.quote('https://www.baidu.com/s?wd=森七'))  # https%3A%2F%2Fwww.baidu.com%2Fs%3Fwd%3D%E6%A3%AE%E4%B8%83
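
In practice quote is handy for building a search URL by hand: encode just the query value and splice it into the URL. A minimal sketch (the keyword is only an illustration):

import requests

keyword = '森七'
# percent-encode the value before putting it into the query string
url = 'https://www.baidu.com/s?wd=' + requests.utils.quote(keyword)
print(url)  # https://www.baidu.com/s?wd=%E6%A3%AE%E4%B8%83
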
import requests

print(requests.get('https://www.12306.cn/mormhweb/'))  # raises an SSLError when 12306's certificate cannot be verified
print(requests.get('https://www.12306.cn/mormhweb/', verify=False))  # <Response [200]>
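
One caveat: with verify=False, urllib3 (the HTTP library underneath requests) emits an InsecureRequestWarning on every request. A minimal sketch for silencing it, assuming you accept the risk of an unverified connection:

import requests
import urllib3

# suppress the warning urllib3 prints for unverified HTTPS requests
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

print(requests.get('https://www.12306.cn/mormhweb/', verify=False))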

Programmers are lazy!! For point 4, let's turn the request into a small reusable module so we can fetch pages easily later. It needs the retrying package: the @retry decorator re-runs a function that raises an exception, and only after the configured number of attempts does it re-raise the error:

import requests
from retrying import retry

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'}

@retry(stop_max_attempt_number=3)
def _parse_url(url, method, data, proxies):
    # retried up to 3 times by the decorator; the assert below
    # turns a non-200 response into an exception that triggers a retry
    if method == 'POST':
        response = requests.post(url, data=data, headers=headers, timeout=3, proxies=proxies)
    else:
        response = requests.get(url, headers=headers, timeout=3, proxies=proxies)
    assert response.status_code == 200
    return response.content.decode()

def parse_url(url, method='GET', data=None, proxies=None):
    try:
        html_str = _parse_url(url, method, data, proxies)
    except Exception:
        # every retry failed (timeout, non-200 status code, etc.)
        html_str = None
    return html_str

if __name__ == '__main__':
    url = 'http://www.baidu.com'
    print(parse_url(url))
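
To see the timeout from point 4 on its own, here is a minimal sketch; httpbin.org/delay is a public test endpoint that stalls its response and is used here only as an illustration:

import requests

try:
    # the server waits 5 seconds, but we only allow 2
    requests.get('https://httpbin.org/delay/5', timeout=2)
except requests.exceptions.Timeout:
    print('request timed out')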

Point 5 (the status-code assert) is already built into the program above.
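
Besides the assert, requests also provides response.raise_for_status(), which raises an HTTPError for any 4xx/5xx response; a minimal sketch:

import requests

response = requests.get('http://www.baidu.com')
response.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
print(response.status_code)  # 200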

