The requests module is mainly used to send requests and get responses, and it keeps the code concise. Install it with:
pip install requests


Example: use requests to send a request to the 360 Navigation homepage and fetch the page source.
import requests

# Target URL
url = "https://hao.360.com/?src=lm&ls=n49dd49049f"

# Send the request and get the response
response = requests.get(url)

# Inspect the type of the response object
print(type(response))
# Inspect the response status code
print(response.status_code)
# Inspect the type of the response body
print(type(response.text))
# Inspect the cookies
print(response.cookies)
# Inspect the response body
print(response.text)
Output:
<class 'requests.models.Response'>
200
<class 'str'>
<RequestsCookieJar[<Cookie ... for .360.com/>]>
<html class="" lang="zh-cn">
<meta charset="utf-8" />
<title>360导航_一个主页,整个世界</title>
<link rel="dns-prefetch" href="//hao1.qhimg.com"/>
<link rel="dns-prefetch" href="//hao2.qhimg.com"/>
... (the source is very long and is not copied here in full)
The difference between response.text and response.content:

response.text
· Type: str
· Decoding: the requests module makes an educated guess at the text encoding based on the HTTP headers

response.content
· Type: bytes
· Decoding: none is applied; you pick the codec yourself
· response.content.decode() defaults to utf-8
· response.content.decode('GBK')

Common encodings: utf-8, gbk, ascii, iso-8859-1
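If you are unsure which codec to pass to decode(), requests itself offers two hints: response.encoding (guessed from the HTTP headers, and used by response.text) and response.apparent_encoding (detected from the raw body bytes). A minimal sketch, reusing the 360 Navigation URL from the example above:

import requests

url = "https://hao.360.com/?src=lm&ls=n49dd49049f"
response = requests.get(url)

# Encoding guessed from the HTTP headers (this is what response.text uses)
print(response.encoding)
# Encoding detected from the raw body bytes
print(response.apparent_encoding)

# If response.text comes out garbled, decode the bytes with the detected codec
text = response.content.decode(response.apparent_encoding)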
Example: fixing garbled Chinese by decoding response.content.
import requests

# Target URL
url = "https://www.taobao.com/"

# Send the request and get the response
response = requests.get(url)

# Manually set the encoding
response.encoding = 'utf8'
# Print the page source as str data
print(response.text)

# response.content stores the response as bytes, so decode it
print(response.content.decode('utf-8'))
Output:
- "zh-CN">
- "utf-8" />
- "X-UA-Compatible" content="IE=edge,chrome=1" />
- "renderer" content="webkit" />
-
淘宝网 - 淘!我喜欢 - "spm-id" content="a21bo" />
- "description" content="淘宝网 - 亚洲较大的网上交易平台,提供各类服饰、美容、家居、数码、话费/点卡充值… 数亿优质商品,同时提供担保交易(先收货后付款)等安全交易保障服务,并由商家提供退货承诺、破损补寄等消费者保障服务,让你安心享受网上购物乐趣!" />
- "aplus-xplug" content="NONE">
- "keyword" content="淘宝,掏宝,网上购物,C2C,在线交易,交易市场,网上交易,交易市场,网上买,网上卖,购物网站,团购,网上贸易,安全购物,电子商务,放心买,供应,买卖信息,网店,一口价,拍卖,网上开店,网络购物,打折,免费开店,网购,频道,店铺" />
- "dns-prefetch" href="//g.alicdn.com" />
To make a request look like it comes from a real browser, send it with request headers; the most important one is User-Agent. To get a User-Agent value, right-click the page → Inspect → Network, open any request, and copy the User-Agent from its request headers.

requests.get(url, headers=headers)

· The headers parameter takes the request headers as a dict
· Header field names are the keys; the corresponding field values are the values
import requests

# Target URL
url = "https://www.taobao.com/"

# Build the request-headers dict; User-Agent is the most important field
# If other headers are needed, add them to this dict as well
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

# Send the request and get the response
response = requests.get(url, headers=headers)

print(response.text)
Output: the entire page source (not reproduced here).
How to strip redundant parameters from a URL:
· Search for "猫" (cat) in the browser; the URL it shows is very long and complicated
· Delete the parameters one at a time, refreshing after each deletion, until only the essential ones remain
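The same pruning can be done programmatically with Python's standard urllib.parse; a minimal sketch (the cluttered URL and its tracking parameters below are made up for illustration):

from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# A hypothetical cluttered search URL with tracking parameters
raw_url = "https://www.baidu.com/s?wd=python&rsv_spt=1&rsv_iqid=0x8&issp=1"

parts = urlsplit(raw_url)
query = dict(parse_qsl(parts.query))

# Keep only the parameter that actually drives the search
pruned = {'wd': query['wd']}

clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(pruned), ''))
print(clean_url)  # https://www.baidu.com/s?wd=python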

Request the simplified URL directly:

import requests

# Target URL, keeping only the essential query parameter
url = "https://www.baidu.com/s?wd=python"

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

# Send the request and get the response
response = requests.get(url, headers=headers)

print(response.text)
Alternatively, let requests build the query string by passing a dict through the params argument:

import requests

# Target URL
url = "https://www.baidu.com/s?"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

# The request parameters go in a dict
kw = {'wd': 'python'}

# Set the params dict when sending the request, then get the response
response = requests.get(url, headers=headers, params=kw)

print(response.text)
Websites often use the Cookie field in the request headers to maintain a user's session state, so we can add a Cookie to the headers parameter to simulate an ordinary user's request. Cookies expire, so the value has to be replaced after a while.
In the browser's developer tools, find the corresponding Cookie and copy it:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
    'Cookie': 'BAIDUID=157D064FDE25DE5DD0E68AF62CBC3627:FG=1; BAIDUID_BFESS=157D064FDE25DE5DD0E68AF62CBC3627:FG=1; BIDUPSID=157D064FDE25DE5DD0E68AF62CBC3627; PSTM=1655611179; BD_UPN=12314753; ZFY=Cs:BflL5Del98YBOjx2EyRPzQE3QCyolFKzgVTguBEHI:C; BD_HOME=1; H_PS_PSSID=36548_36626_36673_36454_31254_36452_36690_36165_36693_36696_36569_36657_26350_36469; BA_HECTOR=85850gag05ak0l040h1hbg5st14; delPer=0; BD_CK_SAM=1; PSINO=7; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; H_PS_645EC=0e08fXgvc5rDJVK1jRjlqmZ7pLp5r%2Fmn9jlENTs3CQ4%2FbhzUL09Y%2F%2FYtCGA; baikeVisitId=e10d7983-547d-4f34-a8d8-ec98dbcba8e4; COOKIE_SESSION=115_0_2_2_1_2_1_0_2_1_0_0_0_0_0_0_1655611189_0_1656233437%7C3%230_0_1656233437%7C1'
}
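requests also accepts a dedicated cookies keyword that takes a dict, which keeps the headers dict cleaner. A minimal sketch, assuming the copied Cookie string has the usual "name1=value1; name2=value2" layout (the value below is a truncated placeholder, not a working cookie):

import requests

url = "https://www.baidu.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

# Truncated placeholder; paste the Cookie string copied from the browser here
cookie_str = 'BAIDUID=157D06...:FG=1; BIDUPSID=157D06...'

# Split the "name=value; name=value" string into a dict
cookies = {pair.split('=', 1)[0]: pair.split('=', 1)[1]
           for pair in cookie_str.split('; ')}

response = requests.get(url, headers=headers, cookies=cookies)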
When the network is unstable, a request can wait a long time and still come back with nothing, which drags down the efficiency of the whole project. In that case the request should be forced to stop: if no result is returned within the given time, an exception is raised.

response = requests.get(url, timeout=3)

This raises an exception if there is no response within 3 seconds.
Example:
import requests

# Target URL
url = "https://www.baidu.com/"

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}

try:
    response = requests.get(url, headers=headers, timeout=5)  # 5-second timeout
except requests.exceptions.RequestException:
    for i in range(4):  # retry the request in a loop
        response = requests.get(url, headers=headers, timeout=20)
        if response.status_code == 200:
            break
html_str = response.text
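Instead of hand-rolling the retry loop, a requests Session can retry automatically through an HTTPAdapter backed by urllib3's Retry helper. A minimal sketch of that alternative (the retry count and backoff are arbitrary choices):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = "https://www.baidu.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

# Retry up to 3 times, waiting longer between attempts (backoff_factor)
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=Retry(total=3, backoff_factor=1)))

response = session.get(url, headers=headers, timeout=5)
html_str = response.text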
To keep the server from recognizing that all the requests come from the same client, and to avoid getting an IP banned for sending frequent requests to one domain, use proxy IPs.

response = requests.get(url, proxies=proxies)
The proxies argument takes the form of a dict:
proxies = {
    "http": "http://12.34.5679:9527",
    "https": "https://12.34.5679:9527",
}
Note: when the proxies dict contains several key-value pairs, the proxy is selected according to the scheme of the URL being requested.
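One quick way to confirm a proxy is actually in effect is to request an echo service that reports the IP it sees. A minimal sketch (the proxy address is the placeholder from above, so substitute a working one; httpbin.org is used here purely as an echo service):

import requests

proxies = {
    "http": "http://12.34.5679:9527",   # placeholder address from above
    "https": "https://12.34.5679:9527",
}

try:
    # httpbin.org/ip echoes back the origin IP of the request
    response = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=5)
    print(response.text)  # should show the proxy's IP, not yours
except requests.exceptions.RequestException as e:
    print("proxy request failed:", e)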
The other parameters of the function requests uses to send POST requests are exactly the same as for GET.
Syntax:
response = requests.post(url, data=data)  # the data parameter takes a dict

import requests

url = "https://fanyi.so.com/"

data = {
    'word': '鸟'
}
# Pass the form data through the data parameter
response = requests.post(url, data=data)
print(response.text)
Output:

E:\anaconda\envs\request\python.exe C:/Users/Administrator/PycharmProjects/request/main.py

<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<title>360翻译</title>
<meta name="description" content="360翻译支持中英深度互译,提供生词释义、权威词典、双语例句等优质英语学习资源,360NMT(神经网络机器翻译)智能加持,更熟悉国人表达习惯!">
<meta content="always" name="referrer">
<link rel="stylesheet" type="text/css" href="https://s1.ssl.qhimg.com/static/d5b9d5285f9f7552/index.css" inline>
<div id="index">
... (the rest of the page source is omitted)

The same pattern posted to Baidu Translate:

import requests

url = "https://fanyi.baidu.com/"

data = {
    'query': '爱'
}
# Pass the form data through the data parameter
response = requests.post(url, data=data)
print(response.text)
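To see exactly what the data parameter transmits, it helps to post to an echo service and read back the parsed form fields. A minimal sketch using httpbin.org (not part of the original examples):

import requests

data = {'query': '爱'}
# httpbin.org/post echoes back what it receives as JSON
response = requests.post("https://httpbin.org/post", data=data)

# The form-encoded fields come back under the "form" key
print(response.json()['form'])  # {'query': '爱'}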