Scraping target: the Douban TV list (tag "热门", i.e. "Hot")



Each click on "加载更多" ("Load more") increases the page_start parameter by 20, so the request params have to be built inside a loop.
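A minimal sketch of that loop, assuming five pages and the same query parameters the Python code below sends. `build_page_urls` is a hypothetical helper name, not part of any library. It only builds the full request URL for each page (page_start = 0, 20, 40, ...) without sending anything, so the printed URLs can also be pasted into Postman.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/j/search_subjects'


def build_page_urls(pages, page_limit=20):
    """Return one request URL per "Load more" page (hypothetical helper).

    page_start advances by page_limit for each page: 0, 20, 40, ...
    """
    urls = []
    for page in range(pages):
        params = {
            'type': 'tv',
            'tag': '热门',
            'sort': 'recommend',
            'page_limit': page_limit,
            'page_start': page * page_limit,
        }
        # urlencode percent-encodes the UTF-8 bytes of the Chinese tag
        urls.append(BASE_URL + '?' + urlencode(params))
    return urls


if __name__ == '__main__':
    for u in build_page_urls(5):
        print(u)
```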
Preview the JSON response in Postman

From each item in the subjects list we only need the show name (the title field) and its rating (the rate field).
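A small offline sketch of that extraction step. The body below is a made-up, trimmed sample with the same shape the Python code relies on (a subjects list whose items have title and rate); real items carry more fields. Keeping rate as a string is an assumption about the payload, and `extract_title_rate` is a hypothetical helper name.

```python
import json

# Made-up sample body, trimmed to the two fields this post uses.
SAMPLE_BODY = '''
{"subjects": [
    {"title": "Show A", "rate": "8.9"},
    {"title": "Show B", "rate": "7.5"}
]}
'''


def extract_title_rate(body):
    """Return (title, rate) pairs from a search_subjects JSON body.

    rate is returned unchanged (a string in this sample, assumed).
    """
    data = json.loads(body)
    return [(s['title'], s['rate']) for s in data['subjects']]


if __name__ == '__main__':
    for title, rate in extract_title_rate(SAMPLE_BODY):
        print(title, rate)
```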
Scraping with Python
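```python
import requests

# Endpoint called by the "Load more" button on the Douban TV page
url = 'https://movie.douban.com/j/search_subjects'
headers = {
    # Browser User-Agent; requests sent with the default one may be rejected
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
}

# page_start takes the values 0, 20, 40, 60, 80 (five pages of 20 items)
for i in range(0, 100, 20):
    params = {
        'type': 'tv',
        'tag': '热门',  # "Hot" tag, sent as-is
        'sort': 'recommend',
        'page_limit': 20,
        'page_start': i,
    }
    res = requests.get(url=url, params=params, headers=headers, timeout=10)
    res.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    data = res.json()  # same as json.loads(res.text)
    for subject in data['subjects']:
        # print the show name and its rating
        print(subject['title'], subject['rate'])
```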
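The fixed range(0, 100, 20) always stops after 100 items. A possible alternative, sketched under the assumption (not confirmed here) that the API returns an empty subjects list once you page past the end, keeps requesting until that happens. `iter_subjects` and `fetch_page` are hypothetical names: in real use `fetch_page` would wrap the requests.get(...).json() call above, and taking it as a parameter lets the paging logic run without network access.

```python
from typing import Callable, Iterator


def iter_subjects(fetch_page: Callable[[int], dict],
                  page_limit: int = 20,
                  max_pages: int = 50) -> Iterator[dict]:
    """Yield subjects page by page until a page comes back empty.

    fetch_page(page_start) is assumed to return the decoded JSON dict.
    max_pages caps the number of requests so the loop always terminates.
    """
    for page in range(max_pages):
        subjects = fetch_page(page * page_limit).get('subjects', [])
        if not subjects:  # assumed end-of-list signal
            return
        yield from subjects


if __name__ == '__main__':
    # Fake fetcher standing in for the HTTP call: 45 items in total.
    fake_items = [{'title': f'Show {n}', 'rate': '8.0'} for n in range(45)]

    def fake_fetch(page_start):
        return {'subjects': fake_items[page_start:page_start + 20]}

    for s in iter_subjects(fake_fetch):
        print(s['title'], s['rate'])
```

The max_pages cap is a defensive choice: if the end-of-list assumption turns out to be wrong, the loop still stops after a bounded number of requests.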