What Is Redis in a Python Distributed Crawler, and How Do You Use It? - 中华考试网

Source: 中华考试网, 2020-11-26

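The classic way to build a distributed crawler on top of Redis is scrapy-redis, a fairly mature framework. Alternatively, you can build your own distributed system using Redis's queue (list) or publish/subscribe features.

As a communication medium between crawler nodes, Redis has the advantage of fast reads and writes, so its effect on crawl speed is negligible. It is widely used for this purpose.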
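To make the "build your own" option more concrete, here is a minimal sketch of a shared URL queue with de-duplication. It is an illustration under stated assumptions, not the article's own code: the key names `crawler:queue` and `crawler:seen`, the helpers `enqueue` and `next_url`, and the `FakeRedis` class are all hypothetical. `FakeRedis` is an in-memory stand-in that mimics only the three redis-py calls used here (`lpush`, `rpop`, `sadd`) so the sketch runs without a server.

```python
class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client.

    It mimics only lpush, rpop and sadd so the sketch runs without a server.
    """

    def __init__(self):
        self.lists = {}
        self.sets = {}

    def lpush(self, name, *values):
        lst = self.lists.setdefault(name, [])
        for value in values:
            lst.insert(0, value)  # push onto the head of the list
        return len(lst)

    def rpop(self, name):
        lst = self.lists.get(name)
        return lst.pop() if lst else None  # pop from the tail, None if empty

    def sadd(self, name, *values):
        members = self.sets.setdefault(name, set())
        added = 0
        for value in values:
            if value not in members:
                members.add(value)
                added += 1
        return added  # number of members that were actually new


QUEUE_KEY = "crawler:queue"  # hypothetical key for pending URLs
SEEN_KEY = "crawler:seen"    # hypothetical key for URLs already scheduled


def enqueue(client, url):
    """Queue url unless some worker has already scheduled it."""
    if client.sadd(SEEN_KEY, url) == 1:
        client.lpush(QUEUE_KEY, url)
        return True
    return False


def next_url(client):
    """Return the oldest pending URL, or None when the queue is empty."""
    return client.rpop(QUEUE_KEY)


if __name__ == "__main__":
    r = FakeRedis()
    for u in ["http://example.com/a", "http://example.com/b", "http://example.com/a"]:
        enqueue(r, u)
    url = next_url(r)
    while url is not None:
        print("crawl", url)
        url = next_url(r)
```

With a real server, the same functions should work if `FakeRedis()` is replaced by a redis-py client such as `redis.Redis(decode_responses=True)`. Because every worker talks to the same list and set, any number of processes on different machines can call `next_url` and `enqueue` without coordinating with each other directly.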

Example spider (main program):

    from scrapy.http import Request
    from scrapy_redis.spiders import RedisSpider

    from ..items import XiaobaiItem


    class RenjianSpider(RedisSpider):
        # A RedisSpider takes its start URLs from Redis instead of start_urls.
        name = 'baidu'
        allowed_domains = ['baidu.com']

        def parse(self, response):
            news_list = response.xpath('//*[@id="content-list"]/div[@class="item"]')
            for news in news_list:
                # Query relative to the current item node, not the whole response.
                content = news.xpath('.//div[@class="part1"]/a/text()').extract_first(default='').strip()
                url = news.xpath('.//div[@class="part1"]/a/@href').extract_first()
                yield XiaobaiItem(url=url, content=content)
            # The URL below is incomplete in the source and is kept as-is.
            yield Request(url='http://dig..com/', callback=self.parse)
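The spider above only becomes distributed once the project's scheduler and duplicate filter are pointed at Redis. The article does not show this step, so the following `settings.py` fragment is a sketch based on the setting names documented by scrapy-redis. The `REDIS_URL` value assumes a Redis instance on the local machine.

```python
# settings.py (sketch): route scheduling and de-duplication through Redis

# Use the scrapy-redis scheduler so all workers share one request queue.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Share the seen-request fingerprints between workers via Redis.
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Keep the queue and fingerprints in Redis when a worker stops.
SCHEDULER_PERSIST = True

# Assumption: Redis runs locally on the default port.
REDIS_URL = "redis://localhost:6379"
```

With settings like these, each worker started with `scrapy crawl baidu` should wait idle until a start URL is pushed into Redis. By default scrapy-redis reads start URLs from the key `<spider name>:start_urls`, which would be `baidu:start_urls` for this spider, for example via `redis-cli lpush baidu:start_urls <start URL>`.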
