Python Scrapy Installation And Example Crawler

Scrapy is an open-source framework written in Python for extracting data from structured content such as HTML and XML. It can crawl websites and scrape them quickly. First of all, you should install the Python package manager, pip.

Installing with pip:

pip install scrapy

Starting a new project:

scrapy startproject tutorial

When a Scrapy project is created, a file and directory structure is generated as follows.

tutorial/
    scrapy.cfg            # deploy configuration file

    tutorial/             # project's Python module, you'll import your code from here
        __init__.py

        items.py          # project items definition file (see the example below)

        pipelines.py      # project pipelines file

        settings.py       # project settings file

        spiders/          # a directory where you'll later put your spiders
            __init__.py
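
The generated items.py can hold structured item definitions for the data you plan to scrape. A minimal sketch of what it might contain is shown below; the QuoteItem class and its fields are illustrative and are not part of the generated file:

import scrapy

class QuoteItem(scrapy.Item):
    # Each Field() declares an attribute that spiders can populate.
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()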

Example spider:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }
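
The spider above only visits the two pages listed in start_urls. A common extension, sketched here under the assumption that the site's pagination link matches li.next a (as it does on quotes.toscrape.com), is to follow that link so the crawl continues through the remaining pages. The parse method would then look like this:

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }

        # Follow the "next page" link, if any, and parse it with the same
        # callback so the crawl continues until the last page.
        next_page = response.css('li.next a::attr(href)').extract_first()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)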

Scrapy spiders can crawl one or more URLs and extract data from them. With Scrapy selectors, the desired fields can be selected and filtered; both CSS and XPath expressions are supported.
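
For example, the CSS-based extraction in the spider above could equally be written with XPath; a rough equivalent sketch (e.g. to try out in the scrapy shell):

# XPath equivalents of the CSS selectors used in the spider above
for quote in response.xpath('//div[contains(@class, "quote")]'):
    text = quote.xpath('.//span[contains(@class, "text")]/text()').extract_first()
    author = quote.xpath('.//small[contains(@class, "author")]/text()').extract_first()
    tags = quote.xpath('.//div[contains(@class, "tags")]//a[contains(@class, "tag")]/text()').extract()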

To run the crawl:

scrapy crawl quotes
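
The same crawl can also be started from a plain Python script instead of the scrapy command; a minimal sketch using CrawlerProcess, assuming it is run from inside the project directory so that the project settings are picked up:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Look up the "quotes" spider by name in the project and run it.
process = CrawlerProcess(get_project_settings())
process.crawl('quotes')
process.start()  # blocks until the crawl is finished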

To write the output to a JSON file:

scrapy crawl quotes -o quotes.json
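
The exported file is a single JSON array containing every yielded item, so it can be read back with the standard library; a small sketch, assuming the output file is named quotes.json:

import json

# Load the items written by "scrapy crawl quotes -o quotes.json"
with open('quotes.json', encoding='utf-8') as f:
    quotes = json.load(f)

print(len(quotes), 'quotes scraped')
print(quotes[0]['author'], '-', quotes[0]['text'])

Note that -o appends to an existing file, so remove the old file before re-running the crawl (newer Scrapy versions also provide -O to overwrite instead).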
