The Instagram Hashtag scraper

Instascrape’s Available Scraped Data (3 Part Series)

1 The Instagram Profile scraper
2 The Instagram Post scraper
3 The Instagram Hashtag scraper

In this series, I have presented instascrape's Profile and Post scrapers and discussed what data points they collect. In this post, we're going to look at what the Hashtag scraper collects.

chris-greening / instascrape

Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically

instascrape: powerful Instagram data scraping toolkit

Note: This module is no longer actively maintained.

DISCLAIMER:

Instagram has gotten increasingly strict with scraping and using this library can result in getting flagged for botting AND POSSIBLE DISABLING OF YOUR INSTAGRAM ACCOUNT. This is a research project and I am not responsible for how you use it. Independently, the library is designed to be responsible and respectful and it is up to you to decide what you do with it. I don’t claim any responsibility if your Instagram account is affected by how you use this library.

What is it?

instascrape is a lightweight Python package that provides an expressive and flexible API for scraping Instagram data. It is geared towards being a high-level building block on the data scientist’s toolchain and can be seamlessly integrated and extended with industry standard tools for web scraping, data science, and analysis.


The Hashtag scraper scrapes 22 data points associated with an Instagram hashtag.

Instance attribute names have been chosen to be semantic and easy to understand.

The data points

The best way to learn is by example, so we'll take a look at the data scraped for the #google hashtag on Instagram.

All instascrape scrapers have a to_dict method that returns all data as a dictionary so we can see everything in one shot.

from instascrape import Hashtag
google_hashtag = Hashtag('google')
google_hashtag.scrape()
google_hashtag.to_dict()
>>>
{'csrf_token': 'jfndsjklfhdasjklfhsdjklfasdhnfkjlsda',
 'viewer': None,
 'viewer_id': None,
 'country_code': 'US',
 'language_code': 'en',
 'locale': 'en_US',
 'device_id': '12345678-1234-1234-1234-123456789012',
 'browser_push_pub_key': '1245643253543556555564',
 'key_id': '87',
 'public_key': 'alskdfnkl123213123ALSKDNfjklsdfasdfndsalfasdlfkh',
 'version': '9',
 'is_dev': False,
 'rollout_hash': 'b10813bd9030',
 'bundle_variant': 'es6',
 'frontend_dev': 'c1f',
 'id': '17843843635029645',
 'name': 'google',
 'allow_following': False,
 'is_following': False,
 'is_top_media_only': False,
 'profile_pic_url': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-15/e35/c0.79.639.639a/s150x150/133888980_3517051138410873_6063716563788721688_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=109&_nc_ohc=eteNTk5Tu3MAX98AX8f&tp=1&oh=c2e1906a7d31777531b1f5949c4ae81a&oe=60189A13',
 'amount_of_posts': 9350019}
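Since to_dict returns a plain Python dictionary, the result can be handled with ordinary dictionary tools. The sketch below works on a copy of the sample output above, so it runs without instascrape or network access. It checks the count of 22 data points and separates the keys that describe the hashtag itself from the rest. The split into HASHTAG_KEYS and "other" fields, along with the helper name split_hashtag_fields, is my own illustrative assumption and not something instascrape defines.

```python
# Sample output copied from the to_dict() call shown in this post.
scraped = {
    'csrf_token': 'jfndsjklfhdasjklfhsdjklfasdhnfkjlsda',
    'viewer': None,
    'viewer_id': None,
    'country_code': 'US',
    'language_code': 'en',
    'locale': 'en_US',
    'device_id': '12345678-1234-1234-1234-123456789012',
    'browser_push_pub_key': '1245643253543556555564',
    'key_id': '87',
    'public_key': 'alskdfnkl123213123ALSKDNfjklsdfasdfndsalfasdlfkh',
    'version': '9',
    'is_dev': False,
    'rollout_hash': 'b10813bd9030',
    'bundle_variant': 'es6',
    'frontend_dev': 'c1f',
    'id': '17843843635029645',
    'name': 'google',
    'allow_following': False,
    'is_following': False,
    'is_top_media_only': False,
    'profile_pic_url': 'https://scontent-lga3-1.cdninstagram.com/v/t51.2885-15/e35/c0.79.639.639a/s150x150/133888980_3517051138410873_6063716563788721688_n.jpg?_nc_ht=scontent-lga3-1.cdninstagram.com&_nc_cat=109&_nc_ohc=eteNTk5Tu3MAX98AX8f&tp=1&oh=c2e1906a7d31777531b1f5949c4ae81a&oe=60189A13',
    'amount_of_posts': 9350019,
}

# Assumption: these seven keys describe the hashtag itself, while the
# remaining keys look like page/session metadata. This grouping is
# illustrative and is not part of the instascrape API.
HASHTAG_KEYS = (
    'id',
    'name',
    'allow_following',
    'is_following',
    'is_top_media_only',
    'profile_pic_url',
    'amount_of_posts',
)


def split_hashtag_fields(data):
    """Return (hashtag_fields, other_fields) as two new dictionaries."""
    hashtag_fields = {key: data[key] for key in HASHTAG_KEYS if key in data}
    other_fields = {key: value for key, value in data.items()
                    if key not in HASHTAG_KEYS}
    return hashtag_fields, other_fields


hashtag_fields, other_fields = split_hashtag_fields(scraped)

print(len(scraped))  # 22 data points in total
print(hashtag_fields['name'], hashtag_fields['amount_of_posts'])
```

In a real session you would pass the dictionary returned by google_hashtag.to_dict() instead of the hard-coded sample.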


If you’re interested in seeing instascrape in action, check out some of my other posts that explore practical examples:


Scraping 10,000 data points from Donald Trump’s Instagram page with Python

Chris Greening ・ Dec 20 ’20


Downloading recent Instagram photos using instascrape and Python

Chris Greening ・ Oct 26 ’20


In the next part of the series, we will be exploring what attributes are provided by the Location scraper.
