Web Scraping All Google Play App Reviews in Python

What will be scraped

Prerequisites

Basic knowledge of scraping with CSS selectors

CSS selectors declare which part of the markup a style applies to, which makes it possible to extract data from matching tags and attributes.

If you haven't scraped with CSS selectors before, there's a dedicated blog post of mine about how to use CSS selectors when web scraping. It covers what they are, their pros and cons, and why they matter from a web-scraping perspective.

Separate virtual environment

In short, a virtual environment is an independent set of installed libraries, possibly with a different Python version, that can coexist with other environments on the same system, thus preventing library or Python version conflicts.

If you haven't worked with a virtual environment before, have a look at my dedicated blog post, Python virtual environments tutorial using Virtualenv and Poetry, to get familiar.

Note: this is not a strict requirement for this blog post.

Install libraries:

pip install playwright parsel

You also need to install chromium for playwright to work and operate the browser:

playwright install chromium

After that, if you're on Linux, you might need to install additional system libraries (playwright will prompt you in the terminal if something is missing):

sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libatspi2.0-0 libwayland-client0

Reduce the chance of being blocked

There's a chance that a request might be blocked. Have a look at how to reduce the chance of being blocked while web scraping: there are eleven methods to bypass blocks from most websites, and some of them will be covered in this blog post.

Full Code

import time, json, re
from parsel import Selector
from playwright.sync_api import sync_playwright


def run(playwright):
    page = playwright.chromium.launch(headless=True).new_page()
    page.goto("https://play.google.com/store/apps/details?id=com.collectorz.javamobile.android.books&hl=en_GB&gl=US")

    user_comments = []

    # if "See all reviews" button present
    if page.query_selector('.Jwxk6d .u4ICaf button'):
        print("the button is present.")

        print("clicking on the button.")
        page.query_selector('.Jwxk6d .u4ICaf button').click(force=True)

        print("waiting a few sec to load comments.")
        time.sleep(4)

        last_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')  # 2200
        while True:
            print("scrolling..")
            page.keyboard.press("End")
            time.sleep(3)

            new_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')

            if new_height == last_height:
                break
            else:
                last_height = new_height

    selector = Selector(text=page.content())
    page.close()

    print("done scrolling. Extracting comments...")
    for index, comment in enumerate(selector.css(".RHo1pe"), start=1):

        comment_likes = comment.css(".AJTPZc::text").get()

        user_comments.append({
            "position": index,
            "user_name": comment.css(".X5PpBb::text").get(),
            "user_avatar": comment.css(".gSGphe img::attr(srcset)").get().replace(" 2x", ""),
            "user_comment": comment.css(".h3YV2d::text").get(),
            "comment_likes": comment_likes.split("people")[0].strip() if comment_likes else None,
            "app_rating": re.search(r"\d+", comment.css(".iXRFPc::attr(aria-label)").get()).group(),
            "comment_date": comment.css(".bp9Aid::text").get(),
            "developer_comment": {
                "dev_title": comment.css(".I6j64d::text").get(),
                "dev_comment": comment.css(".ras4vb div::text").get(),
                "dev_comment_date": comment.css(".I9Jtec::text").get()
            }
        })

    print(json.dumps(user_comments, indent=2, ensure_ascii=False))


with sync_playwright() as playwright:
    run(playwright)

Code Explanation

Import libraries:

import time, json, re
from parsel import Selector
from playwright.sync_api import sync_playwright

  • time to set sleep() intervals between each scroll.
  • json just for pretty printing.
  • re to extract the rating digits with a regular expression.
  • Selector from parsel to extract data from the HTML with CSS selectors.
  • sync_playwright for the synchronous API. playwright has an asynchronous API as well, based on the asyncio module.
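
For comparison, here is a minimal sketch of what fetching a page could look like with the asynchronous API. The function names fetch_page_html and fetch_page_html_sync are made up for this example, and the playwright import sits inside the coroutine only so that the snippet can be loaded without launching anything. This post itself sticks to the synchronous API.

```python
import asyncio


async def fetch_page_html(url: str) -> str:
    """Return the rendered HTML of `url` using Playwright's async API."""
    # Lazy import: defining this coroutine does not need playwright,
    # only running it does.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.content()
        finally:
            # Close the browser even if navigation fails.
            await browser.close()


def fetch_page_html_sync(url: str) -> str:
    """Convenience wrapper that runs the coroutine to completion."""
    return asyncio.run(fetch_page_html(url))
```

Every call that talks to the browser is awaited, which is the main visible difference from the synchronous code used in the rest of this post.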

Declare a function:

def run(playwright):
    # further code..

Initialize playwright, connect to chromium, launch() a browser, open a new_page() and goto() a given URL:

page = playwright.chromium.launch(headless=False).new_page()
page.goto("https://play.google.com/store/apps/details?id=com.collectorz.javamobile.android.books&hl=en_GB&gl=US")

user_comments = []  # temporary list for all extracted data

Next, we need to check if the button responsible for showing all reviews is present and click on it if present:

if page.query_selector('.Jwxk6d .u4ICaf button'):
    print("the button is present.")

    print("clicking on the button.")
    page.query_selector('.Jwxk6d .u4ICaf button').click(force=True)

    print("waiting a few sec to load comments.")
    time.sleep(4)

  • query_selector is a function that accepts a CSS selector and returns the first matching element, or None if nothing matches.
  • click clicks on the button, and force=True bypasses Playwright's actionability checks so the click happens immediately.
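
As a possible variation on the fixed time.sleep(4), the sketch below waits for an element to show up after the click instead. The helper name click_if_present is made up for this example, and using .fysCi (the scroll container from the code above) as the "comments are loaded" signal is an assumption, not something verified against the live page.

```python
def click_if_present(page, button_selector, loaded_selector, timeout_ms=10_000):
    """Click the button if it exists, then wait for `loaded_selector`.

    `page` is expected to be a Playwright sync Page.
    Returns True if the button was clicked, False if it was not found.
    """
    button = page.query_selector(button_selector)
    if button is None:
        return False

    button.click(force=True)
    # Blocks until the element is visible; raises if the timeout expires.
    page.wait_for_selector(loaded_selector, timeout=timeout_ms)
    return True


# Possible usage inside run(), with the selectors from this post:
# click_if_present(page, ".Jwxk6d .u4ICaf button", ".fysCi")
```

Waiting on an element returns as soon as the content is there, while a fixed sleep always costs the full delay and can still be too short on a slow connection.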

Scroll to the bottom of the comments window:

last_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')  # 2200
while True:
    print("scrolling..")
    page.keyboard.press("End")
    time.sleep(3)

    new_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')

    if new_height == last_height:
        break
    else:
        last_height = new_height

  • page.evaluate() runs JavaScript code in the browser context. Here it reads the scrollTop of the element matched by the .fysCi selector, that is, the number of pixels the element's content has been scrolled vertically.
  • page.keyboard.press("End") presses the End key to scroll to the bottom of the currently loaded comments.
  • time.sleep(3) pauses code execution for 3 seconds so that more comments can load.
  • Then it measures new_height after the scroll by running the same JavaScript code.
  • Finally, it checks if new_height == last_height, and if so, exits the while loop using break.
  • Otherwise, it sets last_height to new_height and runs the iteration (scroll) again.
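
The loop described above can also be written as a small reusable function. This is only a sketch: the name scroll_until_stable and the max_rounds safety cap are additions for this example and are not part of the original code.

```python
import time


def scroll_until_stable(get_position, scroll_once, pause=3.0, max_rounds=200):
    """Scroll until the reported position stops changing.

    get_position: callable returning the current scroll position.
    scroll_once:  callable performing one scroll step.
    Returns the number of scroll steps that actually moved the position.
    """
    last_position = get_position()
    moved = 0
    for _ in range(max_rounds):  # cap the loop so it can never run forever
        scroll_once()
        time.sleep(pause)  # give new comments time to load
        new_position = get_position()
        if new_position == last_position:
            break  # nothing new was loaded, we reached the bottom
        last_position = new_position
        moved += 1
    return moved


# With a Playwright page this could be wired up as:
# scroll_until_stable(
#     get_position=lambda: page.evaluate('() => document.querySelector(".fysCi").scrollTop'),
#     scroll_once=lambda: page.keyboard.press("End"),
# )
```

Passing the two actions in as callables keeps the stop condition separate from Playwright, so the loop can be tried out without a browser.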

After that, pass the scrolled HTML content to parsel and close the page:

selector = Selector(text=page.content())
page.close()

Iterate over all results after the while loop is done:

for index, comment in enumerate(selector.css(".RHo1pe"), start=1):

    comment_likes = comment.css(".AJTPZc::text").get()

    user_comments.append({
        "position": index,
        "user_name": comment.css(".X5PpBb::text").get(),
        "user_avatar": comment.css(".gSGphe img::attr(srcset)").get().replace(" 2x", ""),
        "user_comment": comment.css(".h3YV2d::text").get(),
        "comment_likes": comment_likes.split("people")[0].strip() if comment_likes else None,
        "app_rating": re.search(r"\d+", comment.css(".iXRFPc::attr(aria-label)").get()).group(),
        "comment_date": comment.css(".bp9Aid::text").get(),
        "developer_comment": {
            "dev_title": comment.css(".I6j64d::text").get(),
            "dev_comment": comment.css(".ras4vb div::text").get(),
            "dev_comment_date": comment.css(".I9Jtec::text").get()
        }
    })
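
One thing to keep in mind: parsel's .get() returns None when a selector matches nothing, so calling .replace() or re.search() directly on its result raises an error for a review that lacks that element. Below is a sketch of None-safe helpers for the three processed fields. The helper names are made up for this example, and the sample strings in the docstrings are assumptions about what the page text looks like.

```python
import re


def parse_likes(label):
    """'20 people found this review helpful' -> '20'; None or '' -> None."""
    if not label:
        return None
    return label.split("people")[0].strip()


def parse_rating(aria_label):
    """'Rated 5 stars out of five stars' -> '5'; None or no digits -> None."""
    if not aria_label:
        return None
    match = re.search(r"\d+", aria_label)  # first run of digits
    return match.group() if match else None


def clean_avatar(srcset):
    """Strip the ' 2x' density descriptor from a srcset value."""
    if not srcset:
        return None
    return srcset.replace(" 2x", "")


# Possible usage inside the loop:
# "app_rating": parse_rating(comment.css(".iXRFPc::attr(aria-label)").get()),
```

With helpers like these, a review with a missing avatar or rating produces None for that field rather than stopping the whole run.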

Print the data:

print(json.dumps(user_comments, indent=2, ensure_ascii=False))


Run the code using a context manager:

with sync_playwright() as playwright:
    run(playwright)


Output

[
  {
    "position": 1,
    "user_name": "JazzTripp",
    "user_avatar": "https://play-lh.googleusercontent.com/a-/ACNPEu8THUUDL3yzcd0bHSDRR4OegOWLmfbFi70On0HbRg",
    "user_comment": "This app takes a bit if getting used to at first, but the catalogue is extensive, and most bar codes and isbn numbers can be used to autofill a good chuck of a collection. I personally use this app for manga, and while its only correct about 70% of the time, its still easy to update and change as you see fit. The 'add to core' option makes me feel like im actually helping out the app, so i add data whenever i can. Keep up the good work guys!",
    "comment_likes": "20",
    "app_rating": "5",
    "comment_date": "May 06, 2022",
    "developer_comment": null
  },
  ... other results
  {
    "position": 875,
    "user_name": "Originalbigguy",
    "user_avatar": "https://play-lh.googleusercontent.com/a/ALm5wu3dYTOHvlG8SUqgyTbRnjv9I49JtxgySY-RwTJU=s64-rw-mo",
    "user_comment": "Not free",
    "comment_likes": null,
    "app_rating": "1",
    "comment_date": "9 April 2021",
    "developer_comment": {
      "dev_title": "Collectorz.com",
      "dev_comment": "The app is never advertised as free anywhere. The app information clearly states this is a paid subscription app.\n",
      "dev_comment_date": "10 April 2021"
    }
  }
]

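Note that the dates in the output arrive in two different layouts ("May 06, 2022" and "9 April 2021"). If you need them in one format, a minimal sketch such as the one below could normalize them to ISO 8601. The `normalize_date` helper and the `DATE_FORMATS` list are hypothetical, cover only the two layouts seen above, and assume English month names under the default locale:

```python
from datetime import datetime
from typing import Optional

# Only the two layouts that appear in the sample output; extend as needed.
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y")


def normalize_date(text: Optional[str]) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    if not text:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


for raw in ("May 06, 2022", "9 April 2021"):
    print(raw, "->", normalize_date(raw))
```

Returning `None` for unknown layouts keeps the scraper running when Google Play shows a date in a format that is not listed.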

Using Google Play Product Reviews API

Since we support extracting reviews data from Google Play apps, this section compares the DIY solution with our solution.

The biggest difference is that you don't need to use browser automation to scrape results, or create a parser from scratch and maintain it.

Keep in mind that Google might also block the request at some point (or show a CAPTCHA). We handle that on our backend.

Install google-search-results from PyPI:

pip install google-search-results


import json                                                  # needed for json.dumps() at the end
from urllib.parse import parse_qsl, urlsplit

from serpapi import GoogleSearch

params = {
    "api_key": "...",                                        # your serpapi api key
    "engine": "google_play_product",                         # serpapi parsing engine
    "store": "apps",                                         # app results
    "gl": "us",                                              # country of the search
    "hl": "en",                                              # language of the search
    "product_id": "com.collectorz.javamobile.android.books"  # app id
}

search = GoogleSearch(params)                                # where data extraction happens on the backend

reviews = []

while True:
    results = search.get_dict()                              # JSON -> Python dict

    # .get() avoids a KeyError if a page comes back without reviews
    for review in results.get("reviews", []):
        reviews.append({
            "title": review.get("title"),
            "avatar": review.get("avatar"),
            "rating": review.get("rating"),
            "likes": review.get("likes"),
            "date": review.get("date"),
            "snippet": review.get("snippet"),
            "response": review.get("response")
        })

    # pagination: copy the query parameters of the "next" URL into the search params
    pagination = results.get("serpapi_pagination", {})
    if "next" in pagination:
        search.params_dict.update(dict(parse_qsl(urlsplit(pagination["next"]).query)))
    else:
        break

print(json.dumps(reviews, indent=2, ensure_ascii=False))

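The pagination step is the least obvious part of the code above: it splits the "next" URL, reads its query string, and merges those parameters into the current search parameters. Here is a standalone sketch of just that step. The URL, the `page_token` parameter name, and the `merge_next_params` helper are hypothetical placeholders for illustration, not the actual values the API returns:

```python
from urllib.parse import parse_qsl, urlsplit


def merge_next_params(params: dict, next_url: str) -> dict:
    """Return a copy of params updated with the query parameters of next_url."""
    merged = dict(params)                                     # leave the original dict untouched
    merged.update(dict(parse_qsl(urlsplit(next_url).query)))  # query string -> dict
    return merged


# Hypothetical values, for illustration only
current_params = {"engine": "google_play_product", "hl": "en"}
next_url = "https://example.com/search.json?engine=google_play_product&hl=en&page_token=PLACEHOLDER"

print(merge_next_params(current_params, next_url))
```

Parameters that already exist keep their keys and take the value from the URL, and any new parameter from the URL is added, which is why the loop can keep calling `search.get_dict()` with the same `search` object.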

Output:

[
  {
    "title": "JazzTripp",
    "avatar": "https://play-lh.googleusercontent.com/a-/ACNPEu8THUUDL3yzcd0bHSDRR4OegOWLmfbFi70On0HbRg",
    "rating": 5.0,
    "likes": 20,
    "date": "May 06, 2022",
    "snippet": "This app takes a bit if getting used to at first, but the catalogue is extensive, and most bar codes and isbn numbers can be used to autofill a good chuck of a collection. I personally use this app for manga, and while its only correct about 70% of the time, its still easy to update and change as you see fit. The 'add to core' option makes me feel like im actually helping out the app, so i add data whenever i can. Keep up the good work guys!",
    "response": null
  },
  ... other reviews
  {
    "title": "Originalbigguy",
    "avatar": "https://play-lh.googleusercontent.com/a/ALm5wu3dYTOHvlG8SUqgyTbRnjv9I49JtxgySY-RwTJU=mo",
    "rating": 1.0,
    "likes": 0,
    "date": "April 09, 2021",
    "snippet": "Not free",
    "response": {
      "title": "Collectorz.com",
      "snippet": "The app is never advertised as free anywhere. The app information clearly states this is a paid subscription app.",
      "date": "April 10, 2021"
    }
  }
]

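Once the reviews are collected, the list of dicts is easy to work with. As a rough sketch, the hypothetical `summarize` helper below counts reviews per rating and the number of developer responses; the sample list is a trimmed copy of the two reviews shown in the output above:

```python
from collections import Counter


def summarize(reviews: list) -> dict:
    """Return basic statistics for a list of review dicts."""
    ratings = [r["rating"] for r in reviews if r.get("rating") is not None]
    return {
        "count": len(reviews),
        "average_rating": round(sum(ratings) / len(ratings), 2) if ratings else None,
        "rating_counts": dict(Counter(ratings)),
        "with_response": sum(1 for r in reviews if r.get("response")),
    }


# Trimmed sample based on the output above
sample_reviews = [
    {"title": "JazzTripp", "rating": 5.0, "likes": 20, "response": None},
    {"title": "Originalbigguy", "rating": 1.0, "likes": 0, "response": {"title": "Collectorz.com"}},
]

print(summarize(sample_reviews))
```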
