What will be scraped
Prerequisites
Basic knowledge of scraping with CSS selectors
CSS selectors declare which part of the markup a style applies to, which makes it possible to extract data from matching tags and attributes.
If you haven’t scraped with CSS selectors before, there’s a dedicated blog post of mine about how to use CSS selectors when web-scraping. It covers what they are, their pros and cons, and why they matter from a web-scraping perspective.
Separate virtual environment
In short, it’s a thing that creates an independent set of installed libraries, including different Python versions, that can coexist with each other on the same system, thus preventing library or Python version conflicts.
If you haven’t worked with a virtual environment before, have a look at my dedicated Python virtual environments tutorial using Virtualenv and Poetry to get familiar.
Note: this is not a strict requirement for this blog post.
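If you’re not sure whether a virtual environment is currently active, a quick check from Python itself can help. This is a small sketch that relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtual_environment` is my own.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Inside a virtual environment, sys.prefix points to the environment
    # directory, while sys.base_prefix still points to the base installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtual_environment())
```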
Install libraries:

```bash
pip install playwright parsel
```
You also need to install Chromium for `playwright` to work and operate the browser:

```bash
playwright install chromium
```
After that, if you’re on Linux, you might need to install additional packages (`playwright` will prompt you in the terminal in case something is missing):

```bash
sudo apt-get install -y libnss3 libnspr4 libatk1.0-0 libatk-bridge2.0-0 libcups2 libdrm2 libxkbcommon0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libatspi2.0-0 libwayland-client0
```
Reduce the chance of being blocked
There’s a chance that a request might be blocked. Have a look at how to reduce the chance of being blocked while web-scraping: there are eleven methods to bypass blocks from most websites, and some of them will be covered in this blog post.
Full Code

```python
import time, json, re
from parsel import Selector
from playwright.sync_api import sync_playwright


def run(playwright):
    page = playwright.chromium.launch(headless=True).new_page()
    page.goto("https://play.google.com/store/apps/details?id=com.collectorz.javamobile.android.books&hl=en_GB&gl=US")

    user_comments = []

    # if "See all reviews" button present
    if page.query_selector('.Jwxk6d .u4ICaf button'):
        print("the button is present.")

        print("clicking on the button.")
        page.query_selector('.Jwxk6d .u4ICaf button').click(force=True)

        print("waiting a few sec to load comments.")
        time.sleep(4)

        last_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')  # 2200

        while True:
            print("scrolling..")
            page.keyboard.press("End")
            time.sleep(3)

            new_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')

            if new_height == last_height:
                break
            else:
                last_height = new_height

    selector = Selector(text=page.content())
    page.close()

    print("done scrolling. Extracting comments...")

    for index, comment in enumerate(selector.css(".RHo1pe"), start=1):
        comment_likes = comment.css(".AJTPZc::text").get()

        user_comments.append({
            "position": index,
            "user_name": comment.css(".X5PpBb::text").get(),
            "user_avatar": comment.css(".gSGphe img::attr(srcset)").get().replace(" 2x", ""),
            "user_comment": comment.css(".h3YV2d::text").get(),
            "comment_likes": comment_likes.split("people")[0].strip() if comment_likes else None,
            "app_rating": re.search(r"\d+", comment.css(".iXRFPc::attr(aria-label)").get()).group(),
            "comment_date": comment.css(".bp9Aid::text").get(),
            "developer_comment": {
                "dev_title": comment.css(".I6j64d::text").get(),
                "dev_comment": comment.css(".ras4vb div::text").get(),
                "dev_comment_date": comment.css(".I9Jtec::text").get()
            }
        })

    print(json.dumps(user_comments, indent=2, ensure_ascii=False))


with sync_playwright() as playwright:
    run(playwright)
```
Code Explanation
Import libraries:

```python
import time, json, re
from parsel import Selector
from playwright.sync_api import sync_playwright
```

- `time` to set `sleep()` intervals between each scroll.
- `json` just for pretty printing.
- `re` to extract the rating number from the rating label.
- `Selector` from `parsel` to parse the scrolled HTML with CSS selectors.
- `sync_playwright` for the synchronous API. `playwright` has an asynchronous API as well, using the `asyncio` module.
Declare a function:

```python
def run(playwright):
    # further code..
```
Initialize `playwright`, connect to `chromium`, `launch()` a browser, open a `new_page()` and `goto()` a given URL:

```python
page = playwright.chromium.launch(headless=False).new_page()
page.goto("https://play.google.com/store/apps/details?id=com.collectorz.javamobile.android.books&hl=en_GB&gl=US")

user_comments = []  # temporary list for all extracted data
```
- `playwright.chromium` is the entry point for launching a Chromium browser instance.
- `launch()` will launch the browser, and the `headless` argument controls whether it runs in headless mode. The default is `True`; passing `False` opens a visible browser window.
- `new_page()` creates a new page in a new browser context.
- `page.goto("URL")` will make a request to the provided website.
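Note that chaining `launch(...).new_page()` doesn’t keep a reference to the browser object, so only the page can be closed later. A possible variation, sketched below, keeps the browser handle and closes it even if scraping fails. The helper name `open_page` and its parameters are hypothetical; `launch()`, `new_page()`, `goto()` and `browser.close()` are regular Playwright calls.

```python
from contextlib import contextmanager


@contextmanager
def open_page(playwright, url, headless=True):
    """Yield a page opened at `url` and always close the browser afterwards."""
    browser = playwright.chromium.launch(headless=headless)
    try:
        page = browser.new_page()
        page.goto(url)
        yield page
    finally:
        # Closing the browser also closes the pages that belong to it.
        browser.close()
```

Inside `run()`, this could be used as `with open_page(playwright, url) as page:`, with the scraping steps placed in the body of the `with` block.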
Next, we need to check if the button responsible for showing all reviews is present and click on it if present:

```python
if page.query_selector('.Jwxk6d .u4ICaf button'):
    print("the button is present.")

    print("clicking on the button.")
    page.query_selector('.Jwxk6d .u4ICaf button').click(force=True)

    print("waiting a few sec to load comments.")
    time.sleep(4)
```
- `query_selector` is a function that accepts a CSS selector to search for. It returns the first matching element, or `None` if nothing matches.
- `click` is to click on the button, and `force=True` will bypass Playwright’s actionability checks and click immediately.
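A fixed `time.sleep(4)` works, but it waits the full four seconds even when the reviews load sooner, and it may be too short on a slow connection. As an alternative sketch, Playwright’s `wait_for_selector()` can wait until the reviews container appears. The helper name `open_all_reviews` and the 10-second timeout are my assumptions; `.fysCi` is the container selector that the scrolling step of this post relies on.

```python
def open_all_reviews(page, button_selector=".Jwxk6d .u4ICaf button",
                     container_selector=".fysCi", timeout_ms=10_000):
    """Click "See all reviews" if present and wait for the reviews container.

    `page` is expected to be a Playwright sync-API Page.
    Returns True when the button was found and clicked, False otherwise.
    """
    button = page.query_selector(button_selector)
    if button is None:
        return False

    button.click(force=True)
    # Blocks until the container is visible, or raises a Playwright
    # TimeoutError after `timeout_ms` milliseconds.
    page.wait_for_selector(container_selector, timeout=timeout_ms)
    return True
```

Querying the button once and reusing the handle also avoids running the same selector twice.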
Scroll to the bottom of the comments window:

```python
last_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')  # 2200

while True:
    print("scrolling..")
    page.keyboard.press("End")
    time.sleep(3)

    new_height = page.evaluate('() => document.querySelector(".fysCi").scrollTop')

    if new_height == last_height:
        break
    else:
        last_height = new_height
```
- `page.evaluate()` will run JavaScript code in the browser context that measures the scroll position of the `.fysCi` element. `scrollTop` gets the number of pixels that the element’s content has been scrolled vertically.
- `page.keyboard.press("End")` presses the End key to scroll to the bottom of the comments window.
- `time.sleep(3)` will stop code execution for 3 seconds to load more comments.
- Then it will measure a `new_height` after the scroll by running the same JavaScript measurement code.
- Finally, it will check `if new_height == last_height`, and if so, exit the `while` loop by using `break`.
- Otherwise (`else`), it sets `last_height` to `new_height` and runs the iteration (scroll) again.
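The loop above runs until the scroll position stops changing, so it has no upper bound if the page keeps loading. Below is a sketch of the same idea as a reusable helper with an iteration cap. The names `scroll_until_stable`, `get_position`, `scroll_once` and `max_scrolls` are hypothetical; the two callables stand in for `page.evaluate(...)` and `page.keyboard.press("End")`.

```python
import time


def scroll_until_stable(get_position, scroll_once, pause=3.0, max_scrolls=200):
    """Scroll until the position stops changing or `max_scrolls` is reached.

    Returns the number of scrolls performed.
    """
    last_position = get_position()

    for scrolls in range(1, max_scrolls + 1):
        scroll_once()
        time.sleep(pause)  # give the page time to load more comments

        new_position = get_position()
        if new_position == last_position:
            return scrolls
        last_position = new_position

    return max_scrolls


# Possible usage with a Playwright page (not executed here):
# scroll_until_stable(
#     get_position=lambda: page.evaluate('() => document.querySelector(".fysCi").scrollTop'),
#     scroll_once=lambda: page.keyboard.press("End"),
# )
```

Passing the two actions in as callables keeps the loop independent of the browser, which makes it easy to try out without launching Chromium.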
After that, pass the scrolled HTML content to `parsel` and `close()` the page:

```python
selector = Selector(text=page.content())
page.close()
```
Iterate over all results after the `while` loop is done:

```python
for index, comment in enumerate(selector.css(".RHo1pe"), start=1):
    comment_likes = comment.css(".AJTPZc::text").get()

    user_comments.append({
        "position": index,
        "user_name": comment.css(".X5PpBb::text").get(),
        "user_avatar": comment.css(".gSGphe img::attr(srcset)").get().replace(" 2x", ""),
        "user_comment": comment.css(".h3YV2d::text").get(),
        "comment_likes": comment_likes.split("people")[0].strip() if comment_likes else None,
        "app_rating": re.search(r"\d+", comment.css(".iXRFPc::attr(aria-label)").get()).group(),
        "comment_date": comment.css(".bp9Aid::text").get(),
        "developer_comment": {
            "dev_title": comment.css(".I6j64d::text").get(),
            "dev_comment": comment.css(".ras4vb div::text").get(),
            "dev_comment_date": comment.css(".I9Jtec::text").get()
        }
    })
```
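The chained calls in this loop raise an exception when an element is missing: `.get()` returns `None`, `None` has no `.replace()`, and `re.search()` rejects `None` as input. Here is a sketch of `None`-safe helpers for the same fields. The function names are hypothetical, and the sample strings in the docstrings (for example `"Rated 5 stars out of five stars"`) are assumed formats rather than captured page data.

```python
import re


def parse_likes(text):
    """'20 people found this review helpful' -> '20'; None stays None."""
    return text.split("people")[0].strip() if text else None


def parse_rating(aria_label):
    """'Rated 5 stars out of five stars' -> '5'; None when no digit is found."""
    match = re.search(r"\d+", aria_label or "")
    return match.group() if match else None


def clean_avatar(srcset):
    """Drop the ' 2x' density descriptor from a srcset value."""
    return srcset.replace(" 2x", "") if srcset else None


def developer_reply(title, comment, date):
    """Return None instead of a dict of empty fields when there is no reply."""
    if not any((title, comment, date)):
        return None
    return {"dev_title": title, "dev_comment": comment, "dev_comment_date": date}
```

Each helper takes the result of the matching `comment.css(...).get()` call, so the selectors themselves stay unchanged.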
Print the data:

```python
print(json.dumps(user_comments, indent=2, ensure_ascii=False))
```
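With several hundred reviews, printing to the terminal gets unwieldy. As a sketch, the same list can be written to a file instead. The helper name `save_comments` and the file name `reviews.json` are arbitrary choices of mine.

```python
import json


def save_comments(comments, path="reviews.json"):
    """Write the extracted comments to `path` as pretty-printed UTF-8 JSON."""
    # ensure_ascii=False keeps non-ASCII characters (emoji, accents) readable
    # instead of escaping them as \uXXXX sequences.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(comments, file, indent=2, ensure_ascii=False)
```

Calling `save_comments(user_comments)` at the end of `run()` would then replace or complement the `print()` call.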
Run your code using a context manager:

```python
with sync_playwright() as playwright:
    run(playwright)
```
Output:
[
  {
    "position": 1,
    "user_name": "JazzTripp",
    "user_avatar": "https://play-lh.googleusercontent.com/a-/ACNPEu8THUUDL3yzcd0bHSDRR4OegOWLmfbFi70On0HbRg",
    "user_comment": "This app takes a bit if getting used to at first, but the catalogue is extensive, and most bar codes and isbn numbers can be used to autofill a good chuck of a collection. I personally use this app for manga, and while its only correct about 70% of the time, its still easy to update and change as you see fit. The 'add to core' option makes me feel like im actually helping out the app, so i add data whenever i can. Keep up the good work guys!",
    "comment_likes": "20",
    "app_rating": "5",
    "comment_date": "May 06, 2022",
    "developer_comment": null
  },
  ... other results
  {
    "position": 875,
    "user_name": "Originalbigguy",
    "user_avatar": "https://play-lh.googleusercontent.com/a/ALm5wu3dYTOHvlG8SUqgyTbRnjv9I49JtxgySY-RwTJU=s64-rw-mo",
    "user_comment": "Not free",
    "comment_likes": null,
    "app_rating": "1",
    "comment_date": "9 April 2021",
    "developer_comment": {
      "dev_title": "Collectorz.com",
      "dev_comment": "The app is never advertised as free anywhere. The app information clearly states this is a paid subscription app.\n",
      "dev_comment_date": "10 April 2021"
    }
  }
]
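Note that the sample output mixes two date layouts ("May 06, 2022" and "9 April 2021"). If you need comparable dates, one option is to normalize them after scraping. The sketch below assumes English month names and only the two layouts seen in the sample; `normalize_date` and `DATE_FORMATS` are hypothetical names, not part of the original script:

```python
from datetime import datetime
from typing import Optional

# Layouts seen in the sample output; extend the tuple if other layouts show up.
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y")


def normalize_date(raw: Optional[str]) -> Optional[str]:
    """Convert a scraped date string to ISO 8601 (YYYY-MM-DD) when possible."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return raw  # unknown layout: keep the original text untouched


print(normalize_date("May 06, 2022"))  # 2022-05-06
print(normalize_date("9 April 2021"))  # 2021-04-09
```

Unrecognized strings are returned unchanged, so no scraped data is lost if Google Play shows a layout the sketch does not cover.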
Using Google Play Product Reviews API
Since we support extracting review data from Google Play apps, this section compares the DIY solution with ours.
The biggest difference is that you don't need to use browser automation to scrape results, or create a parser from scratch and maintain it.
Keep in mind that Google may also block the request at some point (or show a CAPTCHA); we handle that on our backend.
Install google-search-results from PyPI:
pip install google-search-results
from serpapi import GoogleSearch
from urllib.parse import (parse_qsl, urlsplit)
import json

params = {
    "api_key": "...",                                          # your serpapi api key
    "engine": "google_play_product",                           # serpapi parsing engine
    "store": "apps",                                           # app results
    "gl": "us",                                                # country of the search
    "hl": "en",                                                # language of the search
    "product_id": "com.collectorz.javamobile.android.books"    # app id
}

search = GoogleSearch(params)     # where data extraction happens on the backend

reviews = []

while True:
    results = search.get_dict()   # JSON -> Python dict

    for review in results["reviews"]:
        reviews.append({
            "title": review.get("title"),
            "avatar": review.get("avatar"),
            "rating": review.get("rating"),
            "likes": review.get("likes"),
            "date": review.get("date"),
            "snippet": review.get("snippet"),
            "response": review.get("response")
        })

    # pagination
    if "next" in results.get("serpapi_pagination", {}):
        search.params_dict.update(dict(parse_qsl(urlsplit(results.get("serpapi_pagination", {}).get("next")).query)))
    else:
        break

print(json.dumps(reviews, indent=2, ensure_ascii=False))
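The pagination line in the code above is dense, so here is a small standalone sketch of what it does: it splits the "next" URL, parses its query string into key-value pairs, and turns them into a dict that can be merged into the search parameters. The URL and the token value below are made-up placeholders for illustration, not a real API response:

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical "next" URL; the token value is a made-up placeholder.
next_url = (
    "https://serpapi.com/search.json"
    "?engine=google_play_product&store=apps"
    "&product_id=com.collectorz.javamobile.android.books"
    "&next_page_token=PLACEHOLDER_TOKEN"
)

# urlsplit() isolates the query string, parse_qsl() turns it into (key, value) pairs.
next_params = dict(parse_qsl(urlsplit(next_url).query))

for key, value in next_params.items():
    print(f"{key} = {value}")
```

Updating the existing parameters with this dict means the next `get_dict()` call requests the following page, and the loop stops once the response no longer contains a "next" entry.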
Output:
[
  {
    "title": "JazzTripp",
    "avatar": "https://play-lh.googleusercontent.com/a-/ACNPEu8THUUDL3yzcd0bHSDRR4OegOWLmfbFi70On0HbRg",
    "rating": 5.0,
    "likes": 20,
    "date": "May 06, 2022",
    "snippet": "This app takes a bit if getting used to at first, but the catalogue is extensive, and most bar codes and isbn numbers can be used to autofill a good chuck of a collection. I personally use this app for manga, and while its only correct about 70% of the time, its still easy to update and change as you see fit. The 'add to core' option makes me feel like im actually helping out the app, so i add data whenever i can. Keep up the good work guys!",
    "response": null
  },
  ... other reviews
  {
    "title": "Originalbigguy",
    "avatar": "https://play-lh.googleusercontent.com/a/ALm5wu3dYTOHvlG8SUqgyTbRnjv9I49JtxgySY-RwTJU=mo",
    "rating": 1.0,
    "likes": 0,
    "date": "April 09, 2021",
    "snippet": "Not free",
    "response": {
      "title": "Collectorz.com",
      "snippet": "The app is never advertised as free anywhere. The app information clearly states this is a paid subscription app.",
      "date": "April 10, 2021"
    }
  }
]