Introduction
Target.com is one of America’s largest e-commerce and shopping marketplaces. It allows consumers to shop online and in-store for everything from groceries and essentials to clothing and electronics. As of September 2024, according to data from SimilarWeb, Target.com attracts more than 166 million monthly visits.
The Target.com website offers customer reviews, dynamic pricing information, product comparisons, and product ratings, among other features. It is a valuable source of data for analysts, marketing teams, businesses, and researchers who want to track product trends, monitor competitor prices, or analyze customer sentiment through reviews.
In this article, you will learn how to:
- Set up and install Python, Selenium, and Beautiful Soup for web scraping
- Scrape product reviews and ratings from Target.com using Python
- Use ScraperAPI to bypass Target.com’s anti-scraping mechanisms effectively
- Implement proxies to avoid IP bans and improve scraping performance
By the end of this article, you will know how to collect product reviews and ratings from Target.com using Python, Selenium, and ScraperAPI without getting blocked. You will also learn how to use your scraped data for sentiment analysis.
If you are as excited as I am about this tutorial, let’s dive right in.
TL;DR: Scraping Target Product Reviews [Full Code]
For those in a hurry, here’s the complete code snippet we’ll build in this tutorial:
import os
import time

from bs4 import BeautifulSoup
from dotenv import load_dotenv
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# Load environment variables
load_dotenv()


def target_com_scraper():
    """
    SCRAPER SETTINGS

    - API_KEY: Your ScraperAPI key. Get your API Key ==> https://www.scraperapi.com/?fp_ref=eunit
    """

    API_KEY = os.getenv("API_KEY", "yourapikey")

    # ScraperAPI proxy settings (with HTTP and HTTPS variants)
    scraper_api_proxies = {
        'proxy': {
            'http': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
            'https': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
            'no_proxy': 'localhost,127.0.0.1'
        }
    }

    # URLs to scrape
    url_list = [
        "https://www.target.com/p/enclosed-cat-litter-box-xl-up-up/-/A-90310047?preselect=87059440#lnk=sametab",
    ]

    # Store scraped data
    scraped_data = []

    # Setup Selenium options with proxy
    options = Options()
    # options.add_argument("--headless")  # Uncomment for headless mode
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-in-process-stack-traces")
    options.add_argument("--window-size=1920,1080")
    options.add_argument("--log-level=3")
    options.add_argument("--disable-logging")
    options.add_argument("--start-maximized")

    # Initialize Selenium WebDriver
    driver = webdriver.Chrome(service=ChromeService(
        ChromeDriverManager().install()), options=options, seleniumwire_options=scraper_api_proxies)

    def scroll_down_page(distance=100, delay=0.2):
        """
        Scroll down the page gradually until the end.

        Args:
        - distance: Number of pixels to scroll by in each step.
        - delay: Time (in seconds) to wait between scrolls.
        """
        total_height = driver.execute_script(
            "return document.body.scrollHeight")
        scrolled_height = 0

        while scrolled_height < total_height:
            # Scroll down by 'distance' pixels
            driver.execute_script(f"window.scrollBy(0, {distance});")
            scrolled_height += distance
            time.sleep(delay)  # Pause between scrolls

            # Update the total page height after scrolling
            total_height = driver.execute_script(
                "return document.body.scrollHeight")

        print("Finished scrolling.")

    try:
        for url in url_list:
            # Use Selenium to load the page
            driver.get(url)
            time.sleep(5)  # Give the page time to load

            # Scroll down the page
            scroll_down_page()

            # Extract single elements with Selenium
            def extract_element_text(selector, description):
                try:
                    # Wait for the element and extract text
                    element = WebDriverWait(driver, 5).until(
                        EC.visibility_of_element_located(
                            (By.CSS_SELECTOR, selector))
                    )
                    text = element.text.strip()
                    return text if text else None  # Return None if the text is empty
                except TimeoutException:
                    print(f"Timeout: Could not find {description}. Setting to None.")
                    return None
                except NoSuchElementException:
                    print(f"Element not found: {description}. Setting to None.")
                    return None

            # Extract single elements
            reviews_data = {}
            reviews_data["secondary_rating"] = extract_element_text("div[data-test='secondary-rating']",
                                                                    "secondary_rating")
            reviews_data["rating_count"] = extract_element_text(
                "div[data-test='rating-count']", "rating_count")
            reviews_data["rating_histogram"] = extract_element_text("div[data-test='rating-histogram']",
                                                                    "rating_histogram")
            reviews_data["percent_recommended"] = extract_element_text("div[data-test='percent-recommended']",
                                                                       "percent_recommended")
            reviews_data["total_recommendations"] = extract_element_text("div[data-test='total-recommendations']",
                                                                         "total_recommendations")

            # Extract reviews from 'reviews-list'
            scraped_reviews = []

            # Use Beautiful Soup to extract other content
            soup = BeautifulSoup(driver.page_source, 'html.parser')

            # Select all reviews in the list using BeautifulSoup
            reviews_list = soup.select("div[data-test='reviews-list'] > div")

            for review in reviews_list:
                # Create a dictionary to store each review's data
                ratings = {}

                # Extract title
                title_element = review.select_one(
                    "h4[data-test='review-card--title']")
                ratings['title'] = title_element.text.strip(
                ) if title_element else None

                # Extract rating
                rating_element = review.select_one("span[data-test='ratings']")
                ratings['rating'] = rating_element.text.strip(
                ) if rating_element else None

                # Extract time
                time_element = review.select_one(
                    "span[data-test='review-card--reviewTime']")
                ratings['time'] = time_element.text.strip(
                ) if time_element else None

                # Extract review text
                text_element = review.select_one(
                    "div[data-test='review-card--text']")
                ratings['text'] = text_element.text.strip(
                ) if text_element else None

                # Append each review to the list of reviews
                scraped_reviews.append(ratings)

            # Append the list of reviews to the main product data
            reviews_data["reviews"] = scraped_reviews

            # Append the overall data to the scraped_data list
            scraped_data.append(reviews_data)

        # Output the scraped data
        print(f"Scraped data: {scraped_data}")

    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Ensure driver quits after scraping
        driver.quit()


if __name__ == "__main__":
    target_com_scraper()
Check out the complete code on GitHub: https://github.com/Eunit99/target_com_scraper. Want to understand each line of code? Let’s build the web scraper from scratch together!
How to Scrape Target.com Reviews with Python and ScraperAPI
In an earlier article, we covered everything you need to know to scrape Target.com product data. However, in this article, I will focus on walking you through how to scrape Target.com for product ratings and reviews with Python and ScraperAPI.
Prerequisites
To follow this tutorial and get started with scraping Target.com, you’ll need to do a few things first.
1. Have an Account with ScraperAPI
Start with a free account on ScraperAPI. ScraperAPI allows you to start collecting data from millions of web sources without complex and expensive workarounds with our easy-to-use API for web scraping.
ScraperAPI unlocks even the toughest sites, reduces infrastructure and development costs, and lets you deploy web scrapers faster. It also gives you 1,000 free API credits to try things out first.
2. Text Editor or IDE
Use a code editor like Visual Studio Code. Other options include Sublime Text or PyCharm.
3. Project Requirements and Virtual Environment Setup
Before starting with scraping Target.com reviews, ensure you have the following:
- Python installed on your machine (version 3.10 or newer)
- pip (Python package installer)
It’s best practice to use a virtual environment for Python projects to manage dependencies and avoid conflicts.
To create a virtual environment, run this command in your terminal:
python3 -m venv env
4. Activating the Virtual Environment
Activate the virtual environment based on your operating system:
# On Unix or macOS (bash shell):
source env/bin/activate
# On Unix or macOS (csh shell):
source env/bin/activate.csh
# On Unix or macOS (fish shell):
source env/bin/activate.fish
# On Windows (command prompt):
env\Scripts\activate.bat
# On Windows (PowerShell):
env\Scripts\Activate.ps1
Some IDEs can automatically activate the virtual environment.
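If you are unsure whether the environment is actually active, a quick check from Python can help. The sketch below relies on the standard library only; the function name in_virtualenv is just an illustrative choice.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Virtual environment active:", in_virtualenv())
```

Run it with the same python command you plan to use for the scraper, since a different interpreter may report a different result.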
5. Have a basic understanding of CSS selectors and Navigating Browser DevTools
To effectively follow along with this article, it’s essential to have a basic understanding of CSS selectors. CSS selectors are used to target specific HTML elements on a webpage, which allows you to extract the information you need.
Also, being comfortable with browser DevTools is crucial for inspecting and identifying the structure of web pages.
Project Setup
Having satisfied the above prerequisites, it is time to set up your project. Start by creating a folder that will contain the source code of the Target.com scraper. In this case, I will name my folder python-target-dot-com-scraper.
Run the following command to create a folder named python-target-dot-com-scraper:
mkdir python-target-dot-com-scraper
Enter the folder and create a new Python file named main.py by running this command:
cd python-target-dot-com-scraper && touch main.py
Create a requirements.txt file by running the following command:
touch requirements.txt
For this article, I will use the Selenium, Beautiful Soup, and Webdriver Manager for Python libraries to build the web scraper. Selenium will handle browser automation, and Beautiful Soup will extract data from the HTML content of the Target.com website. Webdriver Manager for Python provides a way to manage drivers for different browsers automatically.
Add the following lines to your requirements.txt file to specify the necessary packages:
selenium~=4.25.0
bs4~=0.0.2
python-dotenv~=1.0.1
webdriver_manager
selenium-wire
blinker==1.7.0
To install the packages, run the following command:
pip install -r requirements.txt
Extract Target.com Product Reviews with Selenium
In this section, I will walk you through a step-by-step guide on getting product ratings and reviews from a product page like this one from Target.com.
I will focus on the reviews and ratings from these sections of the website highlighted in this screenshot below:
Before delving further, you need to understand the HTML structure and identify the DOM selector associated with the HTML tag wrapping the information we want to extract. In this next section, I will walk you through using Chrome DevTools to understand Target.com’s site structure.
Using Chrome DevTools to Understand Target.com’s Site Structure
Open Chrome DevTools by pressing F12 or right-clicking anywhere on the page and choosing Inspect. Inspecting the page from the URL above reveals the following:
From the above pictures, here are all the DOM selectors the web scraper will target to extract the information:
Information | DOM selector | Value |
---|---|---|
Product ratings | | |
Rating value | div[data-test='rating-value'] | 4.7 |
Rating count | div[data-test='rating-count'] | 683 star ratings |
Secondary rating | div[data-test='secondary-rating'] | 683 star ratings |
Rating histogram | div[data-test='rating-histogram'] | 5 stars 85%, 4 stars 8%, 3 stars 3%, 2 stars 1%, 1 star 2% |
Percent recommended | div[data-test='percent-recommended'] | 89% would recommend |
Total recommendations | div[data-test='total-recommendations'] | 125 recommendations |
Product reviews | | |
Reviews list | div[data-test='reviews-list'] | Returns child elements, each corresponding to an individual product review |
Review card title | h4[data-test='review-card--title'] | Perfect litter box for cats |
Ratings | span[data-test='ratings'] | 4.7 out of 5 stars with 683 reviews |
Review time | span[data-test='review-card--reviewTime'] | 23 days ago |
Review card text | div[data-test='review-card--text'] | My cats love it. Doesn’t take up much space either |
Building Your Target Reviews Scraper
Now that we have outlined all the requirements and located the elements we are interested in on the Target.com product review page, we can move to the next step: importing the necessary modules.
1. Importing Selenium and Other Modules
import os
import time
from bs4 import BeautifulSoup
from dotenv import load_dotenv
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from seleniumwire import webdriver
from webdriver_manager.chrome import ChromeDriverManager
In this code, each module serves a specific purpose for building our web scraper:
- os handles environment variables like API keys.
- time introduces delays during page loading.
- dotenv loads API keys from .env files.
- selenium enables browser automation and interaction.
- webdriver_manager automatically installs ChromeDriver.
- BeautifulSoup parses HTML for data extraction.
- seleniumwire manages proxies for scraping without IP bans.
2. Setting Up the Web Driver
In this step, you will initialize Selenium’s Chrome WebDriver and configure important browser options. These options include disabling unnecessary features to boost performance, setting window size, and managing logs. You will instantiate the WebDriver using webdriver.Chrome() to control the browser throughout the scraping process.
# Setup Selenium options with proxy
options = Options()
# options.add_argument("--headless")  # Uncomment for headless mode
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
options.add_argument("--disable-extensions")
options.add_argument("--disable-in-process-stack-traces")
options.add_argument("--window-size=1920,1080")
options.add_argument("--log-level=3")
options.add_argument("--disable-logging")
options.add_argument("--start-maximized")

# Initialize Selenium WebDriver
driver = webdriver.Chrome(service=ChromeService(
    ChromeDriverManager().install()), options=options)
Create Scroll-to-bottom Function
In this section, we create a function to scroll through the entire page. The Target.com website loads additional content (such as reviews) dynamically as the user scrolls down.
def scroll_down_page(distance=100, delay=0.2):
    """
    Scroll down the page gradually until the end.

    Args:
    - distance: Number of pixels to scroll by in each step.
    - delay: Time (in seconds) to wait between scrolls.
    """
    total_height = driver.execute_script(
        "return document.body.scrollHeight")
    scrolled_height = 0

    while scrolled_height < total_height:
        # Scroll down by 'distance' pixels
        driver.execute_script(f"window.scrollBy(0, {distance});")
        scrolled_height += distance
        time.sleep(delay)  # Pause between scrolls

        # Update the total page height after scrolling
        total_height = driver.execute_script(
            "return document.body.scrollHeight")

    print("Finished scrolling")
The scroll_down_page() function gradually scrolls the web page by a set number of pixels (distance) with a short pause (delay) between each scroll. It first calculates the total height of the page and scrolls down until reaching the bottom. As it scrolls, the total page height is updated dynamically to accommodate new content that may load during the process.
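Because the page height is re-read after every step, a page that keeps loading content could in principle keep the loop running for a long time. The sketch below is one possible variant, not part of the original scraper: it takes the driver as a parameter and adds a max_steps cap (an assumed safeguard). FakeDriver is a hypothetical stand-in used only to exercise the loop without launching a browser.

```python
import time


def scroll_to_bottom(driver, distance=100, delay=0.2, max_steps=2000):
    """Scroll in fixed steps until the bottom is reached or max_steps is hit.

    Returns the number of scroll steps performed.
    """
    total_height = driver.execute_script("return document.body.scrollHeight")
    scrolled_height = 0
    steps = 0
    while scrolled_height < total_height and steps < max_steps:
        driver.execute_script(f"window.scrollBy(0, {distance});")
        scrolled_height += distance
        steps += 1
        time.sleep(delay)
        # Re-read the height: lazy-loaded content may have extended the page.
        total_height = driver.execute_script("return document.body.scrollHeight")
    return steps


class FakeDriver:
    """Hypothetical stand-in for a WebDriver that reports scripted page heights."""

    def __init__(self, heights):
        self._heights = list(heights)

    def execute_script(self, script):
        if script.startswith("return"):
            # Hand out the next height; keep repeating the last one afterwards.
            if len(self._heights) > 1:
                return self._heights.pop(0)
            return self._heights[0]
        return None  # Scroll commands have no return value.


if __name__ == "__main__":
    fake = FakeDriver([300, 500, 500])  # The page grows once while scrolling.
    print("Scroll steps:", scroll_to_bottom(fake, distance=100, delay=0))
```

With a real Selenium driver you would call scroll_to_bottom(driver) in place of scroll_down_page(); the cap simply bounds how long the scraper spends on a single page.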
Combining Selenium with BeautifulSoup
In this section, we combine the strengths of Selenium and BeautifulSoup to create an efficient and reliable web scraping setup. Selenium interacts with dynamic content, such as loading pages and handling JavaScript-rendered elements, while BeautifulSoup is more effective at parsing and extracting static HTML elements. We first use Selenium to navigate the webpage and wait for specific elements, like product ratings and review counts, to load. These elements are extracted with the help of Selenium’s WebDriverWait class, which ensures the data is visible before capturing it. However, handling individual reviews through Selenium alone can become complex and inefficient.
Using BeautifulSoup, we simplify the process of looping through multiple reviews on the page. Once Selenium has fully loaded the page, BeautifulSoup parses the HTML content to extract reviews efficiently. Using BeautifulSoup’s select() and select_one() methods, we can navigate the page structure and gather the title, rating, time, and text for each review. This approach allows for cleaner, more structured scraping of repeated elements (like lists of reviews) and offers greater flexibility in handling the HTML, compared to managing everything through Selenium alone.
# Extract single elements with Selenium
def extract_element_text(selector, description):
    try:
        # Wait for the element and extract text
        element = WebDriverWait(driver, 5).until(
            EC.visibility_of_element_located(
                (By.CSS_SELECTOR, selector))
        )
        text = element.text.strip()
        return text if text else None  # Return None if the text is empty
    except TimeoutException:
        print(f"Timeout: Could not find {description}. Setting to None.")
        return None
    except NoSuchElementException:
        print(f"Element not found: {description}. Setting to None.")
        return None

# Extract single elements
reviews_data = {}
reviews_data["secondary_rating"] = extract_element_text("div[data-test='secondary-rating']",
                                                        "secondary_rating")
reviews_data["rating_count"] = extract_element_text(
    "div[data-test='rating-count']", "rating_count")
reviews_data["rating_histogram"] = extract_element_text("div[data-test='rating-histogram']",
                                                        "rating_histogram")
reviews_data["percent_recommended"] = extract_element_text("div[data-test='percent-recommended']",
                                                           "percent_recommended")
reviews_data["total_recommendations"] = extract_element_text("div[data-test='total-recommendations']",
                                                             "total_recommendations")

# Extract reviews from 'reviews-list'
scraped_reviews = []

# Use Beautiful Soup to extract other content
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Select all reviews in the list using BeautifulSoup
reviews_list = soup.select("div[data-test='reviews-list'] > div")

for review in reviews_list:
    # Create a dictionary to store each review's data
    ratings = {}

    # Extract title
    title_element = review.select_one(
        "h4[data-test='review-card--title']")
    ratings['title'] = title_element.text.strip(
    ) if title_element else None

    # Extract rating
    rating_element = review.select_one("span[data-test='ratings']")
    ratings['rating'] = rating_element.text.strip(
    ) if rating_element else None

    # Extract time
    time_element = review.select_one(
        "span[data-test='review-card--reviewTime']")
    ratings['time'] = time_element.text.strip(
    ) if time_element else None

    # Extract review text
    text_element = review.select_one(
        "div[data-test='review-card--text']")
    ratings['text'] = text_element.text.strip(
    ) if text_element else None

    # Append each review to the list of reviews
    scraped_reviews.append(ratings)

# Append the list of reviews to the main product data
reviews_data["reviews"] = scraped_reviews
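The review-parsing part can be tried offline, without a browser, by feeding BeautifulSoup a small piece of HTML. The sketch below is a compact variant of the loop above rather than the scraper's actual code: SAMPLE_HTML is hypothetical, simplified markup that only mimics the data-test attributes, and text_or_none and parse_reviews are illustrative helper names.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that mimics Target.com's data-test attributes.
SAMPLE_HTML = """
<div data-test="reviews-list">
  <div>
    <h4 data-test="review-card--title">Affordable and well made</h4>
    <span data-test="ratings">5 out of 5 stars</span>
    <span data-test="review-card--reviewTime">7 days ago</span>
    <div data-test="review-card--text">Good to use with or without lid</div>
  </div>
  <div>
    <h4 data-test="review-card--title">Rating only</h4>
    <span data-test="ratings">4 out of 5 stars</span>
  </div>
</div>
"""

# Output field name -> CSS selector, taken from the selectors used in this article.
REVIEW_FIELDS = {
    "title": "h4[data-test='review-card--title']",
    "rating": "span[data-test='ratings']",
    "time": "span[data-test='review-card--reviewTime']",
    "text": "div[data-test='review-card--text']",
}


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    element = parent.select_one(selector)
    return element.get_text(strip=True) if element else None


def parse_reviews(html):
    """Build one dictionary per review card found in the reviews list."""
    soup = BeautifulSoup(html, "html.parser")
    cards = soup.select("div[data-test='reviews-list'] > div")
    return [
        {field: text_or_none(card, selector) for field, selector in REVIEW_FIELDS.items()}
        for card in cards
    ]


if __name__ == "__main__":
    for review in parse_reviews(SAMPLE_HTML):
        print(review)
```

Keeping the selectors in one dictionary means that if Target.com renames an attribute, only that mapping needs updating. In the scraper itself you would pass driver.page_source instead of SAMPLE_HTML.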
Using Proxies in Python Selenium: A Complex Interaction with Headless Browsers
When scraping complex websites, especially those with robust anti-bot measures like Target.com, challenges like IP bans, rate limits, or access restrictions often arise. Using Selenium for such tasks gets intricate, especially when deploying a headless browser. Headless browsers allow interaction without a GUI, but managing proxies manually in this environment becomes challenging. You have to configure proxy settings, rotate IPs, and handle other interactions like JavaScript rendering, making scraping slower and prone to failure.
In contrast, ScraperAPI streamlines this process significantly by managing proxies automatically. Rather than dealing with manual configurations in Selenium, ScraperAPI’s proxy mode distributes requests across multiple IP addresses, ensuring smoother scraping without worrying about IP bans, rate limits, or geographic restrictions. This becomes particularly useful when working with headless browsers, where handling dynamic content and complex site interactions demands additional coding.
Setting Up ScraperAPI with Selenium
Integrating ScraperAPI’s proxy mode with Selenium is simplified by using Selenium Wire, a tool that allows easy proxy configuration. Here’s a quick setup:
- Sign Up for ScraperAPI: Create an account and retrieve your API key.
- Install Selenium Wire: Replace standard Selenium with Selenium Wire by running pip install selenium-wire.
- Configure Proxy: Use ScraperAPI’s proxy pool in your WebDriver settings to manage IP rotations effortlessly.
Once integrated, this configuration enables smoother interactions with dynamic pages, auto-rotating IP addresses, and bypassing rate limits without the manual hassle of managing proxies in a headless browser environment.
The snippet below demonstrates how to configure ScraperAPI’s proxy in Python:
""" SCRAPER SETTINGS - API_KEY: Your ScraperAPI key. Get your API Key ==> https://www.scraperapi.com/?fp_ref=eunit """
API_KEY = os.getenv("API_KEY", "yourapikey")

# ScraperAPI proxy settings (with HTTP and HTTPS variants)
scraper_api_proxies = {
    'proxy': {
        'http': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'https': f'http://scraperapi:{API_KEY}@proxy-server.scraperapi.com:8001',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
With this setup, requests are sent through the ScraperAPI proxy server, which forwards them to the Target.com website, keeping your real IP hidden and providing a robust defense against Target.com’s anti-scraping mechanisms. The proxy can also be customized by including parameters like render=true for JavaScript rendering or specifying a country_code for geolocation.
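As a rough sketch of how those parameters might be wired in, the helper below builds the Selenium Wire options dictionary. It assumes that ScraperAPI's proxy mode accepts extra parameters appended to the proxy username and separated by dots (for example scraperapi.render=true); please confirm the exact format in the ScraperAPI documentation before relying on it. The function name build_proxy_options is hypothetical.

```python
def build_proxy_options(api_key, render=False, country_code=None):
    """Return a seleniumwire_options dict for ScraperAPI's proxy mode.

    Assumption: optional parameters are appended to the proxy username,
    separated by dots, e.g. 'scraperapi.render=true.country_code=us'.
    """
    if not api_key or api_key == "yourapikey":
        # Fail early instead of sending requests with a placeholder key.
        raise ValueError("Set a real ScraperAPI key in the API_KEY environment variable.")

    params = []
    if render:
        params.append("render=true")
    if country_code:
        params.append(f"country_code={country_code}")

    username = ".".join(["scraperapi", *params])
    proxy_url = f"http://{username}:{api_key}@proxy-server.scraperapi.com:8001"
    return {
        "proxy": {
            "http": proxy_url,
            "https": proxy_url,
            "no_proxy": "localhost,127.0.0.1",
        }
    }


if __name__ == "__main__":
    # "abc123" is a dummy key used only to show the resulting structure.
    print(build_proxy_options("abc123", render=True, country_code="us"))
```

The returned dictionary has the same shape as scraper_api_proxies, so it can be passed to webdriver.Chrome() through the seleniumwire_options argument.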
Scraped Review Data from Target.com
The JSON code below is a sample of the response using the Target Reviews Scraper:
[
  {
    "secondary_rating": "quality\n4.5 out of 5",
    "rating_count": "687 star ratings",
    "rating_histogram": "5 stars\n85%\n4 stars\n8%\n3 stars\n3%\n2 stars\n1%\n1 star\n2%",
    "percent_recommended": "89% would recommend",
    "total_recommendations": "128 recommendations",
    "reviews": [
      {
        "title": "Up & Up Enclosed litter box",
        "rating": "5 out of 5 stars",
        "time": "6 days ago",
        "text": "Great size. The color is great. Plenty of space for my bigger cats."
      },
      {
        "title": "Affordable and well made",
        "rating": "5 out of 5 stars",
        "time": "7 days ago",
        "text": "Good to use with or without lid"
      }
    ]
  }
]
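The scraped values are plain strings, so a small post-processing step is usually needed before any analysis. The sketch below shows one way to turn the count and histogram strings from the sample response into numbers using only the standard library; parse_count and parse_histogram are illustrative names, and the regular expressions assume the text keeps the shape shown above.

```python
import re


def parse_count(text):
    """Return the first integer in strings like '687 star ratings', else None."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else None


def parse_histogram(text):
    """Map star level to percentage, e.g. {5: 85, 4: 8, 3: 3, 2: 1, 1: 2}."""
    if not text:
        return {}
    # Each entry looks like '5 stars' followed by whitespace and '85%'.
    pairs = re.findall(r"(\d)\s*stars?\s*(\d+)%", text)
    return {int(stars): int(percent) for stars, percent in pairs}


if __name__ == "__main__":
    sample = {
        "rating_count": "687 star ratings",
        "rating_histogram": "5 stars\n85%\n4 stars\n8%\n3 stars\n3%\n2 stars\n1%\n1 star\n2%",
        "percent_recommended": "89% would recommend",
    }
    print(parse_count(sample["rating_count"]))
    print(parse_histogram(sample["rating_histogram"]))
    print(parse_count(sample["percent_recommended"]))
```

Both helpers return None or an empty dictionary for missing fields, which matches how the scraper sets a field to None when an element is not found.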
How to Use Our Cloud Target.com Reviews Scraper
If you want to get your Target.com reviews quickly without setting up your environment, knowing how to code, or setting up proxies, you can use our Target Scraper API to get the data you need for free. The Target Scraper API is hosted on the Apify platform and is ready to use with no setup required.
Head over to Apify and click on “Try for free” to get started now.
Using Target Reviews for Sentiment Analysis
Now that you have your Target.com reviews and ratings data, it is time to make sense of it. Reviews and ratings can provide valuable insights into customers’ opinions about a particular product or service. By analyzing these reviews, you can identify common praises and complaints, gauge customer satisfaction, predict future behavior, and transform these reviews into actionable insights.
If you are a marketing professional or business owner looking to understand your primary audience better, here are some ways to transform this data into actionable insights that optimize marketing efforts, improve product strategies, and boost customer engagement:
- Refining Product Offerings: Identify common customer complaints or praises to fine-tune product features.
- Improving Customer Service: Detect negative reviews early to address issues and maintain customer satisfaction.
- Optimizing Marketing Campaigns: Use insights from positive feedback to craft personalized, targeted campaigns.
By using ScraperAPI to gather review data at scale, you can automate sentiment analysis, enabling better decision-making and growth.
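To make the idea concrete, here is a deliberately naive, keyword-based sketch that labels each scraped review as positive, negative, or neutral. The word lists are hypothetical and tiny, chosen only for illustration; a real project would more likely use a trained model or an established sentiment lexicon.

```python
import re

# Hypothetical, deliberately small word lists used only for illustration.
POSITIVE = {"great", "love", "good", "perfect", "affordable", "plenty"}
NEGATIVE = {"bad", "broken", "poor", "flimsy", "leaks", "terrible"}


def sentiment_score(text):
    """Count positive words minus negative words in the text."""
    words = re.findall(r"[a-z']+", (text or "").lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label_review(review):
    """Label a scraped review dict using its title and text fields."""
    combined = f"{review.get('title') or ''} {review.get('text') or ''}"
    score = sentiment_score(combined)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    reviews = [
        {"title": "Up & Up Enclosed litter box",
         "text": "Great size. The color is great. Plenty of space for my bigger cats."},
        {"title": "Affordable and well made",
         "text": "Good to use with or without lid"},
    ]
    for review in reviews:
        print(label_review(review), "-", review["title"])
```

Counting keywords ignores negation and sarcasm, so treat the labels as a first pass for spotting trends rather than a reliable per-review verdict.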
FAQs About Scraping Target Product Reviews
Is it legal to scrape Target.com product pages?
Yes, it is legal to scrape Target.com for publicly available information, such as product ratings and reviews. But it’s important to remember that this public information might still include personal details.
We wrote a blog post on the legal aspects of web scraping and ethical considerations. You can learn more there.
Does Target.com block scrapers?
Yes, Target.com implements various anti-scraping measures to block automated scrapers. These include IP blocking, rate-limiting, and CAPTCHA challenges, all designed to detect and stop excessive, automated requests from scrapers or bots.
How do you avoid getting blocked by Target.com?
To avoid getting blocked by Target.com, you should slow down the rate of requests, rotate user agents, use CAPTCHA-solving techniques, and avoid making repetitive or high-frequency requests. Combining these methods with proxies can help reduce the likelihood of detection.
Also, consider using dedicated scrapers like the Target Scraper API or a scraping API to work around these Target.com limitations.
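One common way to slow down and space out requests is exponential backoff with random jitter between retries. The sketch below uses only the standard library; backoff_delays and fetch_with_retries are illustrative names, and fetch stands for any callable of yours that loads a page and raises an exception on failure.

```python
import random
import time


def backoff_delays(attempts, base=1.0, cap=30.0, rng=random.random):
    """Yield one delay per attempt: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield ceiling * rng()  # rng() is in [0, 1), so the delay stays below the ceiling.


def fetch_with_retries(fetch, attempts=4, sleep=time.sleep, **backoff):
    """Call fetch() until it succeeds, sleeping with backoff after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for delay in backoff_delays(attempts, **backoff):
        try:
            return fetch()
        except Exception as error:  # Narrow this to the errors you expect in practice.
            last_error = error
            sleep(delay)
    raise last_error


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky_fetch():
        # Hypothetical fetch that fails twice before succeeding.
        calls["count"] += 1
        if calls["count"] < 3:
            raise RuntimeError("temporarily blocked")
        return "page content"

    print(fetch_with_retries(flaky_fetch, base=0.1, cap=1.0))
```

The jitter spreads retries out so they do not arrive at regular intervals, which is the pattern rate limiters tend to notice first.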
Do I need to use proxies to scrape Target.com?
Yes, using proxies is essential to scrape Target.com effectively. Proxies help distribute requests across multiple IP addresses, minimizing the chance of getting blocked. ScraperAPI proxies hide your IP, making it more difficult for anti-scraping systems to detect your activity.
Wrapping Up
In this article, you learned how to build a Target.com ratings and reviews scraper using Python and Selenium, and how to use ScraperAPI to bypass Target.com’s anti-scraping mechanisms, avoid IP bans, and improve scraping performance.
With this tool, you can collect valuable customer feedback efficiently and reliably.
Once you’ve gathered this data, the next step is to use sentiment analysis to uncover key insights. By analyzing customer reviews, you as a business can identify product strengths, address pain points, and optimize your marketing strategies to meet your customer needs better.
By using Target Scraper API for large-scale data collection, you can continuously monitor reviews and stay ahead in understanding customer sentiment, allowing you to refine product development and create more targeted marketing campaigns.
Try ScraperAPI now for seamless large-scale data extraction or use our Cloud Target.com Reviews Scraper!
For more tutorials and great content, please follow me on Twitter (X) @eunit99