Building Robust Web Automation with Selenium and Python

Web automation is now an indispensable tool in modern software development and testing. In this comprehensive Selenium Python tutorial, you’ll learn how to build a robust web automation framework that can handle real-world scenarios. Whether you are implementing automated testing in Python or building complex web scraping solutions, this guide covers practical approaches and Selenium best practices.

Understanding Web Automation Fundamentals

Web automation plays a vital role in software development, testing, and data collection. Its applications range from end-to-end testing of web applications to automating repetitive workflows such as form submissions or web scraping. Selenium WebDriver’s Python bindings offer powerful capabilities, but robust web automation is more than writing scripts that mimic user interactions. It’s about designing workflows and frameworks that are maintainable, adaptable, and resilient to changes in the target web application.

Below are the key aspects we’ll cover throughout this tutorial:

  • Selecting appropriate locators (XPath, CSS selectors, etc.)
  • Handling dynamic elements and loading states
  • Implementing retry mechanisms
  • Managing browser sessions properly
  • Structuring code for maintainability

To demonstrate these concepts while following Selenium best practices, we will build a price tracker for e-commerce websites, using Books to Scrape (http://books.toscrape.com) as the demo site.

Prerequisites

To follow along with this tutorial, you’ll need Python 3 and Google Chrome installed, since the code drives Chrome through Selenium.

The code for this tutorial is available in our GitHub repository; feel free to clone it and follow along.

Setting Up the Development Environment

Let’s set up the development environment and install the necessary Python packages. First, create the project folder and a new virtual environment:

mkdir price_tracker_automation && cd price_tracker_automation
python3 -m venv env
source env/bin/activate

Next, create a requirements.txt file and add the following packages:

selenium==4.16.0
webdriver-manager==4.0.1
python-dotenv==1.0.0
requests==2.31.0

These are our core dependencies. The selenium package provides the foundation for our web automation framework, while webdriver-manager can download and manage browser drivers automatically. The python-dotenv package loads environment configuration from a .env file, and the requests package handles HTTP requests.
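The python-dotenv package is listed here but not used in this section. As a minimal sketch of how configuration could be loaded, the snippet below reads a .env file (if present) and then the environment. The variable names PRICE_TRACKER_HEADLESS and PRICE_TRACKER_TIMEOUT and the load_settings helper are hypothetical, not part of the original project.

```python
import os
from dataclasses import dataclass

try:
    # python-dotenv copies KEY=VALUE lines from a .env file into os.environ
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None  # keep the sketch usable without python-dotenv


@dataclass(frozen=True)
class Settings:
    headless: bool
    timeout: int


def load_settings(env_file=".env"):
    """Build Settings from a .env file and the environment.

    PRICE_TRACKER_HEADLESS and PRICE_TRACKER_TIMEOUT are hypothetical names.
    """
    if load_dotenv is not None:
        # A missing file is skipped; variables already set are not overridden by default
        load_dotenv(dotenv_path=env_file)
    raw_headless = os.getenv("PRICE_TRACKER_HEADLESS", "false")
    headless = raw_headless.strip().lower() in {"1", "true", "yes"}
    timeout = int(os.getenv("PRICE_TRACKER_TIMEOUT", "10"))
    return Settings(headless=headless, timeout=timeout)


if __name__ == "__main__":
    print(load_settings())
```

The resulting values could then be passed to the classes built below, for example BrowserManager(headless=settings.headless) and ElementHandler(driver, timeout=settings.timeout).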

Now install all the packages listed in requirements.txt:

pip install -r requirements.txt

Lastly, create the following folder structure for our project:

price_tracker_automation/
├── core/
│   ├── browser.py
│   ├── scraper.py
│   └── element_handler.py
├── database/
│   └── db_manager.py
├── notifications/
│   └── price_alert.py
├── requirements.txt
├── run.py
└── main.py

This modular layout keeps responsibilities separate: the core directory holds our primary automation components, database handles data persistence, and notifications contains the price alert logic.

Building the Price Tracker Tool

With the environment, dependencies, and folder structure in place, let’s build the price tracker using Selenium and Python.

Implementing our Browser Management System

Let’s start with the browser management system, a key component for stable Selenium WebDriver integration. Add the code below to your core/browser.py file:

from selenium import webdriver
import logging


class BrowserManager:
    def __init__(self, headless=False):
        self.options = webdriver.ChromeOptions()
        if headless:
            self.options.add_argument('--headless')

        # Add additional stability options
        self.options.add_argument('--no-sandbox')
        self.options.add_argument('--disable-dev-shm-usage')
        self.options.add_argument('--disable-gpu')

        self.driver = None
        self.logger = logging.getLogger(__name__)

The above code creates a BrowserManager class that handles WebDriver initialization and configuration. It sets Chrome options for stability. For example, --no-sandbox and --disable-dev-shm-usage are commonly needed when Chrome runs inside Docker containers, where the default sandbox and the small /dev/shm partition can cause crashes. The headless parameter runs the browser without a visible window, which is what you typically want in CI/CD pipelines.

Now add the following methods to the BrowserManager class to implement the core browser management features:

    def start_browser(self):
        """Initialize and return a ChromeDriver instance"""
        try:
            service = webdriver.ChromeService()
            self.driver = webdriver.Chrome(service=service, options=self.options)
            self.driver.implicitly_wait(10)
            return self.driver
        except Exception as e:
            self.logger.error(f"Failed to start browser: {str(e)}")
            raise

    def close_browser(self):
        """Safely close the browser"""
        if self.driver:
            self.driver.quit()
            self.driver = None

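In the above code, start_browser creates a ChromeService without an explicit driver path. Since Selenium 4.6, the bundled Selenium Manager then locates or downloads a matching ChromeDriver automatically, so this method does not actually call webdriver-manager. close_browser calls quit() to end the WebDriver session and resets self.driver, ensuring proper resource cleanup. The implicit wait makes element lookups poll for up to 10 seconds while a page is still loading. Note, however, that Selenium’s documentation advises against mixing implicit waits with explicit WebDriverWait calls like those in the element handler below, because the combined timeouts become unpredictable.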
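Because close_browser has to run even when scraping fails, one option is to wrap the two methods in a context manager. This is a sketch under assumptions rather than part of the original project: the hypothetical browser_session helper works with any object that exposes start_browser() and close_browser(), such as BrowserManager.

```python
from contextlib import contextmanager


@contextmanager
def browser_session(manager):
    """Yield a started driver and always close the browser afterwards.

    `manager` must provide start_browser() and close_browser(), like the
    BrowserManager class above (hypothetical helper, not in the original code).
    """
    driver = manager.start_browser()
    try:
        yield driver
    finally:
        # Runs both on normal exit and when the with-block raises
        manager.close_browser()
```

With this helper, a scraping run could be written as "with browser_session(BrowserManager(headless=True)) as driver:", and the browser is closed even if an exception escapes the block.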

Creating an Element Handler

Next, let’s implement the element interaction layer. Every web automation framework needs one, because it lets us locate and interact with elements reliably. Add the following code to core/element_handler.py:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException, StaleElementReferenceException


class ElementHandler:
    def __init__(self, driver, timeout=10):
        self.driver = driver
        self.timeout = timeout

In the above code, we created an ElementHandler class, which encapsulates common Selenium interaction patterns. The class accepts a WebDriver instance and a configurable timeout, which defaults to 10 seconds.

Update your ElementHandler class to add core element interaction methods:

    def wait_for_element(self, locator, timeout=None):
        """Wait until the element is present in the DOM, for up to `timeout` seconds"""
        timeout = timeout or self.timeout
        try:
            element = WebDriverWait(self.driver, timeout).until(
                EC.presence_of_element_located(locator)
            )
            return element
        except TimeoutException:
            raise TimeoutException(f"Element {locator} not found after {timeout} seconds")

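The wait_for_element method uses Selenium’s WebDriverWait with the presence_of_element_located expected condition, which polls the page until the element appears in the DOM or the timeout expires. This lets it handle dynamic pages where elements load asynchronously; on timeout, it raises a TimeoutException whose message names the locator.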
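To make the waiting behaviour concrete, here is a simplified, Selenium-free sketch of what an explicit wait does: it calls a condition repeatedly until the condition returns a truthy value or the timeout expires. The wait_until function is hypothetical. WebDriverWait follows a similar loop, polling every 0.5 seconds by default and ignoring NoSuchElementException by default, which the ignored_exceptions parameter here only approximates.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5, ignored_exceptions=()):
    """Poll condition() until it returns a truthy value, then return that value.

    Hypothetical helper illustrating the explicit-wait loop; raises TimeoutError
    if the condition is still falsy when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored_exceptions:
            pass  # treat these errors as "not ready yet"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Condition not met after {timeout} seconds")
        time.sleep(poll_frequency)
```

Returning the condition’s value rather than True mirrors how WebDriverWait.until returns the located element, which is why wait_for_element can return it directly.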

Add another method to implement the text extraction logic:

    def get_text_safely(self, locator, timeout=None):
        """Safely get the text of an element, retrying on stale references"""
        max_retries = 3
        for attempt in range(max_retries):
            try:
                element = self.wait_for_element(locator, timeout)
                return element.text.strip()
            except StaleElementReferenceException:
                if attempt == max_retries - 1:
                    raise
                # Re-locate the element on the next attempt
                continue

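The method retries up to three times on StaleElementReferenceException, which Selenium raises when a previously located element has been detached from the DOM, for example after part of the page re-renders. Each attempt re-locates the element, and the exception is re-raised after the final failed attempt.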
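If several methods need the same retry loop, it can be factored into a decorator. This is a sketch under assumptions: retry_on is a hypothetical helper, and in the real ElementHandler you would pass StaleElementReferenceException as the exception type. The example uses a built-in exception so it runs without Selenium.

```python
import functools
import time


def retry_on(exception_types, max_retries=3, delay=0.0):
    """Retry the decorated function when it raises one of exception_types.

    Re-raises the last exception once max_retries attempts have failed.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exception_types:
                    if attempt == max_retries - 1:
                        raise
                    time.sleep(delay)  # optional pause before the next attempt
        return wrapper
    return decorator


attempts = []


@retry_on(RuntimeError, max_retries=3)
def flaky_read():
    # Stand-in for reading element.text: fails twice, then succeeds
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("stale element")
    return "In stock"


print(flaky_read())  # prints "In stock"
```

A small delay between attempts can help when the page is mid-render, at the cost of slower failures.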

Implementing the Price Tracker Core

Now let’s build the main scraping functionality with robust error handling. Add the code below to your core/scraper.py file:

from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from datetime import datetime
import logging


class BookScraper:
    def __init__(self, browser_manager, element_handler):
        self.browser = browser_manager
        self.handler = element_handler
        self.base_url = "http://books.toscrape.com"
        # The extraction methods log errors through self.logger, so define it here
        self.logger = logging.getLogger(__name__)

        # Define locators
        self.LOCATORS = {
            'book_title': (By.CSS_SELECTOR, "div.product_main h1"),
            'book_price': (By.CSS_SELECTOR, "p.price_color"),
            'stock_status': (By.CSS_SELECTOR, "p.availability"),
            'category': (By.CSS_SELECTOR, "ul.breadcrumb li:nth-child(3) a")
        }

In the above code, we created the BookScraper class, which combines our browser and element handling components. It borrows from the Page Object Model pattern, a key concept in web automation framework design, by centralizing element locators in one place and exposing a clean API for scraping operations.
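For comparison, a stricter Page Object Model keeps one class per page and exposes intention-revealing methods instead of a raw locator dictionary. The BookPage and FakeHandler classes below are a hypothetical sketch. The locator tuples reuse BookScraper’s selectors, with "css selector" being the value of Selenium’s By.CSS_SELECTOR, and the handler only needs a get_text_safely(locator) method like ElementHandler’s. FakeHandler returns canned sample text, not real site data, so the example runs without a browser.

```python
CSS = "css selector"  # same value as selenium.webdriver.common.by.By.CSS_SELECTOR


class BookPage:
    """Hypothetical page object for a single Books to Scrape product page."""

    TITLE = (CSS, "div.product_main h1")
    PRICE = (CSS, "p.price_color")
    AVAILABILITY = (CSS, "p.availability")

    def __init__(self, handler):
        # Any object with get_text_safely(locator), e.g. the ElementHandler above
        self.handler = handler

    def title(self):
        return self.handler.get_text_safely(self.TITLE)

    def price(self):
        # Strip the currency symbol before converting, as clean_price does
        return float(self.handler.get_text_safely(self.PRICE).replace("£", "").strip())

    def in_stock(self):
        return "In stock" in self.handler.get_text_safely(self.AVAILABILITY)


class FakeHandler:
    """Returns canned text keyed by CSS selector instead of querying a browser."""

    def __init__(self, texts):
        self.texts = texts

    def get_text_safely(self, locator):
        return self.texts[locator[1]]


page = BookPage(FakeHandler({
    "div.product_main h1": "Sample Book",
    "p.price_color": "£12.50",
    "p.availability": "In stock (3 available)",
}))
print(page.title(), page.price(), page.in_stock())  # Sample Book 12.5 True
```

Callers then depend on methods like page.price() rather than on selectors, so a markup change only requires updating the page class.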

Next, update the BookScraper class to add the core product data extraction methods:

    def extract_product_data(self, url):
        """Extract product information with robust error handling"""
        try:
            self.browser.driver.get(url)

            # Initialize product data dictionary
            product_data = {
                'url': url,
                'timestamp': datetime.now().isoformat(),
                'success': True
            }

            # Extract each field with proper error handling
            for field, locator in self.LOCATORS.items():
                try:
                    value = self.handler.get_text_safely(locator)
                    product_data[field] = value
                except Exception as e:
                    self.logger.error(f"Error extracting {field}: {str(e)}")
                    product_data[field] = None
                    product_data['success'] = False

            # Clean price data
            if product_data.get('book_price'):
                product_data['book_price'] = self.clean_price(
                    product_data['book_price']
                )

            return product_data

        except Exception as e:
            self.logger.error(f"Failed to scrape {url}: {str(e)}")
            return {
                'url': url,
                'timestamp': datetime.now().isoformat(),
                'success': False,
                'error': str(e)
            }

    def clean_price(self, price_str):
        """Clean and convert price string to float"""
        try:
            return float(price_str.replace('£', '').strip())
        except ValueError:
            self.logger.error(f"Failed to parse price: {price_str}")
<span>return</span> <span>None</span>

The methods above use a structured approach to gather product information. Each field is extracted independently, and every failure is logged in detail for debugging and monitoring.
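Because each field is wrapped in its own try/except, one missing element marks the record as failed without discarding the other fields. The sketch below reproduces that loop outside Selenium so the behaviour can be checked in a plain unit test. `StubHandler`, `extract_fields`, and the CSS selectors in `LOCATORS` are hypothetical stand-ins for the tutorial's `ElementHandler` and `BookScraper.LOCATORS`, not their actual definitions.

```python
from datetime import datetime


class StubHandler:
    """Hypothetical stand-in for ElementHandler: returns canned text or raises."""

    def __init__(self, texts):
        self.texts = texts  # maps locator tuple -> element text

    def get_text_safely(self, locator):
        if locator not in self.texts:
            raise LookupError(f"element not found: {locator}")
        return self.texts[locator]


def clean_price(price_str):
    """Same idea as BookScraper.clean_price: strip the pound sign, parse a float."""
    try:
        return float(price_str.replace('£', '').strip())
    except ValueError:
        return None


def extract_fields(handler, locators, url):
    """Mirror of the per-field loop in extract_product_data."""
    data = {'url': url, 'timestamp': datetime.now().isoformat(), 'success': True}
    for field, locator in locators.items():
        try:
            data[field] = handler.get_text_safely(locator)
        except Exception:
            # One failing field marks the whole record as unsuccessful
            data[field] = None
            data['success'] = False
    if data.get('book_price'):
        data['book_price'] = clean_price(data['book_price'])
    return data


# Hypothetical locators; the real ones live in BookScraper.LOCATORS
LOCATORS = {
    'book_title': ('css selector', 'h1'),
    'book_price': ('css selector', 'p.price_color'),
    'stock_status': ('css selector', 'p.instock.availability'),
}

# The stock-status element is deliberately missing from the stub
handler = StubHandler({
    ('css selector', 'h1'): 'A Light in the Attic',
    ('css selector', 'p.price_color'): '£51.77',
})
result = extract_fields(
    handler,
    LOCATORS,
    'http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
)
print(result['book_price'], result['stock_status'], result['success'])
# 51.77 None False
```

The title and price survive even though the stock status could not be found. Callers can therefore inspect partial data while still relying on `success` to decide whether to persist it.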

Setting Up Database Management

Let’s implement the database layer of our web automation framework. It handles persistent storage of the scraped data and lets us track price changes over time. Add the code snippets below to your database/db_manager.py file:

import sqlite3
from datetime import datetime
import logging


class DatabaseManager:
    def __init__(self, db_path="price_tracker.db"):
        # SQLite creates the database file on first connect if it does not exist
        self.db_path = db_path
        self.logger = logging.getLogger(__name__)
        # Create the tables up front so later methods can rely on them
        self.setup_database()

In the above code, we defined the DatabaseManager class, which handles all database operations. We chose SQLite for simplicity and portability. Python ships it in the standard library (the sqlite3 module), and it stores everything in a single file, so there is no database server to install or configure. That makes it a good fit for this web scraping automation project, which stores only modest amounts of data.
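Since `db_path` is a constructor parameter, tests can point DatabaseManager at a throwaway file. One caveat follows from the fact that every method opens a fresh `sqlite3.connect(self.db_path)`. Passing `":memory:"` would give each call its own empty in-memory database, so the tables created in `setup_database` would not be visible to later calls. The sketch below demonstrates this with the standard library only. `create_table` and `tables_in` are hypothetical helpers, not part of the tutorial's code.

```python
import os
import sqlite3
import tempfile


def create_table(db_path):
    """Hypothetical helper: create a table through its own connection."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS products (id INTEGER PRIMARY KEY)")
        conn.commit()
    finally:
        conn.close()


def tables_in(db_path):
    """Hypothetical helper: list the tables visible through a fresh connection."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


# A file-backed database persists between connections...
with tempfile.TemporaryDirectory() as tmp:
    file_db = os.path.join(tmp, "test_price_tracker.db")
    create_table(file_db)
    file_tables = tables_in(file_db)

# ...but every ':memory:' connection starts from an empty database
create_table(":memory:")
memory_tables = tables_in(":memory:")

print(file_tables, memory_tables)  # ['products'] []
```

The helpers close their connections explicitly. Using `sqlite3.connect(...)` as a context manager, as DatabaseManager does, commits or rolls back the transaction but does not close the connection.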

Next, update your database/db_manager.py to add the database initialization method:

    def setup_database(self):
        """Initialize database with required tables"""
        try:
            with sqlite3.connect(self.db_path) as conn:
                cursor = conn.cursor()

                # Create products table
                cursor.execute('''
                    CREATE TABLE IF NOT EXISTS products (
                        id INTEGER PRIMARY KEY AUTOINCREMENT,
                        url TEXT UNIQUE, title TEXT, category TEXT,
                        first_seen TIMESTAMP, last_updated TIMESTAMP
                    )
                ''')

                # Create price history table
                cursor.execute('''
                    CREATE TABLE IF NOT EXISTS price_history (
                        id INTEGER PRIMARY KEY AUTOINCREMENT,
                        product_id INTEGER, price REAL,
                        stock_status TEXT, timestamp TIMESTAMP,
                        FOREIGN KEY (product_id) REFERENCES products (id)
                    )
                ''')

                conn.commit()

        except sqlite3.Error as e:
            self.logger.error(f"Database setup failed: {str(e)}")
            raise

Here we define the database schema with SQL DDL statements and create separate products and price_history tables. The UNIQUE constraint on products.url ensures each book is stored only once. The foreign key from price_history.product_id to products.id links every price snapshot to its product, which lets us track prices and analyze their history.
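With this schema, a product's price history is one JOIN away. The sketch below creates the same two tables in an in-memory database, using a single connection so the data stays visible. It inserts a few invented price snapshots and asks for each product's lowest, highest, and most recent recorded price. `price_summary` is a hypothetical helper, and the prices and timestamps are made up for illustration.

```python
import sqlite3

# Same DDL as setup_database
SCHEMA = """
CREATE TABLE IF NOT EXISTS products (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE, title TEXT, category TEXT,
    first_seen TIMESTAMP, last_updated TIMESTAMP
);
CREATE TABLE IF NOT EXISTS price_history (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product_id INTEGER, price REAL, stock_status TEXT, timestamp TIMESTAMP,
    FOREIGN KEY (product_id) REFERENCES products (id)
);
"""


def price_summary(conn):
    """Hypothetical helper: lowest, highest and most recent price per product."""
    return conn.execute("""
        SELECT p.title,
               MIN(h.price),
               MAX(h.price),
               (SELECT h2.price FROM price_history AS h2
                 WHERE h2.product_id = p.id
                 ORDER BY h2.timestamp DESC LIMIT 1)
          FROM products AS p
          JOIN price_history AS h ON h.product_id = p.id
         GROUP BY p.id
         ORDER BY p.title
    """).fetchall()


# One connection, so the in-memory tables persist for the whole example
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
cur = conn.execute(
    "INSERT INTO products (url, title, category) VALUES (?, ?, ?)",
    ("http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
     "A Light in the Attic", "Poetry"),
)
product_id = cur.lastrowid
# Invented snapshots; ISO 8601 strings sort chronologically as plain text
conn.executemany(
    "INSERT INTO price_history (product_id, price, stock_status, timestamp) "
    "VALUES (?, ?, ?, ?)",
    [
        (product_id, 51.77, "In stock", "2024-01-01T09:00:00"),
        (product_id, 48.10, "In stock", "2024-01-02T09:00:00"),
        (product_id, 49.99, "In stock", "2024-01-03T09:00:00"),
    ],
)
summary = price_summary(conn)
conn.close()
print(summary)  # [('A Light in the Attic', 48.1, 51.77, 49.99)]
```

Ordering by the text timestamp works here because the scraper stores `datetime.now().isoformat()` strings, which sort in chronological order.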

Now let’s add another method to save data to the database:

    def save_product_data(self, data):
        """Save or update product data and price history"""
        try:
            with sqlite3.connect(self.db_path) as conn:
                cursor = conn.cursor()
                # One ISO 8601 timestamp for every column written in this call
                # (Python 3.12 deprecates sqlite3's default datetime adapter)
                now = datetime.now().isoformat()

                # Insert or update product (upsert keyed on the UNIQUE url)
                cursor.execute('''
                    INSERT INTO products (url, title, category, first_seen, last_updated)
                    VALUES (?, ?, ?, ?, ?)
                    ON CONFLICT(url) DO UPDATE SET
                        title = ?, category = ?, last_updated = ?
                ''', (
                    data['url'],
                    data['book_title'],
                    data['category'],
                    now,
                    now,
                    data['book_title'],
                    data['category'],
                    now
                ))

                # Get product_id
                cursor.execute('SELECT id FROM products WHERE url = ?', (data['url'],))
                product_id = cursor.fetchone()[0]

                # Insert price history
                cursor.execute('''
                    INSERT INTO price_history (product_id, price, stock_status, timestamp)
                    VALUES (?, ?, ?, ?)
                ''', (
                    product_id,
                    data['book_price'],
                    data['stock_status'],
                    data['timestamp']
                ))

                conn.commit()

        except sqlite3.Error as e:
            self.logger.error(f"Failed to save product data: {str(e)}")
            raise

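In the above code, we implemented the data persistence logic with parameterized queries (the ? placeholders), which prevent SQL injection. A single INSERT ... ON CONFLICT(url) DO UPDATE statement inserts new products and updates existing ones. This "upsert" syntax is supported since SQLite 3.24.0, and the update leaves first_seen untouched. Every call also appends a new row to price_history.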
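To see what the upsert does on a second scrape of the same page, the sketch below runs an INSERT ... ON CONFLICT(url) DO UPDATE statement twice against an in-memory products table. It uses SQLite's `excluded` pseudo-table to reuse the incoming values instead of binding them a second time. This is an alternative spelling of the tutorial's statement, not its exact code. The second-edition title and the timestamps are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        url TEXT UNIQUE, title TEXT, category TEXT,
        first_seen TIMESTAMP, last_updated TIMESTAMP
    )
""")

# 'excluded' refers to the row that would have been inserted
UPSERT = """
    INSERT INTO products (url, title, category, first_seen, last_updated)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT(url) DO UPDATE SET
        title = excluded.title,
        category = excluded.category,
        last_updated = excluded.last_updated
"""

url = "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
# First scrape: no conflict, so the row is inserted
conn.execute(UPSERT, (url, "A Light in the Attic", "Poetry",
                      "2024-01-01T09:00:00", "2024-01-01T09:00:00"))
# Second scrape: UNIQUE(url) conflicts, so the row is updated instead;
# first_seen is not in the SET list and keeps its original value
conn.execute(UPSERT, (url, "A Light in the Attic (2nd ed.)", "Poetry",
                      "2024-01-02T09:00:00", "2024-01-02T09:00:00"))

rows = conn.execute(
    "SELECT id, title, first_seen, last_updated FROM products"
).fetchall()
conn.close()
print(rows)
# [(1, 'A Light in the Attic (2nd ed.)', '2024-01-01T09:00:00', '2024-01-02T09:00:00')]
```

`sqlite3.sqlite_version` reports the SQLite library bundled with your Python, which is a quick way to confirm upsert support before deploying.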

Main Application Integration

Let’s tie everything together with our main application class, incorporating all elements of our Selenium WebDriver Python implementation. Add the code snippets below to your main.py file:

from core.browser import BrowserManager
from core.element_handler import ElementHandler
from core.scraper import BookScraper
from database.db_manager import DatabaseManager
import time
import logging


class PriceTracker:
    def __init__(self):
        # Start a headless browser session shared by all components
        self.browser_manager = BrowserManager(headless=True)
        self.driver = self.browser_manager.start_browser()
        # Wrap the driver with the element-handling helpers
        self.element_handler = ElementHandler(self.driver)
        # Pass the browser manager and element handler into the scraper
        self.scraper = BookScraper(self.browser_manager, self.element_handler)
        self.db_manager = DatabaseManager()
        self.logger = logging.getLogger(__name__)

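In the above code, we create the main PriceTracker class, which orchestrates all components of our web scraping automation solution. Its constructor builds each component and passes the browser manager and element handler into BookScraper instead of letting the scraper create them (dependency injection). This keeps the scraper modular and easy to test with substitute objects.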
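The same idea can be pushed one level up so the tracking logic can be exercised without launching a browser. The sketch below is a hypothetical variant: `InjectablePriceTracker`, `FakeScraper`, and `FakeDB` are not part of the tutorial's code. The tracker receives its scraper and database manager as constructor arguments, keeps the same `track_product` flow, and is driven by fakes instead of Selenium and SQLite.

```python
import logging


class InjectablePriceTracker:
    """Hypothetical variant of PriceTracker that receives its collaborators."""

    def __init__(self, scraper, db_manager, logger=None):
        self.scraper = scraper
        self.db_manager = db_manager
        self.logger = logger or logging.getLogger(__name__)

    def track_product(self, url):
        """Same flow as PriceTracker.track_product."""
        try:
            self.logger.info(f"Tracking product: {url}")
            product_data = self.scraper.extract_product_data(url)
            if product_data['success']:
                self.db_manager.save_product_data(product_data)
                self.logger.info(f"Successfully tracked product: {url}")
            else:
                self.logger.error(f"Failed to track product: {url}")
        except Exception as e:
            self.logger.error(f"Error tracking product {url}: {str(e)}")


class FakeScraper:
    """Returns canned results instead of driving a browser."""

    def __init__(self, results):
        self.results = results

    def extract_product_data(self, url):
        return self.results[url]  # unknown URLs raise KeyError


class FakeDB:
    """Records what would have been written to SQLite."""

    def __init__(self):
        self.saved = []

    def save_product_data(self, data):
        self.saved.append(data)


scraper = FakeScraper({
    "ok": {'url': "ok", 'success': True, 'book_price': 51.77},
    "broken": {'url': "broken", 'success': False, 'error': "timeout"},
})
db = FakeDB()
tracker = InjectablePriceTracker(scraper, db)
for url in ("ok", "broken"):
    tracker.track_product(url)

print([row['url'] for row in db.saved])  # ['ok']
```

Only the successful result reaches the fake database. The failed one is logged and skipped, which is exactly the behaviour a test of the real class would want to pin down.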

Next, add the core tracking method to the PriceTracker class:

    def track_product(self, url):
        """Track a single product's price"""
        try:
            self.logger.info(f"Tracking product: {url}")
            product_data = self.scraper.extract_product_data(url)

            if product_data['success']:
                self.db_manager.save_product_data(product_data)
                self.logger.info(f"Successfully tracked product: {url}")
            else:
                self.logger.error(f"Failed to track product: {url}")

        except Exception as e:
            self.logger.error(f"Error tracking product {url}: {str(e)}")

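Here we implemented the main product tracking logic. track_product scrapes a single product page and saves the result only when the scraper reports success. Errors are logged rather than raised, so a single failing URL does not stop the rest of the run.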

Running the Application

Let’s create an entry-point script that runs the price tracker. Add the following code to your run.py file:

```python
import argparse
import logging
import traceback

# This snippet assumes setup_logging() and PriceTracker are already
# imported or defined in run.py.


def main():
    parser = argparse.ArgumentParser(description='Book Price Tracker')
    parser.add_argument(
        '--urls',
        nargs='+',
        help='URLs of books to track',
        default=[
            "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
            "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html"
        ]
    )

    args = parser.parse_args()
    setup_logging()
    logger = logging.getLogger(__name__)

    tracker = None
    try:
        logger.info("Initializing price tracker...")
        tracker = PriceTracker()

        logger.info("Starting price tracking...")
        for url in args.urls:
            tracker.track_product(url)
            logger.info(f"Successfully processed: {url}")

        logger.info("Price tracking completed successfully")

    except Exception as e:
        logger.error(f"Critical error during execution: {str(e)}")
        logger.error(traceback.format_exc())
    finally:
        # Close the browser session even if tracking failed
        if tracker is not None:
            tracker.cleanup()


if __name__ == "__main__":
    main()
```
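If you want to check how the --urls option behaves without launching a browser, argparse lets you pass an explicit argument list to parse_args() instead of reading sys.argv. This standalone sketch reuses the same add_argument call with a shortened default list; the example.com URLs are placeholders.

```python
import argparse

# Same --urls definition as in run.py, with a one-item default list
parser = argparse.ArgumentParser(description='Book Price Tracker')
parser.add_argument(
    '--urls',
    nargs='+',
    help='URLs of books to track',
    default=["http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
)

# No flag given: argparse falls back to the default list
defaults = parser.parse_args([])
print(defaults.urls)

# Flag given: every following token is collected into a single list
custom = parser.parse_args(['--urls', 'http://example.com/a', 'http://example.com/b'])
print(custom.urls)
```

With nargs='+', passing --urls with no values is an error, so either omit the flag to use the defaults or give at least one URL.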

Now run the script from your terminal:

```bash
python run.py
# or pass your own URLs
python run.py --urls \
  "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" \
  "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html" \
  "http://books.toscrape.com/catalogue/soumission_998/index.html"
```

Running the command produces output similar to the screenshot below:

As the output shows, the script tracks the price of every URL you pass to it.

Tracking Price Changes

Our current implementation only tracks and saves product prices. Let’s extend the price tracker so it notifies users when prices change. Add the following code to your notifications/price_alert.py file:

```python
import logging
import smtplib
import sqlite3
from email.mime.text import MIMEText


class PriceAlertManager:
    def __init__(self, db_manager):
        self.db_manager = db_manager
        self.logger = logging.getLogger(__name__)

    def check_price_changes(self, threshold_percent=5):
        """Check for significant price changes"""
        with sqlite3.connect(self.db_manager.db_path) as conn:
            cursor = conn.cursor()

            # Get latest and previous prices for all products
            cursor.execute('''
                SELECT p.title, ph1.price as current_price, ph2.price as previous_price, p.url
                FROM products p
                JOIN price_history ph1 ON p.id = ph1.product_id
                LEFT JOIN price_history ph2 ON p.id = ph2.product_id
                WHERE ph1.timestamp = (
                    SELECT MAX(timestamp) FROM price_history WHERE product_id = p.id
                )
                AND ph2.timestamp = (
                    SELECT MAX(timestamp) FROM price_history
                    WHERE product_id = p.id AND timestamp < ph1.timestamp
                )
            ''')

            price_changes = []
            for row in cursor.fetchall():
                title, current_price, previous_price, url = row
                if previous_price:  # Skip first-time prices
                    change_percent = ((current_price - previous_price) / previous_price) * 100
                    if abs(change_percent) >= threshold_percent:
                        price_changes.append({
                            'title': title,
                            'url': url,
                            'old_price': previous_price,
                            'new_price': current_price,
                            'change_percent': change_percent
                        })

            return price_changes
```

In the code snippet above, we created a PriceAlertManager class that takes a database manager instance and sets up logging for alert operations. Its check_price_changes method joins the price_history table to itself to pair each product’s latest price with the price recorded just before it, computes the percentage change, and collects every change at or above threshold_percent (5% by default) into a list of dictionaries.
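To see what the self-join returns, you can run the same query against a throwaway in-memory database. This is a sketch under assumptions: the table and column names are inferred from the query above rather than taken from the project’s actual schema, and the rows are made-up sample data.

```python
import sqlite3

# Hypothetical minimal schema containing only the columns the query uses
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, url TEXT);
    CREATE TABLE price_history (product_id INTEGER, price REAL, timestamp TEXT);
    INSERT INTO products VALUES (1, 'Book A', 'http://example.com/a');
    INSERT INTO products VALUES (2, 'Book B', 'http://example.com/b');
    -- Book A has two observations, Book B only one
    INSERT INTO price_history VALUES (1, 10.00, '2024-01-01 00:00:00');
    INSERT INTO price_history VALUES (1, 12.00, '2024-01-01 06:00:00');
    INSERT INTO price_history VALUES (2, 20.00, '2024-01-01 00:00:00');
""")

# Same query as check_price_changes
rows = conn.execute('''
    SELECT p.title, ph1.price as current_price, ph2.price as previous_price, p.url
    FROM products p
    JOIN price_history ph1 ON p.id = ph1.product_id
    LEFT JOIN price_history ph2 ON p.id = ph2.product_id
    WHERE ph1.timestamp = (
        SELECT MAX(timestamp) FROM price_history WHERE product_id = p.id
    )
    AND ph2.timestamp = (
        SELECT MAX(timestamp) FROM price_history
        WHERE product_id = p.id AND timestamp < ph1.timestamp
    )
''').fetchall()

for title, current, previous, url in rows:
    change_percent = (current - previous) / previous * 100
    print(title, previous, current, f"{change_percent:.1f}%")
```

Only Book A comes back. For a product with a single recorded price, the inner MAX(...) subquery returns NULL, the ph2.timestamp comparison is never true, and the row is dropped, so the LEFT JOIN effectively behaves like an inner join and first-time prices never reach the Python loop.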

Next, update your PriceAlertManager class to add email notifications:

```python
    def send_price_alerts(self, email_config, price_changes):
        """Send email alerts for price changes"""
        if not price_changes:
            return

        # Create email content
        email_body = "Price Change Alerts:\n\n"
        for change in price_changes:
            email_body += (
                f"Product: {change['title']}\n"
                f"Old Price: £{change['old_price']:.2f}\n"
                f"New Price: £{change['new_price']:.2f}\n"
                f"Change: {change['change_percent']:.1f}%\n"
                f"URL: {change['url']}\n"
                "-------------------\n"
            )

        msg = MIMEText(email_body)
        msg['Subject'] = f'Price Alert: {len(price_changes)} products changed'
        msg['From'] = email_config['sender']
        msg['To'] = email_config['recipient']

        try:
            with smtplib.SMTP(email_config['smtp_server'], email_config['smtp_port']) as server:
                server.starttls()
                server.login(email_config['username'], email_config['password'])
                server.send_message(msg)
                self.logger.info(f"Price alerts sent for {len(price_changes)} products")
        except Exception as e:
            self.logger.error(f"Failed to send price alerts: {str(e)}")
```

Here, we implemented email notifications using Python’s built-in email and smtplib modules. The MIMEText class builds a properly formatted plain-text message, and the body is generated with f-strings that format each price to two decimal places and the percentage change to one.
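Before configuring real SMTP credentials, it can help to build the message locally and inspect it. The sketch below repeats the body-building logic from send_price_alerts on made-up sample data and prints the result instead of sending it. Because the body contains “£”, MIMEText selects the utf-8 charset automatically when no charset is given.

```python
from email.mime.text import MIMEText

# Made-up sample shaped like the dicts returned by check_price_changes
price_changes = [
    {'title': 'Book A', 'url': 'http://example.com/a',
     'old_price': 10.0, 'new_price': 12.0, 'change_percent': 20.0},
]

email_body = "Price Change Alerts:\n\n"
for change in price_changes:
    email_body += (
        f"Product: {change['title']}\n"
        f"Old Price: £{change['old_price']:.2f}\n"
        f"New Price: £{change['new_price']:.2f}\n"
        f"Change: {change['change_percent']:.1f}%\n"
        f"URL: {change['url']}\n"
        "-------------------\n"
    )

msg = MIMEText(email_body)
msg['Subject'] = f'Price Alert: {len(price_changes)} products changed'
msg['From'] = 'your-email@example.com'
msg['To'] = 'recipient@example.com'

# Inspect the message locally instead of sending it over SMTP
print(msg['Subject'])
print(msg.get_content_charset())  # utf-8, because '£' is not ASCII
print(email_body)
```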

Now let’s modify our run script to include price alerts:

```python
#...
import time

from notifications.price_alert import PriceAlertManager

#...
def main():
    #...
    tracker = PriceTracker()
    alert_manager = PriceAlertManager(tracker.db_manager)

    try:
        logger.info("Starting price tracking...")
        for url in args.urls:
            tracker.track_product(url)
            time.sleep(2)  # Be nice to the server

        logger.info("Price tracking completed successfully")

        # Check for price changes
        price_changes = alert_manager.check_price_changes(threshold_percent=5)

        if price_changes:
            # Email configuration
            email_config = {
                'sender': 'your-email@example.com',
                'recipient': 'recipient@example.com',
                'smtp_server': 'smtp.gmail.com',
                'smtp_port': 587,
                'username': 'your-email@example.com',
                'password': 'your-app-password'
            }

            # Send alerts
            alert_manager.send_price_alerts(email_config, price_changes)

    except Exception as e:
        logger.error(f"Error during price tracking: {str(e)}")
    finally:
        tracker.cleanup()
```
<span>#... </span><span>from</span> <span>notifications.price_alert</span> <span>import</span> <span>PriceAlertManager</span>

<span>#... </span><span>def</span> <span>main</span><span>():</span>
    <span>#... </span> <span>tracker</span> <span>=</span> <span>PriceTracker</span><span>()</span>
 <span>alert_manager</span> <span>=</span> <span>PriceAlertManager</span><span>(</span><span>tracker</span><span>.</span><span>db_manager</span><span>)</span>

    <span>try</span><span>:</span>
 <span>logger</span><span>.</span><span>info</span><span>(</span><span>"</span><span>Starting price tracking...</span><span>"</span><span>)</span>
        <span>for</span> <span>url</span> <span>in</span> <span>args</span><span>.</span><span>urls</span><span>:</span>
            tracker.track_product(url)
            time.sleep(2)  # Be nice to the server
        logger.info("Price tracking completed successfully")

        # Check for price changes
        price_changes = alert_manager.check_price_changes(threshold_percent=5)
        if price_changes:
            # Email configuration
            email_config = {
                'sender': 'your-email@example.com',
                'recipient': 'recipient@example.com',
                'smtp_server': 'smtp.gmail.com',
                'smtp_port': 587,
                'username': 'your-email@example.com',
                'password': 'your-app-password'
            }

            # Send alerts
            alert_manager.send_price_alerts(email_config, price_changes)

    except Exception as e:
        logger.error(f"Error during price tracking: {e}")
    finally:
        tracker.cleanup()

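The email_config in main() hardcodes the SMTP username and password in run.py, so the credentials can easily end up in version control. Here is a minimal sketch of one alternative, assuming you are willing to supply those values as environment variables. The PRICE_ALERT_* variable names and the load_email_config() helper are hypothetical and not part of the tutorial's repository. The returned dictionary has the same keys that main() passes to send_price_alerts.

```python
import os


def load_email_config() -> dict:
    """Build the email_config dictionary from environment variables.

    The PRICE_ALERT_* variable names are hypothetical; rename them to suit
    your setup. Required values raise KeyError when missing, while the SMTP
    server and port fall back to the Gmail defaults used in the tutorial.
    """
    return {
        'sender': os.environ['PRICE_ALERT_SENDER'],
        'recipient': os.environ['PRICE_ALERT_RECIPIENT'],
        'smtp_server': os.environ.get('PRICE_ALERT_SMTP_SERVER', 'smtp.gmail.com'),
        'smtp_port': int(os.environ.get('PRICE_ALERT_SMTP_PORT', '587')),
        'username': os.environ['PRICE_ALERT_USERNAME'],
        'password': os.environ['PRICE_ALERT_PASSWORD'],
    }
```

Inside main(), you could then replace the literal dictionary with email_config = load_email_config(). A missing variable raises KeyError. The existing except block logs that error, and the finally clause still calls tracker.cleanup().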

Now, if you run the script again, it will track the product prices and alert you to any products whose prices have changed, as shown in the screenshot below.

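You could also run this script as a cron job. It would then track product prices and alert you to price changes automatically, so you no longer have to run it manually each time. For example, the crontab entry below runs the tracker every six hours (at 00:00, 06:00, 12:00, and 18:00). Cron reads one job per line and does not treat a trailing backslash as a line continuation, so the whole command must stay on a single line:

0 */6 * * * python run.py --urls "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html" "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html" "http://books.toscrape.com/catalogue/soumission_998/index.html"

Cron also runs jobs with a minimal environment. In practice, you will likely need to change into the project directory first and call your virtual environment's Python interpreter by its absolute path.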
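Cron expects each job on one line and gives the % character a special meaning. As the list of tracked URLs grows, writing that line by hand becomes error-prone. The following minimal sketch is not part of the tutorial's code. A hypothetical build_cron_entry() helper flattens the command with Python's shlex.join, which shell-quotes each argument only when needed. It also rejects any argument containing %, because cron converts an unescaped % in the command into a newline.

```python
import shlex


def build_cron_entry(schedule: str, command: list[str]) -> str:
    """Return a single crontab line that runs `command` on `schedule`.

    Hypothetical helper: cron reads one job per line and has no backslash
    line continuation, so the command is flattened with shlex.join, which
    quotes each argument only if it contains shell-unsafe characters.
    """
    for arg in command:
        # cron turns an unescaped % into a newline, so refuse it outright
        # rather than guess at escaping rules.
        if "%" in arg:
            raise ValueError(f"'%' has a special meaning in crontab: {arg!r}")
    return f"{schedule} {shlex.join(command)}"


urls = [
    "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
    "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html",
    "http://books.toscrape.com/catalogue/soumission_998/index.html",
]
# Prints the whole entry on one line, ready to paste into `crontab -e`.
print(build_cron_entry("0 */6 * * *", ["python", "run.py", "--urls", *urls]))
```

These URLs contain only shell-safe characters, so shlex.join leaves them unquoted. An argument with spaces, such as a path to your virtual environment, would be wrapped in single quotes.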

Conclusion

Throughout this tutorial, you’ve learned how to build a robust web automation tool using Selenium and Python. We started with the fundamentals of web automation and then set up a development environment for the Price Tracker tool that served as this tutorial's running example. Next, we built the Price Tracker application itself, which tracks product prices and alerts users to price changes. Now that you have this knowledge, what tool will you build next? Let me know in the comments section. Happy coding!

Original article: Building Robust Web Automation with Selenium and Python
