Web Scraping CoinMarketCap with Python: Selenium

Published 3 years ago


When it comes to web scraping in Python, people usually have two choices:

bs4 + requests
Selenium (the so-called webdriver!)

Often, approach one (BeautifulSoup) is enough, and one can scrape the majority of websites just by adding a request header such as a User-Agent. However, for websites equipped with strong anti-scraping measures, Selenium is a must in your toolkit.
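
As a rough illustration of what "adding a header" means in approach one, here is a minimal sketch. It swaps in the standard library's urllib for requests so that it runs without third-party packages. The names build_request and fetch_html and the User-Agent string are assumptions made up for this example, not something the target site is known to require.

```python
from urllib.request import Request, urlopen

# Assumption: a browser-like User-Agent is the header that matters.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"
}


def build_request(url, headers=None):
    """Return a GET request object that carries browser-like headers."""
    return Request(url, headers=headers or DEFAULT_HEADERS)


def fetch_html(url, timeout=10.0):
    """Download a page and return its body as text (performs a network call)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned HTML is what you would hand to BeautifulSoup for parsing. When the data is filled in by JavaScript after the page loads, this kind of plain request may come back without it, which is where Selenium helps.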

Today, we are going to look at an example: scraping the historical price of Bitcoin from CoinMarketCap.

We need the historical data of Bitcoin, but instead of copying and pasting it manually, can we automate this process? Wouldn’t it be nice to have a scraper that grabs everything we want each time we run it?

Sure, it would be very nice! But how do we do this?

First, import the necessary packages:

from pprint import pprint
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


Next, our target page is https://coinmarketcap.com/currencies/bitcoin/historical-data/. More generally, the URL has the format https://coinmarketcap.com/currencies/{exchange_name}/historical-data/, where exchange_name is the currency's name as it appears in the URL, such as bitcoin or ethereum.

We then open the URL with the webdriver:

exchange_name = "bitcoin"
url: str = f"https://coinmarketcap.com/currencies/{exchange_name}/historical-data/"
driver = webdriver.Firefox()
driver.get(url)

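
If you plan to scrape more than one currency, the URL format can be wrapped in a small helper. This is only a sketch: build_historical_url and BASE_URL are names invented for this example, and the trimming and lowercasing are assumptions about how the site's URL names look.

```python
from urllib.parse import quote

BASE_URL = "https://coinmarketcap.com/currencies/{slug}/historical-data/"


def build_historical_url(exchange_name: str) -> str:
    """Build the historical-data URL for a currency name such as "bitcoin"."""
    slug = exchange_name.strip().lower()
    if not slug:
        raise ValueError("exchange_name must not be empty")
    # quote() percent-encodes characters that are unsafe in a URL path segment.
    return BASE_URL.format(slug=quote(slug, safe=""))


print(build_historical_url("Ethereum"))
# https://coinmarketcap.com/currencies/ethereum/historical-data/
```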

Next, we need to manually inspect the page (using the browser's developer tools) and see how we can select the specific element.

The data we are interested in sits in a table. The table has a parent element with the class “history”, which is pretty much enough to specify the elements we want.

elem = driver.find_element(By.CSS_SELECTOR, ".history tbody tr")


Because find_element returns only the first match, this selects the topmost row of the table. To get the text inside elem, we just use elem.text.

What is left is just extracting the target information with some string manipulation. Very straightforward.
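
The string handling can be sketched as a small standalone function. The sample row text is an assumption about what elem.text looks like, assembled from the values in the example output further down. The real row likely has more columns after the close price, which this sketch simply ignores. parse_row_text is a name made up for this example.

```python
def parse_row_text(row_text: str) -> dict:
    """Split the text of one table row into the date and four price strings."""
    # Every price starts with " $", so the date is whatever precedes the first one.
    parts = [part.strip() for part in row_text.split(" $")]
    if len(parts) < 5:
        raise ValueError(f"unexpected row format: {row_text!r}")
    date, open_price, high_price, low_price, close_price = parts[:5]
    return {
        "date": date,
        "open_price": open_price,
        "high_price": high_price,
        "low_price": low_price,
        "close_price": close_price,
    }


# Assumed sample of elem.text for the topmost row.
sample = "Dec 13, 2022 $17,206.44 $17,930.09 $17,111.76 $17,781.32"
print(parse_row_text(sample))
```

Splitting on " $" rather than on spaces keeps the date in one piece, since the date itself contains spaces and a comma.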

Below is the fully working code:


from pprint import pprint
import time

from selenium import webdriver
from selenium.webdriver.common.by import By


def get_latest_price(exchange_name="bitcoin"):
    exchange_name = exchange_name.lower()
    url: str = f"https://coinmarketcap.com/currencies/{exchange_name}/historical-data/"
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # wait for the page to load
        time.sleep(2)
        # get the latest date (the topmost row of the table)
        elem = driver.find_element(By.CSS_SELECTOR, ".history tbody tr")
        res = elem.text
    finally:
        # always shut the browser down, even if the lookup fails
        driver.quit()
    # every price is prefixed with " $", so split on that
    res = res.split(" $")
    date = res[0]
    open_price = res[1]
    high_price = res[2]
    low_price = res[3]
    close_price = res[4]
    return {
        "exchange_name": exchange_name,
        "url": url,
        "date": date,
        "open_price": open_price,
        "high_price": high_price,
        "low_price": low_price,
        "close_price": close_price,
    }


if __name__ == "__main__":
    pprint(get_latest_price())

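
A side note on the fixed two-second sleep: it can be too short on a slow connection and wasteful on a fast one. Selenium provides WebDriverWait for this situation. The sketch below shows the same polling idea in plain Python so that it can be tried without a browser. wait_for and its parameters are names invented for this example, not part of Selenium.

```python
import time


def wait_for(fetch, timeout=10.0, poll_interval=0.5, ignored=(Exception,)):
    """Call fetch() repeatedly until it returns a truthy value.

    Exceptions listed in `ignored` are treated as "not ready yet".
    Raises TimeoutError if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = fetch()
        except ignored:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical use inside get_latest_price, in place of time.sleep(2):
# elem = wait_for(lambda: driver.find_element(By.CSS_SELECTOR, ".history tbody tr"))
```

The script in this post keeps the simple sleep, which is fine for a one-off run.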

Let’s try the scraper out:

  python3 scraper_exchange.py 
{'close_price': '17,781.32',
 'date': 'Dec 13, 2022',
 'exchange_name': 'bitcoin',
 'high_price': '17,930.09',
 'low_price': '17,111.76',
 'open_price': '17,206.44',
 'url': 'https://coinmarketcap.com/currencies/bitcoin/historical-data/'}


Yeah! We do get the latest real price.
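
Note that the scraper returns everything as strings. If you want to do arithmetic or sorting on the results, a conversion step along these lines might help. This is a sketch only: parse_price and parse_date are hypothetical helpers, and the formats are assumed from the output above.

```python
from datetime import date, datetime
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Convert a price string such as "17,781.32" to a Decimal."""
    return Decimal(text.replace(",", "").replace("$", "").strip())


def parse_date(text: str) -> date:
    """Convert a date string such as "Dec 13, 2022" to a date object."""
    # %b matches abbreviated English month names under Python's default locale.
    return datetime.strptime(text.strip(), "%b %d, %Y").date()


print(parse_price("17,781.32"))   # 17781.32
print(parse_date("Dec 13, 2022"))  # 2022-12-13
```

Decimal is used instead of float so that prices keep exactly the digits shown on the page.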
