Pandas is one of the most widely used libraries in Python for data manipulation and analysis. Whether you are working on small datasets or large-scale data, Pandas makes it easy to clean, transform, and analyze data quickly and efficiently. In this blog, I will walk through an example of how I use Pandas to fetch and manipulate data and then transform the raw information into a graph for data visualization.
To start, you don’t need to worry about local installation, an easy way to jump into using Pandas is through Google Colab, a cloud-based Jupyter notebook environment that includes Pandas by default.
Importing Necessary Libraries
<span>import</span> <span>requests</span> <span># importing the requests library in order to do fetch </span><span>import</span> <span>pandas</span> <span>as</span> <span>pd</span> <span># importing the pandas library for data visualization, data manipulation, and analysis </span><span>import</span> <span>requests</span> <span># importing the requests library in order to do fetch </span><span>import</span> <span>pandas</span> <span>as</span> <span>pd</span> <span># importing the pandas library for data visualization, data manipulation, and analysis </span>import requests # importing the requests library in order to do fetch import pandas as pd # importing the pandas library for data visualization, data manipulation, and analysis
Enter fullscreen mode Exit fullscreen mode
Before working with APIs and Pandas, we need the import the required libraries.
-
requests
: This is a popular Python library used to send HTTP requests. We will be using it to fetch data from an API -
pandas
: Pandas is the library we are showcasing today! It provides powerful tools to clean, transform, and visualize structured datasets.
By importing the two libraries, we will be using them to retrieve data from the Kraken API and process it through a structured Pandas DataFrame.
Creating the fetch function – Kraken API & Pandas
<span>def</span> <span>get_historic_price</span><span>(</span><span>symbol</span><span>,</span> <span>after</span><span>=</span><span>'</span><span>2024-12-01</span><span>'</span><span>):</span> <span># fetch function - parameters are the ticker symbol, and date, </span><span>url</span> <span>=</span> <span>'</span><span>https://api.kraken.com/0/public/OHLC</span><span>'</span> <span># api </span> <span>pair</span> <span>=</span> <span>f</span><span>"</span><span>{</span><span>symbol</span><span>.</span><span>upper</span><span>()</span><span>}</span><span>USD</span><span>"</span><span># XBTUSD when symbol='xbt' - used to dynamically constructor the symbol in uppercase and USD. f stands for f-string, formatted string literal </span><span>resp</span> <span>=</span> <span>requests</span><span>.</span><span>get</span><span>(</span><span>url</span><span>,</span> <span>params</span><span>=</span><span>{</span> <span># this is the fetch request </span> <span>"</span><span>pair</span><span>"</span><span>:</span> <span>pair</span><span>,</span> <span># the pair that was created </span> <span>'</span><span>interval</span><span>'</span><span>:</span> <span>60</span><span>,</span> <span># time interval for data in minutes (60 for 1 hr) </span> <span>'</span><span>since</span><span>'</span><span>:</span> <span>str</span><span>(</span><span>int</span><span>(</span><span>pd</span><span>.</span><span>Timestamp</span><span>(</span><span>after</span><span>).</span><span>timestamp</span><span>()))</span><span># pd.Timestamp() - Pandas function that converts a date into timestamp object, after is a string - </span> <span># .timestamp() - Pandas function that converts a Timestamp to UNIX format, which is a float </span> <span># 1733011200.0 (This is the UNIX timestamp for December 1, 2024, 00:00:00 UTC.) </span> <span># int() - Python function that converts data type to int </span> <span># str() - Python function that converts datattype to string </span> <span>})</span><span>resp</span><span>.</span><span>raise_for_status</span><span>()</span> <span># error if response is a failure </span><span>data</span> <span>=</span> <span>resp</span><span>.</span><span>json</span><span>()</span> <span># convert JSON response </span><span>results_key</span> <span>=</span> <span>[</span><span>k</span> <span>for</span> <span>k</span> <span>in</span> <span>data</span><span>[</span><span>'</span><span>result</span><span>'</span><span>].</span><span>keys</span><span>()</span> <span>if</span> <span>k</span> <span>!=</span> <span>'</span><span>last</span><span>'</span><span>][</span><span>0</span><span>]</span> <span># iterate over the data.result to get the keys and filter out "last" </span> <span>results</span> <span>=</span> <span>[</span><span>(</span><span>close_time</span><span>,</span> <span>float</span><span>(</span><span>open</span><span>),</span> <span>float</span><span>(</span><span>high</span><span>),</span> <span>float</span><span>(</span><span>low</span><span>),</span> <span>float</span><span>(</span><span>close</span><span>),</span> <span>float</span><span>(</span><span>volume</span><span>))</span> <span># formatting </span> <span>for </span><span>(</span><span>close_time</span><span>,</span> <span>open</span><span>,</span> <span>high</span><span>,</span> <span>low</span><span>,</span> <span>close</span><span>,</span> <span>vwap</span><span>,</span> <span>volume</span><span>,</span> <span>count</span><span>)</span> <span># looping over each row </span> <span>in</span> <span>data</span><span>[</span><span>'</span><span>result</span><span>'</span><span>][</span><span>results_key</span><span>]</span> <span># source of data </span> <span>]</span><span># Before </span> <span># [1680105600, '27000.0', '27500.0', '26000.0', '26500.0', '26750.0', '120.5', 300] </span> <span># </span> <span># After </span> <span># (1680105600, 27000.0, 27500.0, 26000.0, 26500.0, 120.5) </span><span>df</span> <span>=</span> <span>pd</span><span>.</span><span>DataFrame</span><span>(</span><span>results</span><span>,</span> <span>columns</span><span>=</span><span>[</span> <span># Pandas function - creating a DataFrame with params of result (list of tuples) & assign column names </span> <span>'</span><span>CloseTime</span><span>'</span><span>,</span> <span>'</span><span>OpenPrice</span><span>'</span><span>,</span> <span>'</span><span>HighPrice</span><span>'</span><span>,</span> <span>'</span><span>LowPrice</span><span>'</span><span>,</span> <span>'</span><span>ClosePrice</span><span>'</span><span>,</span> <span>'</span><span>Volume</span><span>'</span><span>])</span><span>df</span><span>[</span><span>'</span><span>CloseTime</span><span>'</span><span>]</span> <span>=</span> <span>pd</span><span>.</span><span>to_datetime</span><span>(</span><span>df</span><span>[</span><span>'</span><span>CloseTime</span><span>'</span><span>],</span> <span>unit</span><span>=</span><span>'</span><span>s</span><span>'</span><span>)</span> <span># Pandas function to convert from UNIX back to readable time format - on CloseTime </span> <span>df</span><span>.</span><span>set_index</span><span>(</span><span>'</span><span>CloseTime</span><span>'</span><span>,</span> <span>inplace</span><span>=</span><span>True</span><span>)</span> <span># setting the index of DataFrame as CloseTime </span> <span>return</span> <span>df</span> <span># return DataFrame </span><span>def</span> <span>get_historic_price</span><span>(</span><span>symbol</span><span>,</span> <span>after</span><span>=</span><span>'</span><span>2024-12-01</span><span>'</span><span>):</span> <span># fetch function - parameters are the ticker symbol, and date, </span> <span>url</span> <span>=</span> <span>'</span><span>https://api.kraken.com/0/public/OHLC</span><span>'</span> <span># api </span> <span>pair</span> <span>=</span> <span>f</span><span>"</span><span>{</span><span>symbol</span><span>.</span><span>upper</span><span>()</span><span>}</span><span>USD</span><span>"</span> <span># XBTUSD when symbol='xbt' - used to dynamically constructor the symbol in uppercase and USD. f stands for f-string, formatted string literal </span> <span>resp</span> <span>=</span> <span>requests</span><span>.</span><span>get</span><span>(</span><span>url</span><span>,</span> <span>params</span><span>=</span><span>{</span> <span># this is the fetch request </span> <span>"</span><span>pair</span><span>"</span><span>:</span> <span>pair</span><span>,</span> <span># the pair that was created </span> <span>'</span><span>interval</span><span>'</span><span>:</span> <span>60</span><span>,</span> <span># time interval for data in minutes (60 for 1 hr) </span> <span>'</span><span>since</span><span>'</span><span>:</span> <span>str</span><span>(</span><span>int</span><span>(</span><span>pd</span><span>.</span><span>Timestamp</span><span>(</span><span>after</span><span>).</span><span>timestamp</span><span>()))</span> <span># pd.Timestamp() - Pandas function that converts a date into timestamp object, after is a string - </span> <span># .timestamp() - Pandas function that converts a Timestamp to UNIX format, which is a float </span> <span># 1733011200.0 (This is the UNIX timestamp for December 1, 2024, 00:00:00 UTC.) </span> <span># int() - Python function that converts data type to int </span> <span># str() - Python function that converts datattype to string </span> <span>})</span> <span>resp</span><span>.</span><span>raise_for_status</span><span>()</span> <span># error if response is a failure </span> <span>data</span> <span>=</span> <span>resp</span><span>.</span><span>json</span><span>()</span> <span># convert JSON response </span> <span>results_key</span> <span>=</span> <span>[</span><span>k</span> <span>for</span> <span>k</span> <span>in</span> <span>data</span><span>[</span><span>'</span><span>result</span><span>'</span><span>].</span><span>keys</span><span>()</span> <span>if</span> <span>k</span> <span>!=</span> <span>'</span><span>last</span><span>'</span><span>][</span><span>0</span><span>]</span> <span># iterate over the data.result to get the keys and filter out "last" </span> <span>results</span> <span>=</span> <span>[</span> <span>(</span><span>close_time</span><span>,</span> <span>float</span><span>(</span><span>open</span><span>),</span> <span>float</span><span>(</span><span>high</span><span>),</span> <span>float</span><span>(</span><span>low</span><span>),</span> <span>float</span><span>(</span><span>close</span><span>),</span> <span>float</span><span>(</span><span>volume</span><span>))</span> <span># formatting </span> <span>for </span><span>(</span><span>close_time</span><span>,</span> <span>open</span><span>,</span> <span>high</span><span>,</span> <span>low</span><span>,</span> <span>close</span><span>,</span> <span>vwap</span><span>,</span> <span>volume</span><span>,</span> <span>count</span><span>)</span> <span># looping over each row </span> <span>in</span> <span>data</span><span>[</span><span>'</span><span>result</span><span>'</span><span>][</span><span>results_key</span><span>]</span> <span># source of data </span> <span>]</span> <span># Before </span> <span># [1680105600, '27000.0', '27500.0', '26000.0', '26500.0', '26750.0', '120.5', 300] </span> <span># </span> <span># After </span> <span># (1680105600, 27000.0, 27500.0, 26000.0, 26500.0, 120.5) </span> <span>df</span> <span>=</span> <span>pd</span><span>.</span><span>DataFrame</span><span>(</span><span>results</span><span>,</span> <span>columns</span><span>=</span><span>[</span> <span># Pandas function - creating a DataFrame with params of result (list of tuples) & assign column names </span> <span>'</span><span>CloseTime</span><span>'</span><span>,</span> <span>'</span><span>OpenPrice</span><span>'</span><span>,</span> <span>'</span><span>HighPrice</span><span>'</span><span>,</span> <span>'</span><span>LowPrice</span><span>'</span><span>,</span> <span>'</span><span>ClosePrice</span><span>'</span><span>,</span> <span>'</span><span>Volume</span><span>'</span> <span>])</span> <span>df</span><span>[</span><span>'</span><span>CloseTime</span><span>'</span><span>]</span> <span>=</span> <span>pd</span><span>.</span><span>to_datetime</span><span>(</span><span>df</span><span>[</span><span>'</span><span>CloseTime</span><span>'</span><span>],</span> <span>unit</span><span>=</span><span>'</span><span>s</span><span>'</span><span>)</span> <span># Pandas function to convert from UNIX back to readable time format - on CloseTime </span> <span>df</span><span>.</span><span>set_index</span><span>(</span><span>'</span><span>CloseTime</span><span>'</span><span>,</span> <span>inplace</span><span>=</span><span>True</span><span>)</span> <span># setting the index of DataFrame as CloseTime </span> <span>return</span> <span>df</span> <span># return DataFrame </span>def get_historic_price(symbol, after='2024-12-01'): # fetch function - parameters are the ticker symbol, and date, url = 'https://api.kraken.com/0/public/OHLC' # api pair = f"{symbol.upper()}USD" # XBTUSD when symbol='xbt' - used to dynamically constructor the symbol in uppercase and USD. f stands for f-string, formatted string literal resp = requests.get(url, params={ # this is the fetch request "pair": pair, # the pair that was created 'interval': 60, # time interval for data in minutes (60 for 1 hr) 'since': str(int(pd.Timestamp(after).timestamp())) # pd.Timestamp() - Pandas function that converts a date into timestamp object, after is a string - # .timestamp() - Pandas function that converts a Timestamp to UNIX format, which is a float # 1733011200.0 (This is the UNIX timestamp for December 1, 2024, 00:00:00 UTC.) # int() - Python function that converts data type to int # str() - Python function that converts datattype to string }) resp.raise_for_status() # error if response is a failure data = resp.json() # convert JSON response results_key = [k for k in data['result'].keys() if k != 'last'][0] # iterate over the data.result to get the keys and filter out "last" results = [ (close_time, float(open), float(high), float(low), float(close), float(volume)) # formatting for (close_time, open, high, low, close, vwap, volume, count) # looping over each row in data['result'][results_key] # source of data ] # Before # [1680105600, '27000.0', '27500.0', '26000.0', '26500.0', '26750.0', '120.5', 300] # # After # (1680105600, 27000.0, 27500.0, 26000.0, 26500.0, 120.5) df = pd.DataFrame(results, columns=[ # Pandas function - creating a DataFrame with params of result (list of tuples) & assign column names 'CloseTime', 'OpenPrice', 'HighPrice', 'LowPrice', 'ClosePrice', 'Volume' ]) df['CloseTime'] = pd.to_datetime(df['CloseTime'], unit='s') # Pandas function to convert from UNIX back to readable time format - on CloseTime df.set_index('CloseTime', inplace=True) # setting the index of DataFrame as CloseTime return df # return DataFrame
Enter fullscreen mode Exit fullscreen mode
To demonstrate Panda’s integration with external data sources. let’s explore a function that fetches historical cryptocurrency prices from the Kraken exchange API.
The get_historic_price
function retrieves hourly price data with the parameters of the cryptocurrency symbol and the date.
How The Function Works
-
The API Request: A call is made to Kraken’s OHLC(Open-High-Low-Close) API. The
get_historic_price
params are manipulated to be the options/params of the request. -
Data Processing: The response to the request has the relevant price data extracted and converted into floats.
-
Create DataFrame: The organized data is transformed into a DataFrame through the method of
.DataFrame()
The fieldCloseTime
is converted and set and the DataFrame is returned.
Creating Fetch Parameters
Now that the fetch function is created, it requires the parameters of the symbol and data.
When working with time-series data, you may need to fetch data from a specific time range. Pandas provide convenient functions to manipulate dates and times.
last_week = (pd.Timestamp.now() - pd.offsets.Day(7)) # Pandas function to get the current date and time - 7 dayslast_week = (pd.Timestamp.now() - pd.offsets.Day(7)) # Pandas function to get the current date and time - 7 dayslast_week = (pd.Timestamp.now() - pd.offsets.Day(7)) # Pandas function to get the current date and time - 7 days
Enter fullscreen mode Exit fullscreen mode
Pandas methods
-
.Timestamp.now()
– Gets the current date and time. -
.offsets.Days(7)
– Represents an offset of 7 days
The variable last_week
reflects the current date minus seven days, making it exactly one week ago.
Calling the fetch
After the last_week
variable is set, we can call the fetch on the historical Bitcoin (BTC). The returned data is formatted into a Pandas DataFrame, making it easier for future analysis and visualization.
btc = get_historic_price('btc', after=last_week) # running fetchbtc = get_historic_price('btc', after=last_week) # running fetchbtc = get_historic_price('btc', after=last_week) # running fetch
Enter fullscreen mode Exit fullscreen mode
Data Inspection & Exploration
Before diving into data analysis, it’s crucial to inspect and explore the dataset. Pandas provide powerful functions to quickly understand the structure and contents of a data frame.
Viewing the First Few Rows
btc.head() # Displays the first few rows of the DataFramebtc.head() # Displays the first few rows of the DataFramebtc.head() # Displays the first few rows of the DataFrame
Enter fullscreen mode Exit fullscreen mode
-
btc
is the variable storing the DataFrame. -
.head()
is a Pandas function that returns the first five rows by default, providing a quick snapshot of the dataset.
Why Use It?
Ensures the data has loaded correctly.
Provides an overview of the column names and structure.
Helps preview data without printing the entire DataFrame.
Getting a Summary of the Data
btc.info() # Displays a concise summary of the DataFramebtc.info() # Displays a concise summary of the DataFramebtc.info() # Displays a concise summary of the DataFrame
Enter fullscreen mode Exit fullscreen mode
-
.info()
is a Pandas function that provides a high-level overview of the dataset, including column names, non-null counts, and data types.
Why Use It?
Understands the dataset’s structure and column types.
Identifies missing values that may need handling.
Confirms data types for proper analysis and visualization.
By using these simple Pandas functions, you can quickly inspect your dataset and prepare it for deeper analysis. Stay tuned as we dive into data cleaning and transformation in the next sections!
Data Visualization
Lastly, we can visualize how Bitcoin’s closing price has changed over time through data visualization.
btc['ClosePrice'].plot(figsize=(15, 7)) # Pandas function to plot the ClosePrice column of the btc Data Frame. FigSize refers to size of figure in inches, 15 width, 7 heightbtc['ClosePrice'].plot(figsize=(15, 7)) # Pandas function to plot the ClosePrice column of the btc Data Frame. FigSize refers to size of figure in inches, 15 width, 7 heightbtc['ClosePrice'].plot(figsize=(15, 7)) # Pandas function to plot the ClosePrice column of the btc Data Frame. FigSize refers to size of figure in inches, 15 width, 7 height
Enter fullscreen mode Exit fullscreen mode
-
.plot()
is a Pandas function that generates a line plot for a DataFrame column. -
btc['ClosePrice'] selects the
ClosePrice` column, which contains the Bitcoin’s closing prices over time. -
figsize(15,7)
specifies the figure size in inches (15 inches wide, and 7 inches tall)
Why Use It?
Quickly see trends, spikes, or dips in Bitcoin prices.
Help analyze historical price movements.
Conclusion
Pandas is an incredibly powerful tool for data manipulation, making it easy to fetch, clean, and visualize data with just a few lines of code. In this blog, we explored how to:
- Retrieve historical cryptocurrency prices from the Kraken API
- Transform raw JSON data into a structured Pandas DataFrame
- Inspect and analyze datasets using built-in Pandas functions
- Create a simple visualization to track price trends over time
By leveraging Pandas, you can efficiently work with large datasets and gain insights quickly. Whether you’re analyzing financial data, working with APIs, or building data-driven applications, Pandas provides the flexibility and ease of use needed to streamline your workflow.
If you’re interested in further exploration, try experimenting with different cryptocurrencies, timeframes, or visualization techniques. Data analysis is all about iteration and discovery—keep exploring!
原文链接:Learning Pandas, A Powerful Library For Data Visualization, Data manipulation, and Analysis
暂无评论内容