Creating a Datalake in AWS for NBA analytics!

Hey there, thanks for stopping by! Today I will show you how to automatically spin up an AWS S3 bucket, create a Glue database, and fetch NBA data to store in the bucket. From there, we will use Amazon Athena to query the data. Stick around to learn more about S3, Glue, Athena, and Python!

Purpose

The purpose of this article is to walk you through using AWS tools to fetch, store, and query data. The technologies used in this article are GitHub, AWS (S3, Athena, Glue), and Python.

Resources

Discord
YouTube Video from Alicia
My GitHub Repo

Prerequisites

VS Code (or your favorite text editor)
AWS Free Tier Account
GitHub Account
SportsData.io Account and API key
Knowledge of Git, Linux commands, APIs and AWS

Architecture

Setup

1. Clone the repo

```
git clone https://github.com/asciikeyboard/nba-datalake.git
cd nba-datalake
```

2. Log into the AWS console and launch CloudShell

3. Create the Python file

In the CLI, type nano setup_nba_data_lake.py
Copy and paste the contents from the setup_nba_data_lake.py file in the src folder in GitHub into the CloudShell terminal
Press Ctrl+X to exit, press Y to save the file, then press Enter to confirm the file name
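
If you are curious what a setup script of this kind does before you paste it in, here is a minimal sketch using boto3, the AWS SDK for Python. It is not the contents of the repo's setup_nba_data_lake.py. The bucket name, database name, raw-data/ prefix, and helper names are assumptions chosen for illustration; only the nba_players table name and its four columns come from the query used later in this article.

```python
"""Sketch only: not the repo's script. All resource names are assumptions."""

# Hypothetical names; S3 bucket names must be globally unique.
BUCKET_NAME = "my-nba-data-lake-example"
GLUE_DATABASE = "nba_data_lake_example"
REGION = "us-east-1"


def bucket_kwargs(bucket_name, region):
    """Build the arguments for s3.create_bucket.

    us-east-1 is the default region and takes no LocationConstraint;
    every other region needs one.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def players_table_input(bucket_name):
    """Describe a Glue table over JSON files under raw-data/ (assumed prefix)."""
    columns = ["FirstName", "LastName", "Position", "Team"]
    return {
        "Name": "nba_players",
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Columns": [{"Name": name, "Type": "string"} for name in columns],
            "Location": f"s3://{bucket_name}/raw-data/",
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            # JSON SerDe so Athena can read one JSON object per line.
            "SerdeInfo": {
                "SerializationLibrary": "org.openx.data.jsonserde.JsonSerDe"
            },
        },
    }


def create_resources():
    """Create the bucket, Glue database, and Glue table. Needs AWS credentials."""
    import boto3  # imported here so the helpers above work without it

    s3 = boto3.client("s3", region_name=REGION)
    glue = boto3.client("glue", region_name=REGION)

    s3.create_bucket(**bucket_kwargs(BUCKET_NAME, REGION))
    glue.create_database(DatabaseInput={"Name": GLUE_DATABASE})
    glue.create_table(
        DatabaseName=GLUE_DATABASE,
        TableInput=players_table_input(BUCKET_NAME),
    )
```

Calling create_resources() from CloudShell would create all three resources using the credentials of your console session. The two helper functions are kept separate so the request payloads can be inspected without touching AWS.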

4. Create the environment variables file

In the CLI, type nano .env
Paste the following info into the file and update with your API key

SPORTS_DATA_API_KEY=your_sportsdata_api_key
NBA_ENDPOINT=https://api.sportsdata.io/v3/nba/scores/json/Players

Press Ctrl+X to exit, press Y to save the file, then press Enter to confirm the file name
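
The two values in .env are what a script needs to call the API. As a rough sketch of how they might be used, the example below reads both variables, requests the player list, and serializes it with one JSON object per line, which is the layout Athena's JSON SerDes expect. It assumes SportsData.io accepts the key in an Ocp-Apim-Subscription-Key request header; the function names are hypothetical and this is not necessarily how the repo's script does it.

```python
import json
import os
import urllib.request


def read_settings(env=None):
    """Return (api_key, endpoint) from a mapping, defaulting to os.environ."""
    env = os.environ if env is None else env
    api_key = env.get("SPORTS_DATA_API_KEY")
    endpoint = env.get("NBA_ENDPOINT")
    if not api_key or not endpoint:
        raise RuntimeError("Set SPORTS_DATA_API_KEY and NBA_ENDPOINT in .env")
    return api_key, endpoint


def fetch_players(api_key, endpoint):
    """Call the endpoint and return the decoded JSON response."""
    # Assumption: the API key is passed in this request header.
    request = urllib.request.Request(
        endpoint, headers={"Ocp-Apim-Subscription-Key": api_key}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def to_json_lines(records):
    """Serialize records as one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def main():
    """Load .env, fetch the players, and report how many records came back."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv()  # loads variables from a .env file into os.environ
    api_key, endpoint = read_settings()
    players = fetch_players(api_key, endpoint)
    print(f"Fetched {len(players)} player records")
```

Failing early in read_settings gives a clearer error than a rejected HTTP request would if the .env file is missing or incomplete.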

5. Install dotenv in CloudShell and run the script

Type pip install python-dotenv and hit enter
Once dotenv is installed, run the Python script with python3 setup_nba_data_lake.py. You should see the following output if successful:

6. Verify resources are in AWS

In IT there is a saying: “trust but verify.” That’s what we are going to do now.

Search for S3 in the AWS console and check to see if your bucket is there
Next, search for Athena in the AWS console and run a query against the newly populated data:

SELECT FirstName, LastName, Position, Team
FROM nba_players
WHERE Position = 'PG';
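
The same query can also be run from Python instead of the console. The sketch below uses the boto3 Athena client to start the query, poll until it finishes, and read the first page of results. The database name and the S3 output location are parameters you would need to supply for your own account, and run_query and rows_to_dicts are hypothetical helper names, not part of the repo.

```python
import time

QUERY = (
    "SELECT FirstName, LastName, Position, Team "
    "FROM nba_players WHERE Position = 'PG'"
)


def rows_to_dicts(result_rows):
    """Turn Athena ResultSet rows into dicts, using the first row as the header."""

    def values(row):
        return [cell.get("VarCharValue") for cell in row["Data"]]

    if not result_rows:
        return []
    header = values(result_rows[0])
    return [dict(zip(header, values(row))) for row in result_rows[1:]]


def run_query(database, output_location, region="us-east-1"):
    """Run QUERY in Athena and return the first page of rows as dicts."""
    import boto3  # imported here so rows_to_dicts works without it

    athena = boto3.client("athena", region_name=region)
    started = athena.start_query_execution(
        QueryString=QUERY,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location, e.g. s3://bucket/prefix/
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    # Only the first page of results is read here; larger results need pagination.
    results = athena.get_query_results(QueryExecutionId=query_id)
    return rows_to_dicts(results["ResultSet"]["Rows"])
```

Each returned dict maps a column name from the header row to that player's value, which makes the output easy to print or filter further in Python.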

The query result in the Athena console should look like this:

Recap

You have now successfully run a Python script that automagically creates an S3 bucket and a Glue database and loads NBA data into the bucket. You then used Amazon Athena to query that data. You are well on your way to becoming a cloudy!
