Computers use data all the time. Sometimes it's in databases, sometimes on the web, sometimes from sensor input, and sometimes it's office data like Excel or CSV files.
You probably know you can easily parse a CSV file with pandas. But did you know you can just as easily create plots directly from the CSV data?
Data set
A CSV data set is simply data stored as plain text. It could come from an office suite like GSheets or OpenOffice, both of which can save a spreadsheet as CSV (comma-separated values). As the name says, the values on each row are separated by commas.
Any data set will work, but the example below uses an IMDB movie data set saved as IMDB-Movie-Data.csv.
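To see what "comma-separated" means in practice, here is a minimal sketch that parses a tiny CSV string with pandas. The two rows are made-up placeholder values, not records from the movie data set, and io.StringIO simply stands in for a file on disk.

```python
import io

import pandas as pd

# A tiny, hypothetical CSV: the first line is the header, every
# following line is one record, and commas separate the values.
csv_text = """Title,Year,Rating
Movie A,2014,8.1
Movie B,2016,7.0
"""

# read_csv accepts any file-like object, so StringIO behaves like a file
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(df.dtypes)
```

pandas infers a type per column: Year becomes an integer column and Rating a float column, so you can compute with them right away.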
This data set is about movies.
For every movie it stores these columns:
- Rank
- Title
- Genre
- Description
- Director
- Actors
- Year
- Runtime (Minutes)
- Rating
- Votes
- Revenue (Millions)
- Metascore
That's a lot of information per movie, yet with 1000 records it's still a small data set.
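With twelve columns per movie you often need only a few of them. pd.read_csv has a usecols parameter that loads just the listed columns. The sketch below uses a hypothetical two-row excerpt that borrows some of the column names above (the values are placeholders); with the real data you would pass "IMDB-Movie-Data.csv" instead of the StringIO object.

```python
import io

import pandas as pd

# Hypothetical excerpt: the column names match the list above,
# but the values are placeholders, not real movie data.
csv_text = """Rank,Title,Year,Runtime (Minutes),Rating
1,Movie A,2014,121,8.1
2,Movie B,2016,95,7.0
"""

# Load only the columns we need. Names containing spaces and
# parentheses are given exactly as they appear in the header.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Title", "Runtime (Minutes)", "Rating"],
)

print(df.shape)                        # (2, 3)
print(df["Runtime (Minutes)"].mean())  # 108.0
```

Note that usecols keeps the columns in the order they appear in the file, not the order in which you list them.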
Pandas
We first import pandas, matplotlib for plotting and numpy for number crunching. Then we load the CSV data, compute the average rating and use matplotlib to create the figure.
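#!/usr/bin/python3
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load the CSV file into a DataFrame: one row per movie
movie = pd.read_csv("IMDB-Movie-Data.csv")

# Average rating over all movies (print it, otherwise a script discards the value)
print(movie["Rating"].mean())

# Quick look: pandas plots the column directly (this opens its own figure)
movie["Rating"].plot(kind="hist", figsize=(20, 8))

# Customised version with matplotlib: a new 20 x 8 inch figure with 20 bins
plt.figure(figsize=(20, 8), dpi=80)
plt.hist(movie["Rating"], 20)

# 21 evenly spaced ticks from min to max = the 21 edges of the 20 bins
plt.xticks(np.linspace(movie["Rating"].min(), movie["Rating"].max(), 21))
plt.grid(linestyle="--", alpha=0.5)
plt.show()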
That gives you a histogram of the movie ratings. Mind you, there are 1000 records in the CSV file, and pandas handles them instantly.
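pandas can also do the whole plot on its own: Series.plot(kind="hist", bins=20) draws a 20-bin histogram and returns the matplotlib Axes, which you can then customise. The minimal sketch below also checks why np.linspace(min, max, 21) works for the x ticks: with 20 equal-width bins between the minimum and maximum, the 21 bin edges are exactly those evenly spaced values. It assumes randomly generated placeholder ratings instead of the CSV file, and the non-interactive Agg backend so it runs without a display; the output file name rating_hist.png is arbitrary.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render without opening a window

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder ratings standing in for movie["Rating"] (not real data)
rng = np.random.default_rng(0)
ratings = pd.Series(np.round(rng.uniform(1, 10, size=1000), 1), name="Rating")

# pandas draws the histogram straight from the column and returns the Axes
ax = ratings.plot(kind="hist", bins=20, figsize=(20, 8))

# With 20 equal-width bins from min to max, numpy's 21 bin edges
# are the same evenly spaced values that np.linspace produces
counts, edges = np.histogram(ratings, bins=20)
ticks = np.linspace(ratings.min(), ratings.max(), 21)
ax.set_xticks(ticks)
ax.grid(linestyle="--", alpha=0.5)

print(len(ax.patches))            # one bar per bin
print(np.allclose(edges, ticks))  # edges and ticks coincide
plt.savefig("rating_hist.png", dpi=80)
```

On a machine with a display you can drop the matplotlib.use line and call plt.show() instead of saving to a file.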