I recently found a dataset on Kaggle containing the nutrition facts for every item on the McDonald’s menu (Dataset).
I have known for a while that our sugar consumption is well above what health organizations recommend, so I wanted to analyze how much sugar there is in the items on the McDonald’s menu and which of them have no sugar at all.
For this I used a Jupyter Notebook with two libraries: plotly and pandas. Here I’m going to explain the steps I followed; the complete notebook is available in my GitHub repository.
The data is in a CSV file. First, let’s load it to see how it is structured:
import pandas as pd

menu = pd.read_csv('./menu.csv')
menu.head(10)  # first 10 rows
This shows the structure of the dataset, that is, its columns and its first rows.
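Beyond `head`, it can help to confirm that the columns used later are present and complete before relying on them. The following is a minimal sketch on a tiny in-memory stand-in for `menu.csv`: the three rows and their values are made up for illustration, and only the column names `Item` and `Sugars` come from the dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for menu.csv: made-up rows, real column names.
sample_csv = io.StringIO(
    "Item,Sugars\n"
    "Example Burger,9\n"
    "Example Shake,80\n"
    "Example Water,0\n"
)
sample = pd.read_csv(sample_csv)

# Shape is (rows, columns); dtypes shows how pandas parsed each column.
print(sample.shape)
print(sample.dtypes)

# Count missing values per column; zeros mean the column is complete.
missing = sample[["Item", "Sugars"]].isna().sum()
print(missing)
```

On the real data, the same two checks would be run on `menu` instead of `sample`.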
The column I’m interested in is Sugars, so I’m going to create a new pandas data frame with just the item name and the amount of sugar, and also sort it in ascending order:
# Keep only the item name and its sugar content
df_sugars = menu[['Item', 'Sugars']].copy()
print("Let's sort them by the amount of sugar they have, in ascending order: ")
df_sugars = df_sugars.sort_values('Sugars', ascending=True)
print(df_sugars.head(10))
Now that I have this, I want to check which menu items have no sugar at all:
print("Number of items in the menu: "+str(len(menu.index)))
print("Number of items without sugar in the menu: "+str(len(df_sugars.loc[df_sugars['Sugars'] == 0])))
print(df_sugars.loc[df_sugars['Sugars'] == 0])
And I obtain the following result:
Number of items in the menu: 260
Number of items without sugar in the menu: 25
Item Sugars
145 Coffee (Small) 0
99 Kids French Fries 0
96 Small French Fries 0
81 Chicken McNuggets (20 piece) 0
114 Diet Coke (Small) 0
115 Diet Coke (Medium) 0
116 Diet Coke (Large) 0
117 Diet Coke (Child) 0
122 Diet Dr Pepper (Small) 0
123 Diet Dr Pepper (Medium) 0
124 Diet Dr Pepper (Large) 0
98 Large French Fries 0
80 Chicken McNuggets (10 piece) 0
79 Chicken McNuggets (6 piece) 0
136 Dasani Water Bottle 0
137 Iced Tea (Small) 0
138 Iced Tea (Medium) 0
139 Iced Tea (Large) 0
140 Iced Tea (Child) 0
78 Chicken McNuggets (4 piece) 0
146 Coffee (Medium) 0
38 Hash Brown 0
147 Coffee (Large) 0
125 Diet Dr Pepper (Child) 0
97 Medium French Fries 0
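The share of sugar-free items follows from the two counts printed above. A quick sketch of the arithmetic in plain Python (the variable names are introduced here for illustration):

```python
# Counts taken from the output above.
total_items = 260
sugar_free_items = 25

# Share of sugar-free items as a percentage of the whole menu.
share = sugar_free_items / total_items * 100
print(f"{share:.1f}% of the items have no sugar")
```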
So only 25 items out of 260, which means that only about 9.6% of the items at McDonald’s have no sugar at all. Now let’s plot this to see it graphically. For this I’m going to use the Plotly library:
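# Imports assumed for the plotting cells (the post does not show them);
# `py` could equally be another Plotly module that provides iplot.
import plotly.graph_objs as go
import plotly.offline as py

print("Let's start with the bar chart")
data = [go.Bar(
    y=df_sugars['Sugars'].values,
    x=df_sugars['Item'].values,
)]
py.iplot(data, filename='basic-bar')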
I’m also going to draw a scatter plot:
# Now let's plot a scatter plot
# This plot is based on the one made by Anisotropic:
# https://www.kaggle.com/arthurtok/super-sized-we-mcdonald-s-nutritional-metrics
trace = go.Scatter(
    y=df_sugars['Sugars'].values,
    x=df_sugars['Item'].values,
    mode='markers',
    marker=dict(
        size=df_sugars['Sugars'].values,
        # color = np.random.randn(500), # set color equal to a variable
        color=df_sugars['Sugars'].values,
        colorscale='Portland',
        showscale=True
    ),
    # Hover text comes from the sorted frame so it lines up with x and y
    text=df_sugars['Item'].values
)
data = [trace]
layout = go.Layout(
    autosize=True,
    title='Scatter plot of Sugars per Item on the Menu',
    hovermode='closest',
    xaxis=dict(
        showgrid=False,
        zeroline=False,
        showline=False
    ),
    yaxis=dict(
        title='Sugars(g)',
        ticklen=5,
        gridwidth=2,
        showgrid=False,
        zeroline=False,
        showline=False
    ),
    showlegend=False
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='scatterChol')
The WHO (World Health Organization) says that the maximum amount of sugar per day should be 50 g. Let’s see which items on the menu go over this threshold:
# First let's add a new column to the dataframe, all equal to 50
df_sugars['Amount of Sugar recommended (g)'] = 50
# Let's plot them
trace1 = go.Bar(
    y=df_sugars['Sugars'].values,
    x=df_sugars['Item'].values,
    name='Sugars(g)'
)
trace2 = go.Bar(
    y=df_sugars['Amount of Sugar recommended (g)'].values,
    x=df_sugars['Item'].values,
    name='Recommended value of sugar WHO (g)'
)
data = [trace1, trace2]
# Group the two bar series side by side
layout = go.Layout(
    barmode='group',
    autosize=True,
    title='Relation between the WHO recommendation and Sugars per Item on the Menu',
    hovermode='closest',
    xaxis=dict(
        showgrid=False,
        zeroline=False,
        showline=False
    ),
    yaxis=dict(
        title='Sugars(g)',
        ticklen=5,
        gridwidth=2,
        showgrid=False,
        zeroline=False,
        showline=False
    ),
    showlegend=False
)
fig = go.Figure(data=data, layout=layout)
graph = py.iplot(fig, filename='grouped-bar')
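The chart shows the threshold visually; the items above it can also be listed with a boolean filter. This is a minimal sketch on a made-up stand-in for `df_sugars` (the item names and values are hypothetical, not taken from the dataset), and `DAILY_LIMIT_G` is a name introduced here for the 50 g figure quoted above.

```python
import pandas as pd

DAILY_LIMIT_G = 50  # daily sugar limit quoted in the text, in grams

# Hypothetical stand-in for df_sugars; values are illustrative only.
df_example = pd.DataFrame({
    "Item": ["Example Water", "Example Burger", "Example Shake"],
    "Sugars": [0, 9, 80],
})

# Keep only the rows whose sugar content exceeds the limit.
over_limit = df_example.loc[df_example["Sugars"] > DAILY_LIMIT_G]
print(f"{len(over_limit)} of {len(df_example)} items exceed {DAILY_LIMIT_G} g")
print(over_limit)
```

Applied to the real `df_sugars`, the same filter would give the exact list of items that a single serving pushes past the daily limit.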
As you can see, a lot of items on the menu are bad for our health, not to say all of them. To look at the items in more detail, check the notebook, where the plots are interactive.
I hope you like this short analysis. Check out the repository on GitHub! 🙂