Data Science With Python: Where And How To Start (4 Part Series)

1 Reading and Manipulating Your Dataset With Pandas
2 Reading and Manipulating Your Dataset With Pandas (2)
3 How to prove that your cat is fat (with statistics and python)
4 Beginners’ journey to machine learning

Manipulation

Let’s say you need to see only one column of your dataframe. To see the ‘fixed acidity’ column of our dataset, you need to write:

df['fixed acidity']


You can also filter rows by a condition on this column. For example, to see only the rows that have a fixed acidity higher than 9:

df[df['fixed acidity']>9]


Sometimes you need rows that satisfy conditions on several columns at once. Combine the conditions with & and wrap each one in parentheses:

df[(df['fixed acidity']>9) & (df['citric acid']>0.5)]
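
The same pattern extends to "or" and "not". Below is a minimal sketch on a made-up miniature of the wine table (the frame `wine` and its values are assumptions for illustration, not rows from the real dataset), showing how boolean masks combine with `&`, `|` and `~`:

```python
import pandas as pd

# Hypothetical miniature of the wine table; the values are invented.
wine = pd.DataFrame({
    "fixed acidity": [7.4, 9.2, 10.1, 8.8],
    "citric acid": [0.00, 0.55, 0.30, 0.62],
})

# Each comparison produces a boolean Series with one entry per row.
high_acid = wine["fixed acidity"] > 9
high_citric = wine["citric acid"] > 0.5

both = wine[high_acid & high_citric]     # AND: both conditions hold
either = wine[high_acid | high_citric]   # OR: at least one condition holds
not_high_acid = wine[~high_acid]         # NOT: invert the mask

print(both)
print(either)
print(not_high_acid)
```

Naming the masks first keeps long filters readable and avoids forgetting the parentheses, which are required because `&` and `|` bind more tightly than `>` in Python.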


To select specific columns, pass a list of column names to loc:

df.loc[:,['volatile acidity', 'chlorides']]


You can combine a row condition with a column selection too. For example, to see the ‘fixed acidity’, ‘volatile acidity’ and ‘chlorides’ values of the rows that have a ‘fixed acidity’ of exactly 9.2:

df.loc[df['fixed acidity'] == 9.2, ['fixed acidity','volatile acidity', 'chlorides']]


You can view the rows for specific index labels (as discussed in the previous chapter) too. Slicing with loc includes the end label, so 0:3 returns the four rows labelled 0 to 3:

df.loc[0:3, ['volatile acidity', 'chlorides']]


Now, if you want to locate a specific value, for example, the alcohol content of the wine in row 0:

df['alcohol'].loc[0]


and you will get a value of 9.4.
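
There is more than one way to read a single cell. The sketch below uses a small made-up frame (`wine` and its values are assumptions, not the real dataset) to show that the chained form, a single `loc` call and the scalar accessor `at` all return the same value:

```python
import pandas as pd

# Hypothetical two-column sample; only the first alcohol value mirrors the text.
wine = pd.DataFrame({
    "alcohol": [9.4, 9.8, 9.8],
    "chlorides": [0.076, 0.098, 0.092],
})

via_chain = wine["alcohol"].loc[0]   # select the column, then the row label
via_loc = wine.loc[0, "alcohol"]     # row label and column name in one call
via_at = wine.at[0, "alcohol"]       # scalar-only accessor

print(via_chain, via_loc, via_at)
```

For reading, all three are equivalent. The single-call forms matter more when you assign, because they write to the dataframe itself rather than to an intermediate object.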

You can locate a row by its integer position too:

df.iloc[100]


Now if you want to pinpoint a value within this row, for example, the column at position 1 (volatile acidity in this case, because positions start at 0) of the row at position 100, pass both positions to iloc:

df.iloc[100, 1]


and you will get 0.61 as expected.

You can select consecutive rows and columns with iloc as well. Unlike loc, iloc excludes the end of the slice, so this returns the first three columns of the rows at positions 3 to 7:

df.iloc[3:8, 0:3]
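
The difference between label-based `loc` and position-based `iloc` is easy to miss when the index is the default 0, 1, 2, and so on. Here is a small sketch on a made-up frame (`demo` and its columns are assumptions for illustration) that shows the inclusive versus exclusive slice ends, and what changes once the rows are reordered:

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..9.
demo = pd.DataFrame({"a": range(10), "b": range(10, 20), "c": range(20, 30)})

by_label = demo.loc[3:7, ["a", "b"]]   # labels 3 through 7, end label included
by_position = demo.iloc[3:8, 0:2]      # positions 3 through 7, end position excluded
print(by_label.equals(by_position))

# After sorting, labels travel with their rows but positions do not.
shuffled = demo.sort_values("a", ascending=False)
first_by_position = shuffled.iloc[0, 0]   # first row as displayed
first_by_label = shuffled.loc[0, "a"]     # row whose label is 0
print(first_by_position, first_by_label)
```

On a freshly loaded dataframe the two accessors pick the same rows; after filtering or sorting they can diverge, so it helps to decide up front whether you mean "the row labelled 100" or "the 101st row".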


and non-consecutive rows and columns too:

df.iloc[[71, 122, 400], [0, 2]]


What if you want to add a new column to your dataframe? Let’s add a ‘new column’ containing the word ‘hi’ for all rows:

df['new column'] = 'hi'
df.head()


Let’s change the value of ‘new column’ in the first row (position 0) from ‘hi’ to ‘bye’ using iloc. Since iloc needs a column position, get_loc() looks it up from the column name:

df.iloc[0, df.columns.get_loc('new column')] = 'bye'
df.head()


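Now let’s find the values that start with ‘by’ (the one we have just added) and replace them with ‘hello’. Passing the row condition and the column name to a single loc call makes the assignment act on df itself rather than on a temporary copy:

df.loc[df['new column'].str.startswith('by'), 'new column'] = 'hello'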
df.head()


You can also replace null values of your data using pandas. We do not have any null values here, so let’s introduce a null value first. Let’s replace the string ‘hello’ with null. To do so, we would need the numpy library.

import numpy as np
df.loc[df['new column'].str.startswith('hel'), 'new column'] = np.nan
df.head()
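
Once a text column contains nulls, string tests need a little care: depending on the pandas version and column dtype, `str.startswith` may return a missing value for the null rows, and such a result cannot be used directly as a row filter. A minimal sketch, assuming a made-up frame called `notes`, that passes `na=False` so null rows simply count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical column that already contains one null value.
notes = pd.DataFrame({"new column": ["bye", "hi", np.nan, "hi"]})

# na=False makes the mask False (not missing) wherever the value is null.
mask = notes["new column"].str.startswith("by", na=False)

# Row condition and column name go into one loc call.
notes.loc[mask, "new column"] = "hello"
print(notes)
```

The null row is left untouched, and only the value that starts with "by" is replaced.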


To count the null values in each column, chain the isna() and sum() methods like this:

df.isna().sum()


isna() can also show where the null values are. It returns True for every null cell and False everywhere else:

pd.isna(df.head())


Let’s replace the null value with ‘hey’.

df.fillna(value='hey', inplace=True)
df.head()
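
Calling `fillna` with a single value fills every column with that value. When columns need different replacements, `fillna` also accepts a dictionary keyed by column name. A small sketch, assuming a made-up frame `sample` with nulls in a numeric and a text column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one null in each column.
sample = pd.DataFrame({
    "chlorides": [0.076, np.nan, 0.092],
    "new column": ["hi", np.nan, "hi"],
})

# Fill the numeric column with its mean and the text column with a fixed word.
filled = sample.fillna({
    "chlorides": sample["chlorides"].mean(),
    "new column": "hey",
})
print(filled)
```

Without `inplace=True`, `fillna` returns a new dataframe and leaves `sample` unchanged, which makes it easy to compare before and after.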


If you want to drop null values, use the dropna() method.
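
As a rough illustration of `dropna()`, the sketch below uses a made-up frame (`sample`, with an assumed ‘quality’ column that has no nulls) to show three common variants: dropping any row with a null, dropping rows only when a chosen column is null, and dropping columns instead of rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: nulls in two of the three columns.
sample = pd.DataFrame({
    "alcohol": [9.4, np.nan, 9.8, 10.0],
    "new column": ["hi", "hi", np.nan, "hi"],
    "quality": [5, 5, 6, 6],
})

any_null_dropped = sample.dropna()                      # drop rows with any null
subset_dropped = sample.dropna(subset=["new column"])   # only check one column
columns_dropped = sample.dropna(axis="columns")         # drop columns with any null

print(any_null_dropped)
print(subset_dropped)
print(columns_dropped)
```

Like `fillna`, `dropna` returns a new dataframe unless you pass `inplace=True`.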

Now we will create a new dataframe using a loop. Its first column, ‘new column 2’, will end up with the same values as the ‘new column’ of our dataframe df: ‘hey’ in the first row and ‘hi’ everywhere else.

rows = []
for i in range(df.shape[0]):
    rows.append(['hi', 'bye'])
df_new = pd.DataFrame(rows, columns=["new column 2", "new column 3"])
df_new.iloc[0, df_new.columns.get_loc('new column 2')] = 'hey'
df_new.head()
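
When every row holds the same values, the loop is not strictly needed. As an alternative sketch (not the approach used above), the same dataframe can be built from a dictionary of lists; `n_rows` here is a stand-in for `df.shape[0]` so that the example runs on its own:

```python
import pandas as pd

n_rows = 5  # assumption: stands in for df.shape[0]

# One list per column, each repeated n_rows times.
df_new = pd.DataFrame({
    "new column 2": ["hi"] * n_rows,
    "new column 3": ["bye"] * n_rows,
})

# Change the first row by label and column name.
df_new.loc[0, "new column 2"] = "hey"
print(df_new.head())
```

The loop version is still the natural choice when each row has to be computed differently.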


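You can merge these two dataframes on columns that hold matching values. Keep in mind that the key values here repeat, so this is a many-to-many merge: every ‘hi’ row of df is paired with every ‘hi’ row of df_new, and the result has far more rows than either input: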

df_merged = df.merge(df_new, left_on='new column', right_on='new column 2')
df_merged.head()
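
To see how repeated keys multiply rows, here is a scaled-down sketch with three rows per side (the frames `left` and `right` and their values are assumptions for illustration). It also shows the `validate` argument of `merge`, which raises an error when the keys are not as unique as you expect:

```python
import pandas as pd

# Hypothetical miniatures of df and df_new with repeated key values.
left = pd.DataFrame({"new column": ["hey", "hi", "hi"], "alcohol": [9.4, 9.8, 9.8]})
right = pd.DataFrame({"new column 2": ["hey", "hi", "hi"], "new column 3": ["bye"] * 3})

merged = left.merge(right, left_on="new column", right_on="new column 2")
# One 'hey' pair plus 2 x 2 'hi' pairs.
print(len(merged))

# Ask pandas to check that each key appears once on both sides.
rejected = False
try:
    left.merge(right, left_on="new column", right_on="new column 2",
               validate="one_to_one")
except pd.errors.MergeError:
    rejected = True
print(rejected)
```

In practice you usually merge on a column that identifies rows uniquely on at least one side, such as an ID.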


You can vary the merge to suit your data, for example by dropping rows or columns that do not match, or by merging on a column that has the same name in both dataframes.
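
A short sketch of two such variations, assuming two made-up frames that share a hypothetical `wine_id` column: `on=` merges on the common column name, `how=` decides whether unmatched rows are kept, and `indicator=True` records where each row came from:

```python
import pandas as pd

# Hypothetical frames sharing a 'wine_id' column; ids 1 and 4 have no partner.
wines = pd.DataFrame({"wine_id": [1, 2, 3], "alcohol": [9.4, 9.8, 10.0]})
labels = pd.DataFrame({"wine_id": [2, 3, 4], "label": ["ok", "good", "ok"]})

# Default inner merge: only ids present in both frames survive.
inner = wines.merge(labels, on="wine_id")

# Outer merge keeps everything; the _merge column marks each row's origin.
outer = wines.merge(labels, on="wine_id", how="outer", indicator=True)
outer = outer.sort_values("wine_id").reset_index(drop=True)

print(inner)
print(outer)
```

Rows without a partner get nulls in the columns that come from the other frame, which you can then fill or drop as shown earlier.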

You can also group the rows of a dataframe by the values of one or more columns and count the rows in each group:

df.groupby(['volatile acidity', 'chlorides']).count().head()


You can also summarize the groups with other aggregations, such as sum().
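
As an illustration, the sketch below groups a made-up sample by an assumed ‘quality’ column (the frame `wine` and its values are not taken from the real dataset) and applies a count, a sum, and several named aggregations at once:

```python
import pandas as pd

# Hypothetical sample; 'quality' is assumed to be a grouping column.
wine = pd.DataFrame({
    "quality": [5, 5, 6, 6, 6],
    "alcohol": [9.4, 9.8, 10.0, 10.5, 11.0],
    "chlorides": [0.076, 0.098, 0.092, 0.075, 0.080],
})

counts = wine.groupby("quality").size()                  # rows per group
alcohol_sum = wine.groupby("quality")["alcohol"].sum()   # sum of one column

# Several aggregations in one call, each with its own output column name.
summary = wine.groupby("quality").agg(
    mean_alcohol=("alcohol", "mean"),
    max_chlorides=("chlorides", "max"),
)

print(counts)
print(alcohol_sum)
print(summary)
```

Grouping by a column with few distinct values tends to give more readable summaries than grouping by continuous measurements, where almost every row forms its own group.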

When you are done with manipulation of your dataframes, you are ready to visualize your data.


Original article: Reading and Manipulating Your Dataset With Pandas (2)
