Identifying Outliers in a data set

What are Outliers

An outlier is an extremely high or extremely low value in our data. A value can be identified as an outlier if it is greater than Q3 + 1.5(IQR) or lower than Q1 – 1.5(IQR).

IQR = Q3 – Q1

Note:

  • IQR means Interquartile Range

  • Q1 means first quartile

  • Q3 means third quartile
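As a rough sketch of the rule above, the two cut-offs can be written as a small helper. The function names `iqr_fences` and `is_outlier` and the quartile values 10 and 20 are made up for illustration; they do not come from any library or from the data used later.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cut-offs of the IQR rule (hypothetical helper)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3):
    """True if x lies below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    lower, upper = iqr_fences(q1, q3)
    return x < lower or x > upper


# Illustrative quartiles only: Q1 = 10, Q3 = 20, so IQR = 10.
lower, upper = iqr_fences(10, 20)
print(lower, upper)             # -5.0 35.0
print(is_outlier(40, 10, 20))   # True
print(is_outlier(12, 10, 20))   # False
```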

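```python
import numpy as np

# The list must be sorted for the half-split below to give the quartiles.
data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

# Q1 is the median of the lower half, Q3 the median of the upper half.
Q1 = np.median(data[:10])
Q3 = np.median(data[10:])

IQR = Q3 - Q1

print(IQR)  # 34.0
```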
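The snippet above stops at the IQR. A minimal sketch of the next step, assuming the same median-of-halves quartiles, turns them into the two cut-offs and filters the list. The value 200 checked at the end is a made-up reading used only to show a value being flagged; it is not part of the original data.

```python
import numpy as np

data = [32, 36, 46, 47, 56, 69, 75, 79, 79, 88, 89, 91, 92, 93, 96, 97,
        101, 105, 112, 116]

values = sorted(data)
half = len(values) // 2

# Quartiles as medians of the lower and upper halves.
q1 = np.median(values[:half])
q3 = np.median(values[-half:])
iqr = q3 - q1

# Cut-offs from the rule: Q1 - 1.5(IQR) and Q3 + 1.5(IQR).
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = [x for x in values if x < lower or x > upper]

print(lower, upper)  # 11.5 147.5
print(outliers)      # []

# Hypothetical reading, for illustration only.
print(200 > upper)   # True
```

With these cut-offs, none of the 20 values counts as an outlier.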

Another example

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

# Results come back in the order requested: 75th percentile first, then 25th.
q75, q25 = np.percentile(df['points'], [75, 25])
iqr = q75 - q25

print(iqr)  # 5.75
```
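To go from the IQR to the outliers themselves, one possible sketch builds a boolean mask over the `points` column. Only that column is reproduced here, and the names `lower`, `upper`, `mask` and `outliers` are illustrative choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19]})

q75, q25 = np.percentile(df['points'], [75, 25])
iqr = q75 - q25

# Cut-offs from the rule: Q1 - 1.5(IQR) and Q3 + 1.5(IQR).
lower = q25 - 1.5 * iqr
upper = q75 + 1.5 * iqr

# Rows whose points fall outside the cut-offs.
mask = (df['points'] < lower) | (df['points'] > upper)
outliers = df[mask]

print(lower, upper)   # 5.625 28.625
print(len(outliers))  # 0
```

No row is flagged here, since every value lies between 5.625 and 28.625. Note that `np.percentile` interpolates linearly between data points by default, so its quartiles can differ slightly from the median-of-halves approach used in the first example.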
