replace ages lower than the lower limit with the lower limit. If the age is greater than the upper limit, it is replaced by the upper limit. If neither condition is met, the age is left unchanged.
df["age"] = numpy.where(df["age"] > upper_limit, upper_limit, numpy.where(df["age"] < lower_limit, lower_limit, df["age"]))
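As a cross-check on the capping logic, the following minimal sketch applies the same nested `numpy.where` expression to a small made-up column and compares it with pandas' `Series.clip`, which performs the same capping in a single call. The toy ages and the two limits are assumptions chosen only for illustration; they are not taken from the dataset used in this article.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, not the article's dataset); NaN marks a missing age.
df = pd.DataFrame({"age": [2.0, 22.0, 38.0, 66.0, np.nan]})
lower_limit, upper_limit = 4.0, 56.0  # assumed limits, for illustration only

# Nested numpy.where, mirroring the capping statement above.
capped_where = np.where(
    df["age"] > upper_limit,
    upper_limit,
    np.where(df["age"] < lower_limit, lower_limit, df["age"]),
)

# Series.clip caps values below/above the limits in one call.
capped_clip = df["age"].clip(lower=lower_limit, upper=upper_limit)

print(capped_clip.tolist())  # [4.0, 22.0, 38.0, 56.0, nan]
```

Both approaches leave missing ages as NaN, because comparisons against NaN evaluate to False and `clip` does not alter missing values.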
Note that, instead of capping the outliers, we can also remove them after detecting them with the following Python statement.
df = df[(df["age"] >= lower_limit) & (df["age"] <= upper_limit)]
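One detail worth checking with the removal approach is how it treats missing ages. The sketch below, again on made-up data with assumed limits, suggests that the boolean filter also drops rows whose age is NaN, since any comparison with NaN is False. If those rows should be kept, one option is to add an explicit `isna()` condition. `Series.between` is shown as an equivalent, inclusive way to write the same range check.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical); the last row has a missing age.
df = pd.DataFrame({"age": [2.0, 22.0, 38.0, 66.0, np.nan]})
lower_limit, upper_limit = 4.0, 56.0  # assumed limits, for illustration only

# Same condition as the removal statement above.
in_range = (df["age"] >= lower_limit) & (df["age"] <= upper_limit)

# Series.between is inclusive on both ends by default, so it is equivalent.
in_range_between = df["age"].between(lower_limit, upper_limit)

# Plain filter: outliers AND rows with a missing age are dropped.
filtered = df[in_range]

# Variant that removes outliers but keeps rows with a missing age.
filtered_keep_nan = df[in_range | df["age"].isna()]

print(filtered.index.tolist())           # [1, 2]
print(filtered_keep_nan.index.tolist())  # [1, 2, 4]
```

Whether to keep or drop rows with missing ages depends on how missing values are handled elsewhere in the analysis.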
The outlier-capping program produces the following output:
Lower limit of age: 4.0
Upper limit of age: 56.0
Outliers:
     survived  pclass     sex    age  ...  deck  embark_town  alive  alone
7           0       3    male   2.00  ...   NaN  Southampton     no  False
11          1       1  female  58.00  ...     C  Southampton    yes   True
16          0       3    male   2.00  ...   NaN   Queenstown     no  False
33          0       2    male  66.00  ...   NaN  Southampton     no   True
43          1       2  female   3.00  ...   NaN    Cherbourg    yes  False
..        ...     ...     ...    ...  ...   ...          ...    ...    ...
824         0       3    male   2.00  ...   NaN  Southampton     no  False
827         1       2    male   1.00  ...   NaN    Cherbourg    yes  False
829         1       1  female  62.00  ...     B          NaN    yes   True
831         1       2    male   0.83  ...   NaN  Southampton    yes  False
851         0       3    male  74.00  ...   NaN  Southampton     no   True

[65 rows x 15 columns]
   survived  pclass     sex   age  ...  deck  embark_town  alive  alone
0         0       3    male  22.0  ...   NaN  Southampton     no  False
1         1       1  female  38.0  ...     C    Cherbourg    yes  False
2         1       3  female  26.0  ...   NaN  Southampton    yes   True
3         1       1  female  35.0  ...     C  Southampton    yes  False
4         0       3    male  35.0  ...   NaN  Southampton     no   True

[5 rows x 15 columns]





