We can use the quantile information of the data to set upper and lower limits. If a value is greater than the upper limit or less than the lower limit, we can either remove it or replace it with the corresponding limit. Replacing it is often called capping.
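Here is a minimal sketch of the two options. It assumes pandas is installed and uses a small hypothetical Series instead of real data. It computes the 0.05 and 0.95 quantiles, then either drops the values outside those limits or caps them with Series.clip().

```python
import pandas as pd

# Hypothetical toy data: mostly small values plus one extreme value.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100], dtype=float)

# Quantile-based limits (pandas uses linear interpolation by default).
lower_limit = values.quantile(q=0.05)
upper_limit = values.quantile(q=0.95)

# Option 1: remove the values that fall outside [lower_limit, upper_limit].
removed = values[(values >= lower_limit) & (values <= upper_limit)]

# Option 2: cap the values at the limits instead of removing them.
capped = values.clip(lower=lower_limit, upper=upper_limit)

print("Limits:", lower_limit, upper_limit)
print("Removed:", removed.tolist())
print("Capped:", capped.tolist())
```

With these 11 values, linear interpolation puts the limits at 1.5 and 55.0. The value 1 is therefore flagged at the low end and 100 at the high end. Because the limits come from quantiles, the most extreme values at each end are flagged even when they are not genuine errors, so it is worth checking what gets cut.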
Let’s look at an example using the titanic dataset from seaborn. Its age column contains the ages of the passengers. We can set the 0.05 quantile as the lower limit and the 0.95 quantile as the upper limit. Then we can replace values below the lower limit with the lower limit, and values above the upper limit with the upper limit. Alternatively, we can remove the rows whose age is above the 0.95 quantile or below the 0.05 quantile. In this example, we will cap the outliers using the quantile information of the data.
We can use the following Python code for that purpose:
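import seaborn
import numpy

# Load the titanic dataset that ships with seaborn (downloaded on first use)
df = seaborn.load_dataset("titanic")

# Use the 5th and 95th percentiles of age as the lower and upper limits
lower_limit = df["age"].quantile(q=0.05)
upper_limit = df["age"].quantile(q=0.95)
print("Lower limit of age: ", lower_limit)
print("Upper limit of age: ", upper_limit)

# Rows whose age falls outside the limits (missing ages are not included)
print("Outliers: \n", df[(df["age"] > upper_limit) | (df["age"] < lower_limit)])

# Cap ages: above upper_limit -> upper_limit, below lower_limit -> lower_limit, else unchanged
df["age"] = numpy.where(df["age"] > upper_limit, upper_limit,
                        numpy.where(df["age"] < lower_limit, lower_limit, df["age"]))
print(df.head())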
Here, we first calculate the lower_limit and upper_limit of age. After that, we use the numpy.where() function to ...
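The nested numpy.where() call caps values from both sides in one assignment. The sketch below uses hypothetical ages and limits rather than the real titanic values. It shows that pandas' Series.clip() gives the same result. It also shows that missing ages (NaN) pass through unchanged, because any comparison with NaN evaluates to False.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including one missing value as in the titanic "age" column
age = pd.Series([0.5, 4.0, 22.0, 35.0, np.nan, 58.0, 80.0])
lower_limit, upper_limit = 1.0, 60.0  # hypothetical limits for illustration

# Nested numpy.where: cap above, else cap below, else keep the original value
capped_where = np.where(age > upper_limit, upper_limit,
                        np.where(age < lower_limit, lower_limit, age))

# Series.clip performs the same two-sided capping in a single call
capped_clip = age.clip(lower=lower_limit, upper=upper_limit)

print(capped_where)
print(capped_clip.tolist())
```

numpy.where() returns a NumPy array, while clip() returns a Series. Either can be assigned back to df["age"]. clip() is arguably easier to read, but the numpy.where() form also works when the replacement values are something other than the limits.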