What is frequent category imputation in machine learning?
If a column contains only numerical data, we can use mean or median imputation to fill in its missing values. But if a column contains categorical values, we usually use most frequent category imputation to fill in its missing values instead.
Note that the most frequent value in a column is the mode of that column. Hence, to fill in missing categorical values, we can calculate the mode of the column and then use it to replace the missing entries.
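To make the idea concrete, here is a minimal pure-Python sketch of mode-based imputation. It assumes missing entries are represented as None; the function name impute_most_frequent and the sample town names are made up for illustration and are not taken from any library.

```python
from collections import Counter
from typing import Hashable, List, Optional


def impute_most_frequent(values: List[Optional[Hashable]]) -> List[Hashable]:
    """Replace missing entries (None) with the most frequent observed value (the mode)."""
    # Only non-missing values take part in the frequency count
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot compute a mode: every value is missing")
    # most_common(1) returns [(value, count)]; on ties, the value encountered first wins
    mode_value, _ = Counter(observed).most_common(1)[0]
    # Keep existing values, substitute the mode wherever a value is missing
    return [mode_value if v is None else v for v in values]


# Hypothetical sample data, not the real Titanic column
towns = ["Southampton", "Cherbourg", None, "Southampton", "Queenstown", None]
print(impute_most_frequent(towns))
# ['Southampton', 'Cherbourg', 'Southampton', 'Southampton', 'Queenstown', 'Southampton']
```

Counting only the observed values matters: if the missing markers were counted too, None itself could end up as the "most frequent category".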
How to perform frequent category imputation in machine learning?
Let’s read the Titanic dataset. If we print the percentage of missing values in each column, we will see that some values are missing from the embark_town column.
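import seaborn

# Load the Titanic dataset that ships with seaborn (fetched and cached on first use)
df = seaborn.load_dataset("titanic")

# Percentage of missing values in each column
print(df.isnull().mean() * 100)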
The output shows the following:
survived      0.000000
pclass        0.000000
sex           0.000000
age          19.865320
sibsp         0.000000
parch         0.000000
fare          0.000000
embarked      0.224467
class         0.000000
who           0.000000
adult_male    0.000000
deck         77.216611
embark_town   0.224467
alive         0.000000
alone         0.000000
dtype: float64
So, 0.224467% of the values in the embark_town column are missing. Let’s find out the most frequent value of the …
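A minimal pandas sketch of the imputation step might look like the following. So that it runs without downloading anything, it uses a small made-up DataFrame standing in for the embark_town column (the values are invented, not the real Titanic data). On the loaded dataset, the same two calls, Series.mode() and Series.fillna(), would be applied to df["embark_town"].

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the Titanic embark_town column (not the real data)
df = pd.DataFrame(
    {"embark_town": ["Southampton", "Cherbourg", np.nan, "Southampton", "Queenstown", np.nan]}
)

# Percentage of missing values before imputation
print(df["embark_town"].isnull().mean() * 100)

# Series.mode() ignores NaN by default and can return several values on ties,
# so take the first one explicitly
most_frequent = df["embark_town"].mode().iloc[0]

# Replace the missing entries with the most frequent category
df["embark_town"] = df["embark_town"].fillna(most_frequent)

print(most_frequent)                      # Southampton
print(df["embark_town"].isnull().sum())   # 0
```

Because mode() can return more than one value when categories tie, picking the first entry is a deliberate choice here rather than something pandas decides for you.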





