In our previous articles, we discussed equal-width discretization and equal-frequency discretization. We can also perform discretization or binning using custom bin values. This type of discretization is called custom discretization.
For example, let’s read the Titanic dataset. The dataset has a column named age. Now, let’s say those aged up to 5 years should be labeled as toddlers. Those older than 5 and up to 18 years should be labeled as young. Those older than 18 and up to 60 years should be labeled as adults. And the rest, up to 100 years, should be labeled as seniors. In other words, we want to discretize the age column based on custom bin values. We can use the following Python code for that purpose:
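import pandas

# Load the Titanic dataset and preview the first rows
df = pandas.read_csv("titanic.csv")
print(df.head())

# Assign each age to one of the labeled custom bins
df["age_group"] = pandas.cut(
    x=df["age"],
    bins=[0, 5, 18, 60, 100],
    labels=["toddler", "young", "adult", "senior"],
)
print(df.head())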
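Here, we use the pandas.cut() function for discretization. The bins parameter gives the custom bin edges, and the labels parameter names the resulting bins, so four bins need five edges and four labels. By default, each bin includes its right edge but not its left edge, so the bins are (0, 5], (5, 18], (18, 60], and (60, 100]. Ages that fall outside these bins, as well as missing ages, get NaN in the age_group column.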
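To see how the bin edges are treated, here is a minimal sketch on a handful of made-up ages (the values are hypothetical and chosen to sit on or near the edges; they are not taken from the Titanic data). It compares the default behavior with the include_lowest and right parameters of pandas.cut():

```python
import pandas

# Hypothetical ages chosen to sit on or near the bin edges
ages = pandas.Series([0, 3, 5, 18, 18.5, 60, 61, 120, None])

bins = [0, 5, 18, 60, 100]
labels = ["toddler", "young", "adult", "senior"]

# Default (right=True): bins are (0, 5], (5, 18], (18, 60], (60, 100]
default_groups = pandas.cut(x=ages, bins=bins, labels=labels)

# include_lowest=True also puts the lowest edge (0) into the first bin
with_lowest = pandas.cut(x=ages, bins=bins, labels=labels, include_lowest=True)

# right=False: bins are [0, 5), [5, 18), [18, 60), [60, 100)
left_closed = pandas.cut(x=ages, bins=bins, labels=labels, right=False)

comparison = pandas.DataFrame({
    "age": ages,
    "default": default_groups,
    "include_lowest": with_lowest,
    "right=False": left_closed,
})
print(comparison)
```

With the default settings, an age of exactly 0 gets no label, and an age of exactly 60 is labeled adult rather than senior. Which variant is appropriate depends on how the age groups are meant to be defined.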
The output of the above program will be:
   survived  pclass     sex   age  ...  deck  embark_town  alive  alone
0         0       3    male  22.0  ...   NaN  Southampton     no  False
1         1       1  female  38.0  ...     C    Cherbourg    yes  False
2         1       3  female  26.0  ...   NaN  Southampton    yes   True
3         1       1  female  35.0  ...     C  Southampton    yes  False
4         0       3    male  35.0  ...   NaN  Southampton     no   True

[5 rows x 15 columns]
   survived  pclass     sex   age  ...  embark_town  alive  alone  age_group
0         0       3    male  22.0  ...  Southampton     no  False      adult
1         1       1  female  38.0  ...    Cherbourg    yes  False      adult
2         1       3  female  26.0  ...  Southampton    yes   True      adult
3         1       1  female  35.0  ...  Southampton    yes  False      adult
4         0       3    male  35.0  ...  Southampton     no   True      adult

[5 rows x 16 columns]
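The first five rows all happen to fall in the adult bin, so they say little about the other groups. One way to check the result is to count the rows in each bin. The sketch below uses a small made-up DataFrame as a stand-in for the Titanic data (the ages are hypothetical), and the "unknown" label for rows without an age group is our own choice, not something the dataset or pandas prescribes:

```python
import pandas

# Hypothetical stand-in for the Titanic data; only the age column matters here
df = pandas.DataFrame({"age": [22.0, 38.0, 4.0, None, 65.0, 14.0, 35.0]})

df["age_group"] = pandas.cut(
    x=df["age"],
    bins=[0, 5, 18, 60, 100],
    labels=["toddler", "young", "adult", "senior"],
)

# Count rows per bin; dropna=False also counts rows that received no label
counts = df["age_group"].value_counts(dropna=False)
print(counts)

# Optionally give unlabeled rows an explicit category.
# "unknown" is a hypothetical label chosen for this example.
df["age_group_filled"] = (
    df["age_group"].cat.add_categories("unknown").fillna("unknown")
)
print(df)
```

The same two steps can be applied to the age_group column created from titanic.csv, where some passengers have no recorded age.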





