Let’s say we have a column in a DataFrame that contains numerical values, e.g., age, and we want to discretize, or bin, those ages into age groups. For example, those aged 0 to 5 years should be labeled as toddlers, those aged 5 to 18 years as young, those older than 18 but younger than 60 as adults, and the rest as seniors.
The cut() function of the pandas Python library discretizes, or bins, the numerical values in a column. Let’s look at an example.
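To show what cut() returns on its own, here is a small sketch that applies it to a short list of made-up ages with explicit bin edges. The ages and the edges 0, 5, 18, 60 and infinity are illustrative assumptions. Without labels, the result holds pandas Interval objects, and with the default right=True each interval includes its right edge.

```python
import pandas

# Made-up ages, chosen to land on and around the bin edges.
ages = [0, 3, 5, 12, 18, 45, 60, 72]

# Bin edges; float("inf") leaves the last bin open-ended.
edges = [0, 5, 18, 60, float("inf")]

# With no labels, cut() returns a Categorical of Interval objects.
# Each interval is closed on the right (right=True is the default),
# so 5 falls in (0, 5] and 18 in (5, 18]. The value 0 lies outside
# every interval and becomes NaN (include_lowest=True would keep it).
binned = pandas.cut(ages, bins=edges)
print(binned)
```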
Let’s read the Titanic dataset from a CSV file and look at the last few rows.

```python
import pandas

# Load the Titanic dataset and print its last five rows
df = pandas.read_csv("titanic.csv")
print(df.tail())
```
The output will be:
```
     survived  pclass     sex   age  ...  deck  embark_town  alive  alone
886         0       2    male  27.0  ...   NaN  Southampton     no   True
887         1       1  female  19.0  ...     B  Southampton    yes   True
888         0       3  female   NaN  ...   NaN  Southampton     no  False
889         1       1    male  26.0  ...     C    Cherbourg    yes   True
890         0       3    male  32.0  ...   NaN   Queenstown     no   True
```
The dataset has a column named age that contains numerical values, some of which are missing (NaN, as in row 888). Let’s bin these ages into the age groups described earlier using cut().
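Here is a minimal sketch of that step. Because titanic.csv may not be at hand, it builds a small DataFrame from made-up ages, including one missing value. With the real file, the same pandas.cut() call applies to df["age"] unchanged. The bin edges, the label strings and the column name age_group are assumptions based on the age groups described at the start.

```python
import pandas

# Made-up ages standing in for df["age"]; one value is missing.
df = pandas.DataFrame({"age": [2.0, 15.0, None, 27.0, 65.0]})

# Assumed bin edges: (0, 5], (5, 18], (18, 60], (60, inf).
bins = [0, 5, 18, 60, float("inf")]
labels = ["toddler", "young", "adult", "senior"]

# cut() places each age in the interval that contains it and reports
# the matching label instead of the interval; NaN ages stay NaN.
# "age_group" is a hypothetical name for the new column.
df["age_group"] = pandas.cut(df["age"], bins=bins, labels=labels)
print(df)
```

Because right=True by default, an age of exactly 5 is labeled toddler and exactly 18 young, which matches the description. An age of exactly 60, however, falls into (18, 60] and is labeled adult, whereas "younger than 60" would make it a senior. If that boundary matters, the edges need adjusting.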