    RangeIndex: 344 entries, 0 to 343
    Data columns (total 7 columns):
     #   Column             Non-Null Count  Dtype
    ---  ------             --------------  -----
     0   species            344 non-null    object
     1   island             344 non-null    object
     2   bill_length_mm     342 non-null    float64
     3   bill_depth_mm      342 non-null    float64
     4   flipper_length_mm  342 non-null    float64
     5   body_mass_g        342 non-null    float64
     6   sex                333 non-null    object
    dtypes: float64(4), object(3)
    memory usage: 18.9+ KB
    None
    bill_length_mm            29.807054
    bill_depth_mm              3.899808
    flipper_length_mm        197.731792
    body_mass_g           643131.077327
    dtype: float64
In sklearn, we can use the class VarianceThreshold to select the features whose variance is greater than a threshold value. We can use the following Python code for that purpose:
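    from sklearn.feature_selection import VarianceThreshold
    import seaborn

    # Load the penguins dataset and inspect its columns
    df = seaborn.load_dataset("penguins")
    print(df.info())

    # Keep only the numeric columns and print their variances
    features = df.drop(["species", "island", "sex"], axis=1)
    print(features.var())

    # Fit a selector that keeps features with variance greater than 4
    feature_selection = VarianceThreshold(threshold=4)
    feature_selection.fit(features)

    # get_support() returns a boolean mask over the feature columns
    selected_features = features.columns[feature_selection.get_support()]
    print(type(selected_features))

    # Build a new DataFrame containing only the selected features
    df2 = df.filter(selected_features, axis=1)
    print(df2.head())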
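One detail is easy to miss when reading the printed variances: pandas `var()` reports the sample variance (dividing by n - 1), while VarianceThreshold compares the population variance (dividing by n) against the threshold and keeps only features strictly above it. The following is a minimal plain-Python sketch of that comparison, not sklearn's implementation. The `columns` values are made up for illustration and `select_by_variance` is a hypothetical helper name.

```python
from statistics import pvariance, variance

# Made-up measurements for illustration only (not the penguins data).
columns = {
    "bill_depth_mm": [14.0, 17.0, 17.0, 17.0, 20.0],
    "flipper_length_mm": [181.0, 195.0, 210.0, 220.0, 190.0],
}


def select_by_variance(data, threshold):
    """Return the names of columns whose population variance exceeds threshold."""
    return [
        name
        for name, values in data.items()
        if pvariance(values) > threshold
    ]


# Show both flavours of variance for each column.
for name, values in columns.items():
    print(name, "population:", pvariance(values), "sample:", variance(values))

selected = select_by_variance(columns, threshold=4)
print(selected)
```

Here the made-up `bill_depth_mm` column has a sample variance of 4.5 but a population variance of 3.6, so a threshold of 4 drops it even though the value printed by a sample-variance calculation is above 4.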
The following Python statements select the features that have a variance of more than 4…





