feature_selection = VarianceThreshold(threshold=4)
feature_selection.fit(features)
selected_features = features.columns[feature_selection.get_support()]
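Note that get_support() returns a boolean mask over the input features by default (True for each feature that was kept); passing indices=True returns the integer positions of the kept features instead. Here we use the mask to pick the names of the selected columns, and then filter the DataFrame down to those columns.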
df2 = df.filter(selected_features, axis=1)
The output of the above program will be:
RangeIndex: 344 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   species            344 non-null    object
 1   island             344 non-null    object
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object
dtypes: float64(4), object(3)
memory usage: 18.9+ KB
None
bill_length_mm           29.807054
bill_depth_mm             3.899808
flipper_length_mm       197.731792
body_mass_g          643131.077327
dtype: float64
   bill_length_mm  flipper_length_mm  body_mass_g
0            39.1              181.0       3750.0
1            39.5              186.0       3800.0
2            40.3              195.0       3250.0
3             NaN                NaN          NaN
4            36.7              193.0       3450.0
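To see why bill_depth_mm was dropped while the other three columns survived, it can help to look at what the mask means. The snippet below is a minimal plain-Python sketch, not scikit-learn's actual implementation. The toy data, the `variance_mask` helper, and the column names are made up for illustration, and it assumes that a feature is kept only when its population variance is strictly greater than the threshold and that missing values are skipped.

```python
import math
from statistics import pvariance

# Made-up toy data (not the penguins dataset): column name -> values.
columns = {
    "bill_depth": [18.7, 17.4, 18.0, math.nan, 19.3],
    "flipper_length": [181.0, 186.0, 195.0, math.nan, 193.0],
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0],
}


def variance_mask(data, threshold):
    """Hypothetical helper: one boolean per column, True if variance > threshold."""
    mask = []
    for values in data.values():
        # Skip missing values before computing the variance (assumption).
        present = [v for v in values if not math.isnan(v)]
        mask.append(pvariance(present) > threshold)
    return mask


# Boolean-mask form: one entry per input column.
mask = variance_mask(columns, threshold=4)

# Integer-index form: positions of the columns that were kept.
indices = [i for i, keep in enumerate(mask) if keep]

# Either form can be turned into the names of the selected columns.
selected = [name for name, keep in zip(columns, mask) if keep]

print(mask)      # [False, True, False]
print(indices)   # [1]
print(selected)  # ['flipper_length']
```

The mask has one entry per input column, which is why it can index features.columns directly in the code above; the integer form carries the same information as positions.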





