import seaborn
from sklearn.linear_model import LinearRegression
from sklearn.feature_selection import RFE
from sklearn.preprocessing import LabelEncoder

# Load the penguins dataset and inspect it
df = seaborn.load_dataset("penguins")
print(df.head())
print(df.isnull().sum())

# Keep the four numeric features and the target column
df.drop(labels=["island", "sex"], axis=1, inplace=True)
df = df[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g", "species"]]

# Fill missing values in each numeric column with that column's median
df["bill_length_mm"] = df["bill_length_mm"].fillna(df["bill_length_mm"].median())
df["bill_depth_mm"] = df["bill_depth_mm"].fillna(df["bill_depth_mm"].median())
df["flipper_length_mm"] = df["flipper_length_mm"].fillna(df["flipper_length_mm"].median())
df["body_mass_g"] = df["body_mass_g"].fillna(df["body_mass_g"].median())
print(df.isnull().sum())

# Encode the species names as integer labels
label_encoder = LabelEncoder()
df["species"] = label_encoder.fit_transform(df["species"])
print(df.head())

# Recursively eliminate features until three remain
linear_regressor = LinearRegression()
rfe = RFE(estimator=linear_regressor, n_features_to_select=3, step=1)
rfe.fit(df[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]], df["species"])
selected_features = rfe.get_support(indices=True)
print("Selected Features: ", selected_features)

# Build a new DataFrame from the selected features plus the labels
df2 = df[df.columns[selected_features]]
df2["species"] = df["species"]
print(df2.head())
Note that after the label encoding, we use the RFE class from the sklearn.feature_selection module. We pass a linear regressor to the RFE() constructor. RFE fits this regressor repeatedly and uses its coefficients to rank the features, dropping the weakest ones each round. The n_features_to_select parameter sets the number of features to keep, and step=1 means that one feature is eliminated at each iteration.
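To make the elimination loop concrete, here is a simplified sketch of the idea using only NumPy. It is not scikit-learn's actual implementation: the function name recursive_elimination and the synthetic data are assumptions made for illustration, and the ranking simply uses the absolute value of each least-squares coefficient.

```python
import numpy as np

def recursive_elimination(X, y, n_features_to_select, step=1):
    """Simplified sketch of recursive feature elimination (hypothetical helper)."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_features_to_select:
        # Fit ordinary least squares (with an intercept) on the remaining features
        A = np.column_stack([X[:, remaining], np.ones(len(X))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        importance = np.abs(coef[:-1])  # ignore the intercept term
        # Drop up to `step` weakest features without going below the target count
        n_drop = min(step, len(remaining) - n_features_to_select)
        drop = set(np.argsort(importance)[:n_drop].tolist())
        remaining = [f for i, f in enumerate(remaining) if i not in drop]
    return remaining

# Synthetic data (an assumption for this sketch): feature 1 has no effect on y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected = recursive_elimination(X, y, n_features_to_select=3)
print("Selected feature indices:", selected)
```

Comparing coefficient magnitudes like this is only meaningful when the features are on comparable scales, which is the case for the synthetic data here.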
The following Python statement gives us the indices of the selected features.
selected_features = rfe.get_support(indices=True)
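As a small illustration of what get_support() returns, the sketch below fits RFE on synthetic data (an assumption, used here so the example runs without downloading the penguins dataset). Without arguments the method returns a boolean mask; with indices=True it returns the integer positions of the selected columns.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for this sketch): feature 1 has no effect on y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * X[:, 3] + rng.normal(scale=0.1, size=200)

rfe = RFE(estimator=LinearRegression(), n_features_to_select=3, step=1)
rfe.fit(X, y)

mask = rfe.get_support()                 # boolean mask, one entry per input column
indices = rfe.get_support(indices=True)  # integer positions of the kept columns

print("Mask:   ", mask)
print("Indices:", indices)
print("Ranking:", rfe.ranking_)          # selected features have rank 1
```

The ranking_ attribute can also be useful: it shows the order in which the discarded features were eliminated.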
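We then create another DataFrame containing the selected features and the column with the output labels. The indices refer to the columns passed to rfe.fit(); they can be applied to df.columns directly because those four feature columns are the first four columns of df, in the same order.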
df2 = df[df.columns[selected_features]]
df2["species"] = df["species"]
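If the feature columns were not the leading columns of the DataFrame, indexing df.columns by position would pick the wrong columns. One possible alternative, sketched below, maps the indices onto the list of feature names that was passed to fit(). The tiny DataFrame, its values, and the names feature_columns and selected_names are made up for illustration, and the index array is a stand-in for the result of rfe.get_support(indices=True).

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; note that "species" comes first here
df = pd.DataFrame({
    "species": [0, 1, 2],
    "bill_length_mm": [39.1, 46.5, 50.0],
    "bill_depth_mm": [18.7, 17.9, 15.3],
    "flipper_length_mm": [181.0, 192.0, 220.0],
    "body_mass_g": [3750.0, 3500.0, 5550.0],
})

# The same list of columns that would be passed to rfe.fit()
feature_columns = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]

# Hypothetical stand-in for rfe.get_support(indices=True)
selected_features = np.array([0, 1, 2])

# Map positions to names, then select by name and take an explicit copy
selected_names = [feature_columns[i] for i in selected_features]
df2 = df[selected_names + ["species"]].copy()
print(df2.head())
```

Taking an explicit copy also avoids pandas warnings about assigning into a slice of another DataFrame.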
The output of the above program will be: …





