```python
df = data.frame
```
Now we use the following Python statements to create two DataFrames: one for the features and one for the output labels.
```python
df_labels = df[["MedHouseVal"]]
df_features = df.drop(["MedHouseVal"], axis=1)
print(df_features.head())
print(df_labels.head())
```
Now we will use the RandomForestRegressor and SelectFromModel classes from the sklearn library to perform feature selection based on the model's feature importances.
```python
from sklearn.feature_selection import SelectFromModel
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

# Load the California Housing dataset as a DataFrame
data = fetch_california_housing(as_frame=True)
df = data.frame
print(df.info())
print(df.head())

# Split into features and labels
df_labels = df[["MedHouseVal"]]
df_features = df.drop(["MedHouseVal"], axis=1)
print(df_features.head())
print(df_labels.head())

# Fit a random forest, then keep features whose importance is at least the mean
random_forest_regressor = RandomForestRegressor()
random_forest_regressor.fit(df_features.values, df_labels["MedHouseVal"])
model = SelectFromModel(random_forest_regressor, prefit=True, threshold="mean")
X_transformed = model.transform(df_features.values)

selected_features = model.get_support(indices=True)
print("Selected Features: ", selected_features)

# Rebuild a DataFrame with only the selected feature columns plus the label
df2 = df[df.columns[selected_features]].copy()
df2["MedHouseVal"] = df["MedHouseVal"]
print(df2.head())
```
Please note that the prefit=True parameter indicates that an already fitted model is passed directly into the constructor. The threshold="mean" parameter sets the selection threshold to the mean of the feature importances: features whose importance is greater than or equal to the mean are kept, and the rest are discarded.
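To see this threshold rule in action, here is a minimal sketch that compares SelectFromModel's selection against a manual comparison with the mean importance. It uses a small synthetic dataset (via make_regression) rather than the California Housing data, purely to keep the example fast and self-contained:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# Illustrative synthetic data; the article itself uses the California Housing set
X, y = make_regression(n_samples=200, n_features=8, n_informative=3, random_state=0)

reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
selector = SelectFromModel(reg, prefit=True, threshold="mean")

# Indices chosen by SelectFromModel
kept = selector.get_support(indices=True)

# Manual rule: keep features whose importance >= mean importance
manual = np.where(reg.feature_importances_ >= reg.feature_importances_.mean())[0]

print("SelectFromModel kept:", kept)
print("Manual rule kept:   ", manual)
```

Both lists should match, confirming that threshold="mean" is simply a cutoff at the mean of feature_importances_.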