VIFSelector
- class paralytics.VIFSelector(thresh=5.0, impute=False, impute_method='mean', fit_intercept=True, verbose=0)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Selects features based on the Variance Inflation Factor (VIF).
Calculates the VIF of every variable in a given dataset, discards the variable with the highest VIF, and repeats this process until the highest remaining VIF no longer exceeds the declared threshold.
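For orientation, the VIF of variable j is 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing variable j on all the other variables. The following is a minimal sketch of that computation with NumPy; the function name `vif_scores` is hypothetical, and this is not necessarily how paralytics computes the values internally.

```python
import numpy as np


def vif_scores(X, fit_intercept=True):
    """Return the VIF of every column of a 2-D array (hypothetical helper).

    VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from the least-squares
    regression of column j on all remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.empty(n_features)
    for j in range(n_features):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        if fit_intercept:
            # Prepend a constant column so the regression has an intercept.
            others = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        ss_res = float(residuals @ residuals)
        # Centered total sum of squares with an intercept, uncentered without.
        centered = target - target.mean() if fit_intercept else target
        ss_tot = float(centered @ centered)
        r_squared = 1.0 - ss_res / ss_tot
        scores[j] = np.inf if r_squared >= 1.0 else 1.0 / (1.0 - r_squared)
    return scores


# Synthetic data: c is almost a linear combination of a and b.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + rng.normal(scale=0.05, size=200)

collinear_scores = vif_scores(np.column_stack([a, b, c]))
independent_scores = vif_scores(np.column_stack([a, b]))
```

In this synthetic setup all three columns of the collinear matrix get a VIF far above the default threshold of 5.0, while the two independent columns stay close to 1.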
- Parameters
- thresh: float, optional (default=5.0)
VIF threshold. Variables are removed until no remaining variable has a VIF above this value.
- impute: bool, optional (default=False)
Declares whether missing value imputation should be performed.
- impute_method: string, optional (default='mean')
Numerical imputation method passed to paralytics.preprocessing.Imputer.
- fit_intercept: bool, optional (default=True)
Specifies whether a constant (a.k.a. bias or intercept) should be added to the decision function.
- verbose: int, optional (default=0)
Controls verbosity of output. If 0 there is no output, if 1 displays
References
[1] Ffisegydd, sklearn multicollinearity class, 2017
- Attributes
- imputer_: estimator
The estimator by means of which missing values imputation is performed.
- viffed_cols_: list
List of features removed from the given dataset because their VIF exceeded thresh.
- kept_cols_: list
List of features that remained after the VIF procedure.
Methods Summary
fit(self, X[, y]): Identifies the columns whose VIF value exceeds the threshold.
transform(self, X): Applies feature selection based on the Variance Inflation Factor.
Methods Documentation
- fit(self, X, y=None)
Identifies the columns whose VIF value exceeds the threshold.
If impute is True, also fits the imputer on X.
- Parameters
- X: DataFrame, shape = (n_samples, n_features)
Input data, where n_samples is the number of samples and n_features is the number of features.
- Returns
- self: object
Returns the instance itself.
- transform(self, X)
Applies feature selection based on the Variance Inflation Factor.
In every iteration, the VIF of each independent variable is calculated and the variable with the highest VIF is removed. The iterations stop once the maximum VIF in the dataset no longer exceeds the declared threshold.
- Parameters
- X: DataFrame, shape = (n_samples, n_features)
Input data to which variable elimination will be applied.
- Returns
- X_new: DataFrame, shape = (n_samples, n_features_new)
X data with the variables that remain after feature elimination.