VIFSelector

class paralytics.VIFSelector(thresh=5.0, impute=False, impute_method='mean', fit_intercept=True, verbose=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Performs feature selection based on the Variance Inflation Factor (VIF).

Calculates the Variance Inflation Factor for each variable in a given dataset, discards the variable with the highest VIF value, and repeats this process until the highest remaining VIF no longer exceeds the declared threshold.
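For orientation, the VIF of feature j is commonly defined as 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination obtained by regressing feature j on all other features. The following is a minimal NumPy sketch of that definition, not the paralytics implementation; the helper name `vif_scores` and the demo data are hypothetical, and the sketch assumes an intercept is always included in each auxiliary regression.

```python
import numpy as np


def vif_scores(X):
    """Return the VIF of every column of a 2-D array (hypothetical helper).

    Assumption: each auxiliary regression includes an intercept term.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_features = X.shape
    scores = np.empty(n_features)
    for j in range(n_features):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        # VIF_j = 1 / (1 - R_j^2); treat a perfect fit as infinite inflation.
        scores[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return scores


# Illustrative data: c is nearly a linear combination of a and b, d is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.1 * rng.normal(size=200)
d = rng.normal(size=200)
demo = np.column_stack([a, b, c, d])
print(vif_scores(demo))
```

In this demo the three mutually dependent columns receive large VIF values, while the unrelated column stays close to 1.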

Parameters

thresh: float, optional (default=5.0): VIF threshold. Variables stop being rejected once the highest VIF no longer exceeds this value.
impute: bool, optional (default=False): Declares whether missing-value imputation should be performed.
impute_method: string, optional (default='mean'): Declares the numerical imputation method for the paralytics.preprocessing.Imputer.
fit_intercept: bool, optional (default=True): Specifies if the constant (a.k.a. bias or intercept) should be added to the decision functions.
verbose: int, optional (default=0): Controls verbosity of output. If 0 there is no output, if 1 displays

See also

paralytics.preprocessing.Imputer

References

[1] Ffisegydd, sklearn multicollinearity class, 2017

Attributes

imputer_: estimator: The estimator by means of which missing values imputation is performed.
viffed_cols_: list: List of features from a given dataset that exceeded thresh.
kept_cols_: list: List of features that remain after the VIF procedure.

Methods Summary

`fit`(self, X[, y])	Identifies the columns whose VIF value exceeds the threshold.
`transform`(self, X)	Apply feature selection based on Variance Inflation Factor.

Methods Documentation

fit(self, X, y=None)

Identifies the columns whose VIF value exceeds the threshold.

If impute is True, also fits the imputer on X.

Parameters

X: DataFrame, shape = (n_samples, n_features): Input data, where n_samples is the number of samples and n_features is the number of features.

Returns

self: object: Returns the instance itself.

transform(self, X)

Apply feature selection based on Variance Inflation Factor.

In every iteration, the VIF values of the independent variables are calculated and the variable with the highest VIF value is removed. Iteration stops once the maximum VIF in the given dataset no longer exceeds the declared threshold.

Parameters

X: DataFrame, shape = (n_samples, n_features): Input data on which variable elimination will be applied.

Returns

X_new: DataFrame, shape = (n_samples, n_features_new): X data with variables remaining after applying feature elimination.