CorrelationReducer
-
class paralytics.CorrelationReducer(thresh=0.8, method='pearson')
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Removes columns whose pairwise correlation coefficient exceeds the thresh value.
- Parameters
- method: string, optional (default='pearson')
Method used to compute the pairwise correlation of columns, excluding NA/null values (based on pandas.DataFrame.corr).
pearson: Standard correlation coefficient.
kendall: Kendall Tau correlation coefficient.
spearman: Spearman rank correlation.
- thresh: float, optional (default=0.8)
Correlation threshold. Rejection of variables stops once no coefficient exceeds this value.
- Attributes
- correlated_cols_: list
List of features from the fitted dataset whose correlation exceeded thresh.
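The choice of method can decide whether a pair of columns crosses thresh. The sketch below is not taken from paralytics; it calls pandas.DataFrame.corr directly on a small made-up DataFrame (the column names and the 0.95 threshold are assumptions chosen for illustration) to show a monotonic but nonlinear pair that spearman rates as perfectly correlated while pearson does not.

```python
import pandas as pd

# Hypothetical data: "x_cubed" is a monotonic, nonlinear function of "x".
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x_cubed": [1.0, 8.0, 27.0, 64.0, 125.0],
})

# Same column pair, two of the documented correlation methods.
pearson = df.corr(method="pearson").loc["x", "x_cubed"]
spearman = df.corr(method="spearman").loc["x", "x_cubed"]

# Illustrative threshold (an assumption, not the library default).
thresh = 0.95
exceeds_with_pearson = pearson > thresh
exceeds_with_spearman = spearman > thresh
```

With this data the rank-based coefficient exceeds the illustrative threshold and the linear one does not, so the same column could be kept or rejected depending on method.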
Methods Summary
fit(self, X[, y]): Finds columns whose correlation coefficients exceed the threshold.
transform(self, X): Apply feature selection based on correlation coefficients.
Methods Documentation
-
fit(self, X, y=None)
Finds columns whose correlation coefficients exceed the threshold.
- Parameters
- X: DataFrame, shape = (n_samples, n_features)
Input data, where n_samples is the number of samples and n_features is the number of features.
- y: Ignored
- Returns
- self: object
Returns the instance itself.
-
transform(self, X)
Apply feature selection based on correlation coefficients.
Removes correlated features with coefficient higher than the threshold value.
- Parameters
- X: DataFrame, shape = (n_samples, n_features)
Input data on which variable elimination will be applied.
- Returns
- X_new: DataFrame, shape = (n_samples, n_features_new)
X data with variables remaining after applying feature elimination.
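The fit/transform contract documented above can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the paralytics implementation: the class name SimpleCorrelationDropper is hypothetical, it compares absolute coefficients, and for each pair above thresh it drops the later column. The real class may choose which column to reject differently.

```python
import numpy as np
import pandas as pd


class SimpleCorrelationDropper:
    """Illustrative stand-in mirroring the documented interface.

    Not the paralytics class: the selection rule here is an assumption.
    """

    def __init__(self, thresh=0.8, method="pearson"):
        self.thresh = thresh
        self.method = method

    def fit(self, X, y=None):
        # Absolute pairwise correlations (NA/null values are excluded by pandas).
        corr = X.corr(method=self.method).abs()
        # Keep the strict upper triangle so each pair is examined once.
        mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
        upper = corr.where(mask)
        # Assumed rule: flag a column if it correlates above thresh
        # with any column that appears before it.
        self.correlated_cols_ = [
            col for col in upper.columns if (upper[col] > self.thresh).any()
        ]
        return self

    def transform(self, X):
        # Drop the columns identified during fit; the input is not modified.
        return X.drop(columns=self.correlated_cols_)


# Made-up data: the two height columns carry the same information.
train = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "score": [3.0, 9.0, 1.0, 7.0, 5.0],
})

reducer = SimpleCorrelationDropper(thresh=0.8).fit(train)
train_reduced = reducer.transform(train)
```

Because fit returns the instance itself, fitting and transforming can be chained, and the columns learned on one DataFrame can be dropped from another that shares the same column names.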