CorrelationReducer

class paralytics.CorrelationReducer(thresh=0.8, method='pearson')

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Removes correlated columns whose correlation coefficient exceeds the thresh value.

Parameters

method: string, optional (default='pearson')

Method used to compute the pairwise correlation of columns, excluding NA/null values (based on pandas.DataFrame.corr).

thresh: float, optional (default=0.8)

Correlation threshold. Variables are rejected until no remaining correlation coefficient exceeds this value.
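
The method parameter is described above as based on pandas.DataFrame.corr. As a minimal sketch of what that call computes, the example below uses only pandas; the column names and values are invented for illustration and do not come from paralytics.

```python
import pandas as pd

# Illustrative data (hypothetical). Column "a" has one missing value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, None],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [4.0, 1.0, 3.0, 2.0, 5.0],
})

# Pairwise Pearson correlation. For each pair of columns, rows where
# either value is missing are excluded from that pair's computation.
corr = df.corr(method="pearson")

# "a" and "b" are perfectly linearly related on the four rows where
# both are present, so their coefficient is 1.0.
print(corr.round(3))
```

The result is a symmetric matrix with ones on the diagonal. A threshold such as thresh=0.8 would presumably be compared against these coefficients.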

Attributes

correlated_cols_: list: List of correlated features from a given dataset that exceeded thresh.

Methods Summary

`fit`(self, X[, y])	Identifies columns whose correlation coefficients exceed the threshold.
`transform`(self, X)	Apply feature selection based on correlation coefficients.

Methods Documentation

fit(self, X, y=None)

Identifies columns whose correlation coefficients exceed the threshold.

Parameters

X: DataFrame, shape = (n_samples, n_features): Input data, where n_samples is the number of samples and n_features is the number of features.
y: Ignored.

Returns

transform(self, X)

Apply feature selection based on correlation coefficients.

Removes correlated features with coefficient higher than the threshold value.

Parameters

X: DataFrame, shape = (n_samples, n_features): Input data on which variable elimination will be applied.

Returns

X_new: DataFrame, shape = (n_samples, n_features_new): X data with variables remaining after applying feature elimination.
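
The fit/transform behaviour described above can be approximated with plain pandas. The sketch below is not the paralytics implementation. It assumes a greedy rule (a column is flagged when its absolute correlation with any already-kept column exceeds thresh), and the helper name find_correlated_cols and the sample data are hypothetical.

```python
import pandas as pd


def find_correlated_cols(X, thresh=0.8, method="pearson"):
    """Return columns flagged as correlated (hypothetical helper).

    Assumption: columns are visited in order, and a column is flagged
    when its absolute correlation with any kept column exceeds thresh.
    """
    corr = X.corr(method=method).abs()
    kept, correlated = [], []
    for col in corr.columns:
        # NaN coefficients compare as False, so they never flag a column.
        if any(corr.loc[col, k] > thresh for k in kept):
            correlated.append(col)
        else:
            kept.append(col)
    return correlated


# Illustrative data: two near-duplicate height columns and one unrelated one.
X = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "score": [3.0, 1.0, 4.0, 1.0, 5.0],
})

# Analogue of fit: record which columns exceed the threshold.
correlated_cols = find_correlated_cols(X, thresh=0.8)

# Analogue of transform: drop the recorded columns.
X_new = X.drop(columns=correlated_cols)
print(correlated_cols, list(X_new.columns))
```

Keeping the first column of each correlated group is only one possible tie-breaking choice. The library may decide which column to reject differently.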