CorrelationReducer

class paralytics.CorrelationReducer(thresh=0.8, method='pearson')

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Removes columns whose pairwise correlation with another column exceeds the thresh value.

Parameters
method: string, optional (default='pearson')

Method used to compute the pairwise correlation of columns, excluding NA/null values (based on pandas.DataFrame.corr).

  • pearson: Standard correlation coefficient.

  • kendall: Kendall Tau correlation coefficient.

  • spearman: Spearman rank correlation.
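As a rough illustration of how the choice of method changes the coefficient, the sketch below calls pandas.DataFrame.corr directly on a small made-up frame. The column names and values are invented for the example and are not part of paralytics.

```python
import pandas as pd

# Illustrative data: y grows monotonically but non-linearly with x.
df = pd.DataFrame({"x": [1, 2, 3, 4, 5]})
df["y"] = df["x"] ** 3

pearson = df.corr(method="pearson").loc["x", "y"]
spearman = df.corr(method="spearman").loc["x", "y"]

print(f"pearson:  {pearson:.3f}")   # below 1: the relationship is not linear
print(f"spearman: {spearman:.3f}")  # 1.000: the ranks agree perfectly
```

A rank-based method such as spearman can therefore flag a pair of columns as strongly correlated where pearson, at the same thresh, would not.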

thresh: float, optional (default=0.8)

Correlation threshold. Variables are rejected until no remaining pair of columns has a correlation coefficient exceeding this value.
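To make the role of thresh concrete, the following sketch lists the column pairs whose coefficient exceeds a threshold. It assumes the comparison is made on the absolute value of the coefficient, which this page does not state. The data and the variable names are hypothetical.

```python
from itertools import combinations

import pandas as pd

# Illustrative data: "b" is nearly a copy of "a"; "c" is unrelated.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "c": [2.0, -1.0, 0.5, 3.0, -2.0, 1.0],
})

thresh = 0.8
corr = df.corr(method="pearson").abs()  # assumption: sign is ignored

# Collect every unordered column pair whose |coefficient| exceeds thresh.
exceeding = [
    (left, right)
    for left, right in combinations(df.columns, 2)
    if corr.loc[left, right] > thresh
]
print(exceeding)  # [('a', 'b')]
```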

Attributes
correlated_cols_: list

List of features from the fitted dataset whose correlation coefficient exceeded thresh.

Methods Summary

fit(self, X[, y])

Identifies columns whose correlation coefficients exceed the threshold.

transform(self, X)

Apply feature selection based on correlation coefficients.

Methods Documentation

fit(self, X, y=None)

Identifies columns whose correlation coefficients exceed the threshold.

Parameters
X: DataFrame, shape = (n_samples, n_features)

Input data, where n_samples is the number of samples and n_features is the number of features.

y: Ignored

Not used; present for compatibility with the scikit-learn estimator API.

Returns
self: object

Returns the instance itself.

transform(self, X)

Apply feature selection based on correlation coefficients.

Removes correlated features whose correlation coefficient is higher than the threshold value.

Parameters
X: DataFrame, shape = (n_samples, n_features)

Input data to which variable elimination will be applied.

Returns
X_new: DataFrame, shape = (n_samples, n_features_new)

Input data containing only the variables that remain after feature elimination.
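The fit/transform flow described above can be sketched with a small stand-in class. SketchCorrelationReducer is hypothetical and is not the paralytics implementation: it does not inherit from the scikit-learn base classes, it assumes absolute coefficients are compared, and it drops the later column of each offending pair, a common heuristic that may differ from the rule the library actually uses.

```python
import numpy as np
import pandas as pd


class SketchCorrelationReducer:
    """Hypothetical stand-in mirroring the documented interface."""

    def __init__(self, thresh=0.8, method="pearson"):
        self.thresh = thresh
        self.method = method

    def fit(self, X, y=None):
        corr = X.corr(method=self.method).abs()  # assumption: sign ignored
        # Keep only the strict upper triangle so each pair is seen once.
        mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
        upper = corr.where(mask)
        # Assumption: of each offending pair, the later column is dropped.
        self.correlated_cols_ = [
            col for col in upper.columns if (upper[col] > self.thresh).any()
        ]
        return self

    def transform(self, X):
        return X.drop(columns=self.correlated_cols_)


# Illustrative data: "b" is nearly a copy of "a"; "c" is unrelated.
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "c": [2.0, -1.0, 0.5, 3.0, -2.0, 1.0],
})

reducer = SketchCorrelationReducer(thresh=0.8, method="pearson").fit(X)
X_new = reducer.transform(X)

print(reducer.correlated_cols_)  # ['b']
print(list(X_new.columns))       # ['a', 'c']
```

Because fit returns the instance itself, the call can be chained as shown. With the real class, the same pattern would presumably be written as paralytics.CorrelationReducer(thresh=0.8, method='pearson').fit(X).transform(X).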