CategoricalGrouper
class paralytics.preprocessing.CategoricalGrouper(method='freq', percentile_thresh=0.05, new_cat='Other', include_cols=None, exclude_cols=None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Groups sparse observations in categorical columns into a single category.
- Parameters
- method: string {'freq'}, optional (default='freq')
The method used to group sparse categories:
freq:
Counts the frequency of each category. Retains categories whose cumulative share of the total dataset (with respect to descending sort) is equal to or higher than the percentile threshold.
- percentile_thresh: float, optional (default=0.05)
Defines the percentile threshold for the 'freq' method.
- new_cat: string or int, optional (default='Other')
Specifies the category name assigned to the observations identified as sparse.
- include_cols: list, optional (default=None)
Specifies the column names that should be treated as categorical features. If None, the estimator runs only on the automatically selected categorical columns.
- exclude_cols: list, optional (default=None)
Specifies the categorical column names that should not be treated as categorical features. If None, no column is excluded from the transformation.
- Attributes
- cat_cols_: list
List of categorical columns in a given dataset.
- imp_cats_: dict
Dictionary that keeps track of replaced category names with the new category for every feature in the dataset.
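The 'freq' rule described under Parameters can be illustrated with a minimal, standard-library sketch. It does not use paralytics and is not the library's implementation. It assumes one possible reading of the rule: categories are visited from rarest to most frequent, and a category counts as sparse while the running share of observations stays below percentile_thresh. The helper names find_sparse_categories and group_sparse are hypothetical and exist only for this illustration.

```python
from collections import Counter


def find_sparse_categories(values, percentile_thresh=0.05):
    """Return the categories treated as sparse under the assumed reading.

    Assumption (not confirmed by the library): categories are accumulated
    from rarest to most frequent, and a category is sparse while the
    cumulative share of observations is still below percentile_thresh.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # Rarest first; ties are broken by name so the result is reproducible.
    ordered = sorted(counts.items(), key=lambda item: (item[1], str(item[0])))
    sparse = []
    cumulative = 0
    for category, count in ordered:
        cumulative += count
        if cumulative / total >= percentile_thresh:
            break  # this category and every more frequent one are retained
        sparse.append(category)
    return sparse


def group_sparse(values, sparse, new_cat="Other"):
    """Replace every sparse category with new_cat, leaving the rest as-is."""
    sparse_set = set(sparse)
    return [new_cat if value in sparse_set else value for value in values]


# Hypothetical data: 100 observations of a single categorical feature.
colors = ["red"] * 60 + ["blue"] * 36 + ["teal"] * 2 + ["pink"] + ["gold"]
sparse = find_sparse_categories(colors, percentile_thresh=0.05)
grouped = group_sparse(colors, sparse, new_cat="Other")
```

Here the three rarest categories together cover 4% of the observations, which is below the 5% threshold, so they are the ones merged into 'Other'. The list of replaced names plays a role similar to what imp_cats_ records per feature.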
Methods Summary
fit(self, X[, y]): Fits the grouping on X using the given method.
transform(self, X): Applies the grouping of sparse categories to X.
Methods Documentation