Imputer
class paralytics.preprocessing.Imputer(columns=None, numerical_method='mean', categorical_method='mode')
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Imputes missing values in a DataFrame.
Missing values are imputed with a method chosen according to the column type. Numerical columns are filled with the statistic given by numerical_method. Categorical columns are filled with the statistic given by categorical_method, which by default is the most frequent value (the mode) of the column.
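The split between numerical and categorical columns can be sketched with pandas dtype checks. This sketch assumes that a column's type is decided by its dtype (numeric dtypes count as numerical, everything else as categorical); the documentation does not say how paralytics makes that decision, and the helper `split_columns_by_type` is hypothetical, not part of the library.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_columns_by_type(df, columns=None):
    """Hypothetical helper: partition columns into numerical and categorical.

    Assumes a column is numerical when its dtype is numeric and
    categorical otherwise (object, category, string, ...).
    """
    columns = list(df.columns) if columns is None else list(columns)
    numerical = [col for col in columns if is_numeric_dtype(df[col])]
    categorical = [col for col in columns if not is_numeric_dtype(df[col])]
    return numerical, categorical


df = pd.DataFrame({
    "age": [25.0, None, 35.0],
    "city": ["Paris", None, "Paris"],
})
numerical, categorical = split_columns_by_type(df)

# Default behaviour: numerical columns get their mean, categorical ones their mode.
# DataFrame.mode returns a frame, so its first row holds one mode per column.
filled = df.fillna({
    **df[numerical].mean().to_dict(),
    **df[categorical].mode().iloc[0].to_dict(),
})
print(filled)
```

The mean is computed only on the numerical subset because calling `mean` on a frame that contains string columns fails in recent pandas versions.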
- Parameters
- columns: list, optional (default=None)
Columns whose missing values will be imputed. If not specified, all columns of the DataFrame are imputed.
- numerical_method: string {mean, median}, optional (default='mean')
Name of the statistic used to impute numerical columns. Accepts the name of any pd.DataFrame method that returns a per-column statistic, such as 'mean' or 'median'.
- categorical_method: string {mode}, optional (default='mode')
Name of the statistic used to impute categorical columns. Accepts the name of any pd.DataFrame method that returns a per-column statistic; the default, 'mode', fills in the most frequent value.
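Because both method parameters are method names, one plausible way to resolve them is `getattr` on the selected columns. The sketch below illustrates that idea; it is not taken from paralytics' source, and the helper `column_statistic` is hypothetical. `DataFrame.mode` differs from `mean` and `median` in that it returns a DataFrame (one row per tied mode), so the sketch keeps its first row.

```python
import pandas as pd


def column_statistic(df, columns, method):
    """Hypothetical helper: compute one fill value per column by method name.

    `method` is the name of a DataFrame method such as 'mean', 'median'
    or 'mode'; it is looked up with getattr and called without arguments.
    """
    result = getattr(df[columns], method)()
    if isinstance(result, pd.DataFrame):
        # DataFrame.mode returns a frame of sorted tied modes; keep the first row.
        result = result.iloc[0]
    return result.to_dict()


df = pd.DataFrame({
    "income": [10.0, 20.0, 60.0, None],
    "city": ["A", "B", "B", None],
})
mean_fill = column_statistic(df, ["income"], "mean")
median_fill = column_statistic(df, ["income"], "median")
mode_fill = column_statistic(df, ["city"], "mode")
print(mean_fill, median_fill, mode_fill)
```

All three pandas methods skip NaN by default, so the missing entries do not affect the computed statistics.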
- Attributes
- imputing_dict_: dict, length = n_features
Dictionary of values used to replace NaNs. Each key is a column name and the corresponding value is the value imputed for NaNs in that column.
Methods Summary
- fit(self, X[, y]): Computes the imputation value for each column of X.
- transform(self, X): Imputes missing values in X.
Methods Documentation
fit(self, X, y=None)
Computes the imputation value for each column of X and stores the values in imputing_dict_.
- Parameters
- X: DataFrame, shape = (n_samples, n_features)
Training data with missing values.
- y: ignored
Not used; accepted for compatibility with the scikit-learn estimator API.
- Returns
- self: object
Returns the instance itself.
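A minimal sketch of what fit could compute, assuming the column type is read from the dtype and the method names are resolved as pandas DataFrame methods. `ImputerSketch` is a hypothetical stand-in, not paralytics' implementation; unlike the real class it does not inherit from `BaseEstimator` and `TransformerMixin`, so it has no `fit_transform` or `get_params`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


class ImputerSketch:
    """Hypothetical stand-in for Imputer.fit; not paralytics' implementation."""

    def __init__(self, columns=None, numerical_method="mean", categorical_method="mode"):
        self.columns = columns
        self.numerical_method = numerical_method
        self.categorical_method = categorical_method

    def fit(self, X, y=None):
        # y is ignored, mirroring the documented signature.
        columns = list(X.columns) if self.columns is None else list(self.columns)
        # Assumption: numeric dtypes are numerical, everything else categorical.
        numerical = [c for c in columns if is_numeric_dtype(X[c])]
        categorical = [c for c in columns if c not in numerical]

        self.imputing_dict_ = {}
        for cols, method in ((numerical, self.numerical_method),
                             (categorical, self.categorical_method)):
            if not cols:
                continue
            stats = getattr(X[cols], method)()
            if isinstance(stats, pd.DataFrame):
                # mode() returns a frame of tied modes; keep the first row.
                stats = stats.iloc[0]
            self.imputing_dict_.update(stats.to_dict())
        # Returning self allows chaining such as ImputerSketch().fit(X).imputing_dict_.
        return self


train = pd.DataFrame({"age": [20.0, None, 40.0], "city": ["Oslo", "Oslo", None]})
imputer = ImputerSketch(numerical_method="median").fit(train)
print(imputer.imputing_dict_)
```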
transform(self, X)
Imputes missing values in X using the values stored in imputing_dict_.
- Parameters
- X: DataFrame, shape = (n_samples, n_features)
Data whose missing values will be imputed.
- Returns
- X_new: DataFrame, shape = (n_samples, n_features)
X with its missing values replaced by the corresponding imputation values from imputing_dict_.
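Since transform only looks up values in imputing_dict_, statistics learned on the training data are reused on new data instead of being recomputed. The sketch below assumes the replacement is done with `DataFrame.fillna`, which returns a new frame by default; the function `transform_sketch` and the hard-coded dictionary are illustrative only.

```python
import pandas as pd


def transform_sketch(X, imputing_dict):
    """Hypothetical stand-in for Imputer.transform.

    Returns a new DataFrame in which each NaN in a column listed in
    `imputing_dict` is replaced by that column's stored value; X itself
    is left unchanged because fillna returns a copy by default.
    """
    return X.fillna(value=imputing_dict)


# Values as they might have been learned by fit on some training data.
imputing_dict_ = {"age": 30.0, "city": "Oslo"}

new_data = pd.DataFrame({"age": [None, 55.0], "city": ["Bergen", None]})
X_new = transform_sketch(new_data, imputing_dict_)
print(X_new)
```

With `fillna`, columns that have no entry in the dictionary keep their NaNs.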