find_sparsity
paralytics.utils.find_sparsity(X, thresh=0.01)
Finds columns with highly sparse categories.
For categorical and binary features, finds columns that contain at least one category whose relative frequency is below the threshold.
For numerical features (excluding binary variables), returns columns in which NaN or 0 values dominate the given dataset.
- Parameters
- X: pandas.DataFrame
Data to be checked for sparsity.
- thresh: float, optional (default=0.01)
Relative frequency of a category below which the column is reported as sparse.
- Returns
- sparse_{num, bin, cat}: list
List of {numerical, binary, categorical} X column names where high sparsity was detected.
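The behaviour described above can be approximated with plain pandas. The following is a minimal sketch and not the paralytics source: the name `find_sparsity_sketch` is hypothetical, a column is assumed to be binary when it has exactly two distinct non-null values, "dominating" is assumed to mean that the combined share of NaN and 0 values exceeds `1 - thresh`, and the order of the returned lists is likewise an assumption.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def find_sparsity_sketch(X, thresh=0.01):
    """Illustrative approximation only; not the library implementation."""
    sparse_num, sparse_bin, sparse_cat = [], [], []
    for col in X.columns:
        series = X[col]
        if series.nunique(dropna=True) == 2:
            # Assumption: exactly two distinct non-null values means binary.
            # Relative frequencies are computed over non-null values only.
            freqs = series.value_counts(normalize=True)
            if (freqs < thresh).any():
                sparse_bin.append(col)
        elif is_numeric_dtype(series) and not is_bool_dtype(series):
            # Assumption: NaN/0 "dominate" when their share exceeds 1 - thresh.
            share = (series.isna() | (series == 0)).mean()
            if share > 1 - thresh:
                sparse_num.append(col)
        else:
            # Everything else is treated as categorical.
            freqs = series.value_counts(normalize=True)
            if (freqs < thresh).any():
                sparse_cat.append(col)
    return sparse_num, sparse_bin, sparse_cat


# Small synthetic dataset with 200 rows (values chosen for illustration).
X = pd.DataFrame({
    "num_sparse": [0.0] * 100 + [float("nan")] * 97 + [1.5, 2.5, 3.5],
    "num_dense": list(range(200)),
    "bin_sparse": [0] * 198 + [1] * 2,
    "bin_balanced": [0] * 100 + [1] * 100,
    "cat_sparse": ["a"] * 120 + ["b"] * 79 + ["c"],
    "cat_ok": ["x"] * 100 + ["y"] * 60 + ["z"] * 40,
})

sparse_num, sparse_bin, sparse_cat = find_sparsity_sketch(X, thresh=0.05)
print(sparse_num, sparse_bin, sparse_cat)
```

With `thresh=0.05`, only the columns whose rare category falls below 5% (or whose NaN/0 share exceeds 95%) are reported; the balanced and dense columns are left out.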