find_sparsity
paralytics.utils.find_sparsity(X, thresh=0.01)
- Finds columns with highly sparse categories.
- For categorical and binary features, finds columns that contain a category whose relative frequency is below the threshold.
- For numerical features (excluding binary variables), returns columns dominated by NaNs or zeros in the given dataset.
- Parameters
- X: pandas.DataFrame
- Data to be checked for sparsity. 
- thresh: float, optional (default=.01)
- Relative frequency of a category below which sparsity is reported. 
 
- Returns
- sparse_{num, bin, cat}: list
- List of {numerical, binary, categorical} X column names where high sparsity was detected.
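The rules above can be sketched in plain pandas as follows. This is a hypothetical re-implementation for illustration, not the paralytics source: the name `find_sparsity_sketch`, the binary test (a numeric column with exactly two distinct non-null values), the categorical test (any non-numeric dtype) and the reading of "dominated by NaNs or zeros" as "the share of non-null, non-zero values is below `thresh`" are all assumptions.

```python
import numpy as np
import pandas as pd


def find_sparsity_sketch(X: pd.DataFrame, thresh: float = 0.01):
    """Hypothetical sketch of the documented sparsity rules.

    Assumptions (not taken from paralytics):
    - binary: numeric column with exactly two distinct non-null values;
    - categorical: any non-numeric column;
    - numerical: every other numeric column, flagged when the share of
      non-null, non-zero values falls below ``thresh``.
    """
    sparse_num, sparse_bin, sparse_cat = [], [], []
    for col in X.columns:
        s = X[col]
        is_numeric = pd.api.types.is_numeric_dtype(s)
        is_binary = is_numeric and s.nunique(dropna=True) == 2
        if is_binary or not is_numeric:
            # Relative frequency of each category (NaNs are excluded).
            freqs = s.value_counts(normalize=True)
            if len(freqs) and freqs.min() < thresh:
                (sparse_bin if is_binary else sparse_cat).append(col)
        else:
            # Share of rows carrying information: present and non-zero.
            informative = (s.notna() & (s != 0)).mean()
            if informative < thresh:
                sparse_num.append(col)
    return sparse_num, sparse_bin, sparse_cat


if __name__ == "__main__":
    demo = pd.DataFrame({
        "cat": ["a"] * 199 + ["b"],                        # rare category "b"
        "bin": [0] * 199 + [1],                            # rare value 1
        "num": [np.nan] * 99 + [0.0] * 99 + [1.0, 2.0],    # mostly NaN/0
        "num_ok": np.arange(200, dtype=float),             # dense
    })
    print(find_sparsity_sketch(demo, thresh=0.02))
```

In this sketch `thresh` plays the same role for every feature type: a category (or, for numerical columns, the informative values) must make up at least that fraction of the rows for the column to pass.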