FeatureEffectExplainer

class paralytics.xai.FeatureEffectExplainer(estimator, features, dtypes=None, sample_size=None, estimation_values=100, n_jobs=None, random_state=None)

Bases: sklearn.base.BaseEstimator, paralytics.xai.ExplainerMixin

Visualizes the effect of one or two features on the prediction.

Parameters

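estimator: object

Fitted model whose predictions are explained. It is used to make predictions for the synthetic data sets generated from the data passed to the fit method.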

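features: str or list of length at most 2

Name of the feature, or list of at most two feature names, whose effect on the prediction is explained. These are referred to as the grid features.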

dtypes: dict, optional (default=None)

Data types of the passed features. Possible values: ‘numeric’ or ‘category’. Must be passed as a dictionary whose keys are feature names and whose values are the data types of those features. If left as None, the data types are determined automatically when the fit method is executed.

The appropriate explainers are selected based on this parameter.
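The rule used for automatic determination is not documented here. A minimal sketch of one plausible rule, assuming that pandas numeric columns are treated as ‘numeric’ and all other columns as ‘category’, could look as follows; `infer_dtypes` is a hypothetical helper, not part of paralytics.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

CORRECT_DTYPES = {'category', 'numeric'}


def infer_dtypes(X, features, dtypes=None):
    """Hypothetical helper: list of grid-feature dtypes, ordered as `features`."""
    dtypes = dict(dtypes or {})
    result = []
    for feature in features:
        dtype = dtypes.get(feature)
        if dtype is None:
            # Assumed rule: pandas numeric columns are 'numeric', the rest 'category'
            dtype = 'numeric' if is_numeric_dtype(X[feature]) else 'category'
        if dtype not in CORRECT_DTYPES:
            raise ValueError(f"Unsupported dtype {dtype!r} for feature {feature!r}")
        result.append(dtype)
    return result


X = pd.DataFrame({'age': [23, 35, 41], 'city': ['A', 'B', 'A']})
print(infer_dtypes(X, ['age', 'city']))                      # ['numeric', 'category']
print(infer_dtypes(X, ['age'], dtypes={'age': 'category'}))  # ['category']
```

Returning a list ordered like `features` mirrors the description of the dtypes_ attribute.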

sample_size: int or float, optional (default=None)

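Number of observations to sample from X using the select_sample method. If None, no sampling is done and all observations are used.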

estimation_values: int or dict, optional (default=100)

Declares the number of values to generate for each grid feature or explicitly specifies those values. When passed as:

int:

Automatically detects whether the grid feature is numeric and if:
- True:
  
  Generates values from the lowest to the highest value recorded in the data set passed to the fit method, with the spacing determined by the number of values to generate given in this parameter.
- False:
  
  Takes all of the explained feature’s unique values and imputes them into the grid feature. To consider only a subset of the unique categories, pass them in a dictionary keyed by the feature name.
When two features are specified, the given value is used for both of them.
dict:

Manually specifies the values, or specifies separately for every feature how many values to generate. Dictionary specification:
- key:
  
  Feature name passed to the features parameter.
- value:
  
  Either an integer indicating how many values to generate between the lowest and the highest value recorded in the data set, or an array of values with which the grid feature will be imputed to make predictions for the synthetic data set.
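The int and dict forms of estimation_values can be sketched as below. Evenly spaced numeric values (`np.linspace`) are an assumption consistent with the description above, and `make_grid_values` is a hypothetical helper rather than the library's code.

```python
import numpy as np
import pandas as pd


def make_grid_values(X, feature, dtype, estimation_values=100):
    """Hypothetical helper: values imputed into one grid feature."""
    # A dict holds a per-feature setting; an int applies to every grid feature
    if isinstance(estimation_values, dict):
        value = estimation_values[feature]
    else:
        value = estimation_values
    if isinstance(value, int):
        if dtype == 'numeric':
            # Assumed: evenly spaced values between the observed minimum and maximum
            return np.linspace(X[feature].min(), X[feature].max(), value)
        # Categorical feature: every unique category found in the data
        return np.asarray(X[feature].unique())
    # Otherwise the values were given explicitly, e.g. a subset of categories
    return np.asarray(value)


X = pd.DataFrame({'age': [20, 30, 60], 'city': ['A', 'B', 'A']})
print(make_grid_values(X, 'age', 'numeric', 5).tolist())  # [20.0, 30.0, 40.0, 50.0, 60.0]
print(make_grid_values(X, 'city', 'category').tolist())   # ['A', 'B']
print(make_grid_values(X, 'city', 'category', {'city': ['B']}).tolist())  # ['B']
```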

n_jobs: TODO

random_state: int, optional (default=None)

Seed for the sample generator. Used when sample_size is not None.

Attributes

dtypes_: list: Actual data types of the grid features after automatic determination, if it was requested. Otherwise equal to dtypes converted to a list ordered in the same way as the passed features.
estimation_values_: list: Actual estimation values used to calculate dependency plots. The order of values is the same as the order of passed features.
base_values_: np.array, shape = (n_samples, n_grid_features): TODO
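grid_values_: np.array, shape = (n_grid_values, n_grid_features): Sets of values substituted for the grid features to create the synthetic data sets, one set per row.
y_grid_predictions_: np.array, shape = (n_samples, n_grid_values): Predictions for every set of grid values, where rows correspond to consecutive observations and columns to consecutive sets of grid values.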

Attributes Summary

`CORRECT_DTYPES`
`dtypes`
`estimation_values`
`features`

Methods Summary

`explain`(self[, pdplot, iceplot, mplot, …])	Explains the features' effect using the selected methods.
`fit`(self, X[, y])	Fits creation of synthetic data to X.
`predict_grid_features`(self, X)	Makes predictions after substituting the grid features with generated values.
`select_sample`(self, X)	Selects a sample of sample_size observations from X.

Attributes Documentation

CORRECT_DTYPES = {'category', 'numeric'}

dtypes

estimation_values

features

Methods Documentation

explain(self, pdplot=True, iceplot=False, mplot=False, aleplot=False, automatic_layout=True, centers=None, iceplot_thresh=None, neighborhoods=0.1, pdline_params=None, iceline_params=None, mline_params=None, aleline_params=None, contour_params=None, contourf_params=None, bar_params=None, imshow_params=None, text_params=None, verbose=True, ax=None)

Explains the features' effect using the selected methods.

Parameters

pdplot: bool, optional (default=True)

Defines whether the Partial Dependence Plot should be displayed. It visualizes the marginal effect that the grid features have on predictions, estimated with the Monte Carlo method by averaging predictions over the observations.
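The Monte Carlo estimate behind a Partial Dependence Plot sets the grid feature to each grid value for every observation and averages the resulting predictions. A minimal sketch for one feature, assuming a plain NumPy array and a `predict` callable in place of the estimator and DataFrame used by the class:

```python
import numpy as np


def partial_dependence(predict, X, feature_idx, grid_values):
    """Monte Carlo estimate of partial dependence for one feature."""
    pd_values = []
    for value in grid_values:
        X_synthetic = X.copy()
        X_synthetic[:, feature_idx] = value             # same value for every observation
        pd_values.append(predict(X_synthetic).mean())  # average over the data set
    return np.array(pd_values)


def predict(X):
    # Toy model: 2 * x0 + x1
    return 2 * X[:, 0] + X[:, 1]


X = np.array([[0.0, 1.0], [1.0, 3.0]])
print(partial_dependence(predict, X, 0, [0.0, 1.0, 2.0]))  # [2. 4. 6.]
```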

iceplot: bool, optional (default=False)

Defines whether Individual Conditional Expectation plots should be displayed. Only possible if a single numeric feature is explained.
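An ICE plot keeps one curve per observation instead of averaging, so the partial dependence curve is the column-wise mean of the ICE curves. A sketch under the same assumptions (NumPy array, `predict` callable) that returns an array shaped like the y_grid_predictions_ attribute, (n_samples, n_grid_values):

```python
import numpy as np


def ice_curves(predict, X, feature_idx, grid_values):
    """One ICE curve per observation: array of shape (n_samples, n_grid_values)."""
    curves = np.empty((X.shape[0], len(grid_values)))
    for j, value in enumerate(grid_values):
        X_synthetic = X.copy()
        X_synthetic[:, feature_idx] = value
        curves[:, j] = predict(X_synthetic)
    return curves


def predict(X):
    # Toy model with an interaction, so the individual curves differ
    return X[:, 0] * X[:, 1]


X = np.array([[0.0, 1.0], [0.0, 3.0]])
curves = ice_curves(predict, X, 0, [0.0, 1.0, 2.0])
print(curves)               # [[0. 1. 2.]
                            #  [0. 3. 6.]]
print(curves.mean(axis=0))  # [0. 2. 4.]  (the partial dependence)
```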

mplot: bool, optional (default=False)

Defines whether the Marginal Plot should be displayed. It visualizes the conditional effect that the grid features have on predictions. Only possible for numeric features.
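A Marginal Plot averages predictions only over observations whose feature value lies in the neighborhood of each grid value, i.e. over the conditional distribution of the other features. In this sketch (NumPy array, `predict` callable and an absolute neighborhood are assumptions; `marginal_plot_values` is hypothetical), the toy model ignores x0 entirely, yet the M-plot of x0 still shows an effect because x0 and x1 are correlated, the known weakness that motivates ALE plots.

```python
import numpy as np


def marginal_plot_values(predict, X, feature_idx, grid_values, neighborhood):
    """M-plot sketch: average prediction over observations near each grid value."""
    x = X[:, feature_idx]
    values = []
    for value in grid_values:
        # Conditional distribution: keep only observations close to the grid value
        X_near = X[np.abs(x - value) <= neighborhood].copy()
        if len(X_near) == 0:
            values.append(np.nan)  # no observations in the neighborhood
            continue
        X_near[:, feature_idx] = value  # evaluate the model at the grid value itself
        values.append(predict(X_near).mean())
    return np.array(values)


def predict(X):
    return X[:, 1]  # toy model: uses only x1


# x0 and x1 are perfectly correlated
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0]])
print(marginal_plot_values(predict, X, 0, [1.0, 10.0], neighborhood=1.0))  # [ 1. 10.]
```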

aleplot: bool, optional (default=False)

Defines whether the Accumulated Local Effects Plot should be displayed. It visualizes accumulated differences between predictions based on the conditional distribution of the feature.
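The standard first-order ALE estimate splits the feature range into intervals, averages the difference between predictions at the upper and lower edge of each interval over the observations falling into it, and accumulates these local effects. The sketch below takes explicit interval edges and leaves the result uncentered; how explain derives the intervals from the neighborhoods parameter is not documented here, so `accumulated_local_effects` is a hypothetical helper rather than the library's implementation.

```python
import numpy as np


def accumulated_local_effects(predict, X, feature_idx, edges):
    """Uncentered first-order ALE of one feature, evaluated at sorted interval edges."""
    edges = np.asarray(edges, dtype=float)
    x = X[:, feature_idx]
    # Interval k covers (edges[k], edges[k + 1]]; the lowest edge joins interval 0
    bins = np.clip(np.searchsorted(edges, x, side='left') - 1, 0, len(edges) - 2)
    local_effects = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        X_k = X[bins == k]
        if len(X_k) == 0:
            continue  # an empty interval contributes no change
        X_low, X_high = X_k.copy(), X_k.copy()
        X_low[:, feature_idx] = edges[k]
        X_high[:, feature_idx] = edges[k + 1]
        # Average local prediction difference within the interval
        local_effects[k] = np.mean(predict(X_high) - predict(X_low))
    # Accumulate; the effect at the lowest edge is 0 by construction
    return np.concatenate([[0.0], np.cumsum(local_effects)])


def predict(X):
    return X[:, 0] ** 2  # toy model


X = np.array([[0.5], [1.5]])
print(accumulated_local_effects(predict, X, 0, [0.0, 1.0, 2.0]))  # [0. 1. 4.]
```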

automatic_layout: bool, optional (default=True)

Specifies whether to format the plots automatically (tick adjustment, axis labelling, text formatting, etc.) or to leave them in their raw state.

centers: int or float or string or list, optional (default=None)

Defines the center value to which all of the predictions are compared; predictions are then displayed as differences from the prediction at this point. By default no centering is done. If:

min:

Specifies that the minimum of the grid feature will be used for centering.

If two features are explained, pass a list of the values described above, in the same order as the features.
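Centering subtracts a reference prediction so that each curve shows the change relative to the center value. A minimal sketch for one numeric feature, assuming sorted grid values and, as an illustrative choice, linear interpolation when the center is not itself a grid value; `center_curve` is hypothetical:

```python
import numpy as np


def center_curve(grid_values, predictions, center):
    """Express predictions as differences from the prediction at `center`."""
    grid_values = np.asarray(grid_values, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    if isinstance(center, str) and center == 'min':
        center = grid_values.min()  # center at the minimum of the grid feature
    # Assumed: linear interpolation when the center is not a grid value
    reference = np.interp(center, grid_values, predictions)
    return predictions - reference


print(center_curve([0.0, 1.0, 2.0], [5.0, 7.0, 10.0], 'min'))  # [0. 2. 5.]
print(center_curve([0.0, 1.0, 2.0], [5.0, 7.0, 10.0], 0.5))    # [-1.  1.  4.]
```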

iceplot_thresh: int or float, optional (default=None)

Declares how many observations to use to visualize the ICE plots. If int, it gives the exact number of observations; if float, the fraction of all observations to be used.
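The int/float convention of iceplot_thresh can be sketched as follows. Picking the observations at random, and drawing all of them when the parameter is None, are assumptions for illustration; `select_ice_rows` is a hypothetical helper.

```python
import numpy as np


def select_ice_rows(n_samples, iceplot_thresh=None, random_state=None):
    """Hypothetical helper: indices of the observations whose ICE curves are drawn."""
    if iceplot_thresh is None:
        return np.arange(n_samples)  # assumed: draw every observation
    if isinstance(iceplot_thresh, float):
        n_selected = int(round(iceplot_thresh * n_samples))  # fraction of all observations
    else:
        n_selected = iceplot_thresh  # exact number of observations
    rng = np.random.default_rng(random_state)
    return np.sort(rng.choice(n_samples, size=n_selected, replace=False))


print(len(select_ice_rows(200, 0.25, random_state=0)))  # 50
print(len(select_ice_rows(200, 10, random_state=0)))    # 10
```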

neighborhoods: int or float or list, optional (default=.1)

Neighborhood of a value, which determines the interval [current_value - neighborhood, current_value + neighborhood] over which predictions are averaged. Taken into consideration only when mplot == True or aleplot == True. If:

int:

Absolute value that is subtracted from and added to the current value to determine the interval for synthetic data generation.
float:

Fraction of the difference between the largest and the smallest value of the variable, used to calculate the interval boundaries.

If two features are explained, pass a list of the values described above, in the same order as the features.
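The two ways of specifying a neighborhood translate into interval boundaries as in the following sketch; `neighborhood_interval` is a hypothetical helper and assumes the fraction is taken of the range of values observed for the variable.

```python
def neighborhood_interval(values, current_value, neighborhood):
    """Interval [current_value - width, current_value + width] for one feature."""
    if isinstance(neighborhood, float):
        # Fraction of the range between the largest and the smallest value
        width = neighborhood * (max(values) - min(values))
    else:
        # Absolute width given directly
        width = neighborhood
    return current_value - width, current_value + width


values = [0.0, 2.5, 10.0]
print(neighborhood_interval(values, 5.0, 0.1))  # (4.0, 6.0)
print(neighborhood_interval(values, 5.0, 2))    # (3.0, 7.0)
```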

{pd, ice, m, ale}line_params, contour_params, contourf_params, bar_params, imshow_params, text_params: dicts, optional (default=None)

Keyword arguments passed to the underlying plotting functions.

verbose: TODO

ax: TODO

Returns

TODO

fit(self, X, y=None)

Fits creation of synthetic data to X.

Parameters

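X: pandas.DataFrame: Data set from which the synthetic data is created; the grid values are derived from the values recorded in it.
y: ignored: Not used.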

Returns

self: object: Returns the instance itself.

predict_grid_features(self, X)

Makes predictions after substituting the grid features with generated values.

For every combination in the Cartesian product of the grid features' values, generates a temporary DataFrame in which the grid features are set to this combination for all observations, leaving the remaining features unchanged. It then makes predictions for this synthetic DataFrame.

Returns a list of predictions for every synthetic DataFrame and a list of the grid values that replaced the original values of the grid features to create these DataFrames.
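The procedure can be sketched with pandas and `itertools.product`. This is an illustrative standalone function, not the method itself: `grid_predictions`, the `predict` callable and the explicit per-feature grid values are assumptions, and the results are stacked into arrays shaped like the y_grid_predictions_ and grid_values_ attributes instead of being returned as lists.

```python
import itertools

import numpy as np
import pandas as pd


def grid_predictions(predict, X, features, grid_values_per_feature):
    """Hypothetical sketch: predictions for every combination of grid values."""
    grid_values = list(itertools.product(*grid_values_per_feature))
    predictions = []
    for combination in grid_values:
        X_synthetic = X.copy()
        for feature, value in zip(features, combination):
            X_synthetic[feature] = value  # same value across the whole column
        predictions.append(np.asarray(predict(X_synthetic)))
    # Rows: observations, columns: consecutive sets of grid values
    return np.column_stack(predictions), np.array(grid_values)


def predict(df):
    return (df['a'] + df['b'] + df['c']).to_numpy()  # toy model


X = pd.DataFrame({'a': [1.0, 2.0], 'b': [10.0, 20.0], 'c': [100.0, 200.0]})
y_grid, grid = grid_predictions(predict, X, ['a', 'b'], [[0.0, 1.0], [5.0]])
print(grid.tolist())    # [[0.0, 5.0], [1.0, 5.0]]
print(y_grid.tolist())  # [[105.0, 106.0], [205.0, 206.0]]
```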

select_sample(self, X)

Selects a sample of sample_size observations from X.