umami.evaluation_tools package#

Submodules#

umami.evaluation_tools.eval_tools module#

Script with all the higher level evaluation functions.

umami.evaluation_tools.eval_tools.calculate_fraction_dict(class_labels_wo_main: list, frac_min: float, frac_max: float, step: float) list#

Return all combinations of fractions for the given background classes which add up to one.

Parameters:
  • class_labels_wo_main (list) – List of the background classes.

  • frac_min (float) – Minimum value of the fractions.

  • frac_max (float) – Maximum value of the fractions.

  • step (float) – Step size of the loop.

Returns:

List with the different dicts inside.

Return type:

list

Raises:

ValueError – If no combination of fractions yields a sum of 1.
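
A minimal usage sketch (the background class labels and step size are illustrative):

    from umami.evaluation_tools.eval_tools import calculate_fraction_dict

    # All fraction combinations for two background classes, scanned in
    # steps of 0.1, that add up to one.
    frac_dicts = calculate_fraction_dict(
        class_labels_wo_main=["cjets", "ujets"],
        frac_min=0.0,
        frac_max=1.0,
        step=0.1,
    )
    # Each entry is one combination, e.g. {"cjets": 0.3, "ujets": 0.7}
    print(len(frac_dicts), frac_dicts[0])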

umami.evaluation_tools.eval_tools.get_rej_per_eff_dict(jets, y_true, tagger_classes: list, tagger_preds: list, tagger_names: list, tagger_list: list, class_labels: list, main_class: str, frac_values: dict, frac_values_comp: dict, x_axis_granularity: int = 100, eff_min: float = 0.49, eff_max: float = 1.0, progress_bar: bool = False) dict#

Calculates the rejections for the classes and taggers provided for different efficiencies of the main class.

Parameters:
  • jets (pandas.DataFrame) – Dataframe with jets and the probabilities of the comparison taggers as columns.

  • y_true (numpy.ndarray) – Truth labels of the jets.

  • tagger_classes (list) – List of the classes that were used to train the freshly trained tagger. This allows, for example, testing the behaviour of tau jets in a tagger that was not trained on taus.

  • tagger_preds (list) – Prediction output of the freshly trained taggers, e.g. [pred_dips, pred_umami].

  • tagger_names (list) – Names of the freshly trained taggers, e.g. [“dips”, “umami”].

  • tagger_list (list) – List of the comparison tagger names.

  • class_labels (list) – List of class labels which are used.

  • main_class (str) – The main discriminant class; for b-tagging this is “bjets”.

  • frac_values (dict) – Dict with the fraction values for the fresh taggers.

  • frac_values_comp (dict) – Dict with the fraction values for the comparison taggers.

  • x_axis_granularity (int) – Granularity of the efficiencies.

  • eff_min (float) – Lowest value for the efficiencies linspace.

  • eff_max (float) – Highest value for the efficiencies linspace.

  • progress_bar (bool, optional) – Whether a progress bar for the different efficiencies is printed to the terminal. By default False.

Returns:

tagger_rej_dicts – Dict with the rejections for each tagger/class (without the main class), the discriminant cut values per efficiency, and the efficiencies themselves.

Return type:

dict
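
A hedged usage sketch: jets, y_true and pred_dips are assumed to be prepared elsewhere (the jets DataFrame with comparison-tagger probability columns, the truth labels, and the predictions of a freshly trained DIPS model), “rnnip” stands in for any comparison tagger, and the nesting of the fraction dicts (keyed by tagger name) is an assumption based on the parameter descriptions:

    from umami.evaluation_tools.eval_tools import get_rej_per_eff_dict

    rej_dict = get_rej_per_eff_dict(
        jets=jets,
        y_true=y_true,
        tagger_classes=["bjets", "cjets", "ujets"],
        tagger_preds=[pred_dips],
        tagger_names=["dips"],
        tagger_list=["rnnip"],  # comparison taggers (illustrative)
        class_labels=["bjets", "cjets", "ujets"],
        main_class="bjets",
        frac_values={"dips": {"cjets": 0.018, "ujets": 0.982}},
        frac_values_comp={"rnnip": {"cjets": 0.018, "ujets": 0.982}},
        eff_min=0.49,
        eff_max=1.0,
        progress_bar=True,
    )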

umami.evaluation_tools.eval_tools.get_rej_per_frac_dict(jets, y_true: ndarray, tagger_classes: list, tagger_preds: list, tagger_names: list, tagger_list: list, class_labels: list, main_class: str, target_eff: float, step: float = 0.01, frac_min: float = 0.0, frac_max: float = 1.0, progress_bar: bool = False) dict#

Calculate the rejections for the background classes for all possible combinations of fraction values of the background classes. The fractions need to add up to one to be valid.

Parameters:
  • jets (pandas.DataFrame) – Dataframe with jets and the probabilities of the comparison taggers as columns.

  • y_true (numpy.ndarray) – Truth labels of the jets.

  • tagger_classes (list) – List of the classes that were used to train the freshly trained tagger. This allows, for example, testing the behaviour of tau jets in a tagger that was not trained on taus.

  • tagger_preds (list) – Prediction output of the freshly trained taggers, e.g. [pred_dips, pred_umami].

  • tagger_names (list) – Names of the freshly trained taggers, e.g. [“dips”, “umami”].

  • tagger_list (list) – List of the comparison tagger names.

  • class_labels (list) – List of class labels which are used.

  • main_class (str) – The main discriminant class; for b-tagging this is “bjets”.

  • target_eff (float) – Target efficiency for which the rejections are calculated.

  • step (float, optional) – Step size of the change of the fraction values, by default 0.01

  • frac_min (float) – Minimum value of the fractions, by default 0.0.

  • frac_max (float) – Maximum value of the fractions, by default 1.0.

  • progress_bar (bool, optional) – Whether a progress bar for the different combinations is printed to the terminal. By default False.

Returns:

Dict with the rejections for the taggers for the given fraction combinations.

Return type:

dict
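
A sketch for scanning fraction values at a fixed working point, under the same assumptions as above (prepared jets, y_true and pred_dips; “rnnip” as an illustrative comparison tagger):

    from umami.evaluation_tools.eval_tools import get_rej_per_frac_dict

    rej_per_frac = get_rej_per_frac_dict(
        jets=jets,
        y_true=y_true,
        tagger_classes=["bjets", "cjets", "ujets"],
        tagger_preds=[pred_dips],
        tagger_names=["dips"],
        tagger_list=["rnnip"],
        class_labels=["bjets", "cjets", "ujets"],
        main_class="bjets",
        target_eff=0.77,  # rejections at the 77% b-efficiency working point
        step=0.01,
        progress_bar=True,
    )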

umami.evaluation_tools.eval_tools.get_saliency_map_dict(model: object, model_pred: ndarray, x_test: ndarray, y_test: ndarray, class_labels: list, main_class: str, frac_dict: dict, var_dict_path: str, tracks_name: str, n_trks: int | None = None, effs: list | None = None, n_jets: int = 100000) dict#

Calculate the saliency map dict.

Parameters:
  • model (object) – Loaded Keras model.

  • model_pred (numpy.ndarray) – Model predictions of the model.

  • x_test (numpy.ndarray) – Inputs to the model.

  • y_test (numpy.ndarray) – Truth labels in one-hot-encoded format.

  • class_labels (list) – List of class labels which are used.

  • main_class (str) – The main discriminant class. For b-tagging obviously “bjets”.

  • frac_dict (dict) – Dict with the fraction values for the tagger.

  • var_dict_path (str) – Path to the variable dict which was used for training the tagger (to retrieve the inputs).

  • tracks_name (str) – Name of the tracks which are used in the training.

  • n_trks (int, optional) – Number of tracks each jet needs to have. Saliency maps can only be calculated for a fixed number of tracks per jet, so only jets with exactly this number of tracks are used for the calculation. By default None.

  • effs (list, optional) – List with the efficiencies which are tested. If None is given, the default WPs of 60, 70, 77 and 85 are tested. By default None.

  • n_jets (int, optional) – Number of jets to use to calculate the saliency maps. By default 100,000.

Returns:

Map_dict – Dict with the saliency values.

Return type:

dict

Raises:

ValueError – If the given efficiencies are neither a list nor an int.
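
A sketch of a typical call; the model, its predictions, the test arrays and the variable-dict path are assumptions, and the tracks collection name “tracks” is illustrative:

    from umami.evaluation_tools.eval_tools import get_saliency_map_dict

    saliency_dict = get_saliency_map_dict(
        model=model,            # loaded Keras model
        model_pred=model_pred,  # model predictions for x_test
        x_test=x_test,
        y_test=y_test,          # one-hot-encoded truth labels
        class_labels=["bjets", "cjets", "ujets"],
        main_class="bjets",
        frac_dict={"cjets": 0.018, "ujets": 0.982},
        var_dict_path="path/to/var_dict.yaml",  # hypothetical path
        tracks_name="tracks",
        n_trks=8,               # only jets with exactly 8 tracks are used
        effs=[60, 70, 77, 85],
        n_jets=100_000,
    )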

umami.evaluation_tools.eval_tools.get_scores_probs_dict(jets, y_true, tagger_classes: list, tagger_preds: list, tagger_names: list, tagger_list: list, class_labels: list, main_class: str, frac_values: dict, frac_values_comp: dict) dict#

Get the probabilities in a new dict and calculate the discriminant scores.

Parameters:
  • jets (pandas.DataFrame) – Dataframe with the probabilities of the comparison taggers as columns.

  • y_true (numpy.ndarray) – Internal truth labeling of the used jets.

  • tagger_classes (list) – List of the classes that were used to train the freshly trained tagger. This allows, for example, testing the behaviour of tau jets in a tagger that was not trained on taus.

  • tagger_preds (list) – Prediction output of the freshly trained taggers, e.g. [pred_dips, pred_umami].

  • tagger_names (list) – Names of the freshly trained taggers, e.g. [“dips”, “umami”].

  • tagger_list (list) – List of the comparison tagger names.

  • class_labels (list) – List of class labels which are used.

  • main_class (str) – The main discriminant class; for b-tagging this is “bjets”.

  • frac_values (dict) – Dict with the fraction values for the fresh taggers.

  • frac_values_comp (dict) – Dict with the fraction values for the comparison taggers.

Returns:

df_discs_dict – Dict with the discriminant scores of each jet and the probabilities of the different taggers for the used jets.

Return type:

dict
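
A usage sketch under the same assumptions as for get_rej_per_eff_dict (prepared inputs, illustrative tagger names, assumed fraction-dict nesting):

    from umami.evaluation_tools.eval_tools import get_scores_probs_dict

    df_discs_dict = get_scores_probs_dict(
        jets=jets,
        y_true=y_true,
        tagger_classes=["bjets", "cjets", "ujets"],
        tagger_preds=[pred_dips],
        tagger_names=["dips"],
        tagger_list=["rnnip"],
        class_labels=["bjets", "cjets", "ujets"],
        main_class="bjets",
        frac_values={"dips": {"cjets": 0.018, "ujets": 0.982}},
        frac_values_comp={"rnnip": {"cjets": 0.018, "ujets": 0.982}},
    )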

umami.evaluation_tools.eval_tools.recompute_score(df_probs, model_tagger: str, main_class: str, model_frac_values: dict, model_class_labels: list)#

Recompute the output scores of a given tagger.

Parameters:
  • df_probs (pandas.DataFrame) – Dataframe with the tagger probabilities inside.

  • model_tagger (str) – Name of the tagger to use.

  • main_class (str) – The main discriminant class. For b-tagging obviously “bjets”.

  • model_frac_values (dict) – Dict with the fraction values for the given model.

  • model_class_labels (list) – List with the class labels which are to be used.

Returns:

Scores – Array with the tagger scores for the given jets.

Return type:

numpy.ndarray
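
A sketch for recomputing the discriminant with different fraction values; the tagger name is illustrative, and df_probs is assumed to contain that tagger’s probability columns:

    from umami.evaluation_tools.eval_tools import recompute_score

    scores = recompute_score(
        df_probs=df_probs,
        model_tagger="rnnip",  # illustrative tagger name
        main_class="bjets",
        model_frac_values={"cjets": 0.1, "ujets": 0.9},
        model_class_labels=["bjets", "cjets", "ujets"],
    )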

umami.evaluation_tools.feature_importance module#

Integrates the shap package to rank feature importance in NN training.

umami.evaluation_tools.feature_importance.shapley_all_flavours(model: object, test_data: ndarray, feature_sets: int = 200, averaged_sets: int = 50, plot_size: tuple = (11, 11), plot_path: str | None = None, plot_name: str = 'shapley_all_flavors') None#

Makes a bar plot of the feature influence for all flavour outputs, shown as categories in one plot.

averaged_sets lets you average over input feature sets before they are handed to the shap framework, to decrease runtime.

Parameters:
  • model (Keras Model) – Loaded model which is to be evaluated.

  • test_data (np.ndarray) – Array with the test data

  • feature_sets (int, optional) – Number of whole feature sets to calculate over. Corresponds to the number of dots per feature in the beeswarm plot, by default 200.

  • averaged_sets (int, optional) – Number of input feature sets to average over before they are handed to the shap framework, by default 50.

  • plot_size (tuple, optional) – Tuple with the plot size, by default (11, 11)

  • plot_path (str, optional) – Path where the plot is saved, by default None.

  • plot_name (str, optional) – Name of the output file, by default “shapley_all_flavors”.
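
A sketch of a typical call; model and x_test are assumed to be a loaded Keras model and its test input array, and the output directory is hypothetical:

    from umami.evaluation_tools.feature_importance import shapley_all_flavours

    shapley_all_flavours(
        model=model,
        test_data=x_test,
        feature_sets=200,    # number of whole feature sets to evaluate
        averaged_sets=50,    # average inputs before handing them to shap
        plot_size=(11, 11),
        plot_path="plots/",  # hypothetical output directory
        plot_name="shapley_all_flavors",
    )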

umami.evaluation_tools.feature_importance.shapley_one_flavour(model: object, test_data: ndarray, model_output: int = 2, feature_sets: int = 200, plot_size: tuple = (11, 11), plot_path: str | None = None, plot_name: str = 'shapley_b-jets') None#

Calculates shap values with the shap package (https://github.com/slundberg/shap) and plots the results as a beeswarm plot. Explainers are chosen automatically by shap depending on the feature size.

model_output is the index of the output node of the model, e.g. tau_index, b_index, c_index, u_index = 3, 2, 1, 0.

Parameters:
  • model (Keras Model) – Loaded model which is to be evaluated.

  • test_data (np.ndarray) – Array with the test data

  • model_output (int, optional) – Index of the model output node to explain (e.g. b_index = 2), by default 2.

  • feature_sets (int, optional) – Number of whole feature sets to calculate over. Corresponds to the number of dots per feature in the beeswarm plot, by default 200.

  • plot_size (tuple, optional) – Tuple with the plot size, by default (11, 11)

  • plot_path (str, optional) – Path where the plot is saved, by default None.

  • plot_name (str, optional) – Name of the output file, by default “shapley_b-jets”
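
A sketch explaining a single output node; per the note above, b_index = 2. As before, model and x_test are assumed to be prepared elsewhere and the output directory is hypothetical:

    from umami.evaluation_tools.feature_importance import shapley_one_flavour

    shapley_one_flavour(
        model=model,
        test_data=x_test,
        model_output=2,      # output node for b-jets (b_index = 2)
        feature_sets=200,
        plot_size=(11, 11),
        plot_path="plots/",  # hypothetical output directory
        plot_name="shapley_b-jets",
    )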

Module contents#