umami.metrics package

Submodules

umami.metrics.metrics module

Script with all the metrics calculations used to grade the performance of the taggers.

umami.metrics.metrics.calc_disc_values(jets_dict: dict, index_dict: dict, main_class: str, frac_dict: dict, rej_class: str | None = None) → ndarray

Calculate the discriminant values of the given jets for the given main class.

Parameters:
  • jets_dict (dict) – Dict with the class names as keys. Each entry holds the NN output values for the jets of one class as a numpy ndarray of shape (n_jets, n_outputs).

  • index_dict (dict) – Dict with the class names as keys and their corresponding column index along the n_outputs axis as values.

  • main_class (str) – Name of the main (signal) class, e.g. “bjets” for b-tagging.

  • frac_dict (dict) – Dict with the fractions used to calculate the discriminant score. The values must add up to one.

  • rej_class (str, optional) – Name of the class of jets for which the discriminant values are to be computed. If None (the default), the jets of main_class are used.

Returns:

disc_score – Array with the discriminant score values for the jets.

Return type:

numpy.ndarray

Raises:

KeyError – If no frac_dict entry is given for one of the class labels.

Notes

The function calculates the discriminant values for the jets with the following equation:

\[D_b = \ln \left(\frac{p_b}{f_c * p_c + f_u * p_u} \right)\]

This is done here for the special case of 3 classes where bjets is the main class (signal class) and cjets and ujets are the background classes. The values \(f_c\) and \(f_u\) are taken from the frac_dict. The key is the class name, cjets for example, and the value is a float with the value of \(f_c\).
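The three-class expression generalises naturally to any number of background classes. As a sketch (the summation form is an assumption inferred from the special case, not quoted from the implementation), with \(i\) running over all background classes listed in frac_dict:

\[D_{\text{main}} = \ln \left(\frac{p_{\text{main}}}{\sum_{i \neq \text{main}} f_i * p_i} \right), \qquad \sum_{i \neq \text{main}} f_i = 1\]

For a single jet with NN outputs \(p_u = 0.1\), \(p_c = 0.1\), \(p_b = 0.8\) and fractions \(f_c = 0.018\), \(f_u = 0.982\), this evaluates to:

\[D_b = \ln \left(\frac{0.8}{0.018 * 0.1 + 0.982 * 0.1} \right) = \ln \left(\frac{0.8}{0.1} \right) = \ln 8 \approx 2.079\]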

Examples

>>> import numpy as np
>>> jets_dict = {
...     "bjets": np.array([[0.1, 0.1, 0.8], [0.0, 0.1, 0.9]]),
...     "cjets": np.array([[0.2, 0.6, 0.2], [0.1, 0.8, 0.1]]),
...     "ujets": np.array([[0.9, 0.1, 0.0], [0.7, 0.2, 0.1]]),
... }
>>> index_dict = {"bjets": 2, "cjets": 1, "ujets": 0}
>>> main_class = "bjets"
>>> frac_dict = {"cjets": 0.018, "ujets": 0.982}

The following will output the discriminant values for the two given bjets. Note that if no rej_class is given, the discriminant values for the main class jets are calculated.

>>> disc_score = calc_disc_values(
...     jets_dict=jets_dict,
...     index_dict=index_dict,
...     main_class=main_class,
...     frac_dict=frac_dict,
... )
>>> disc_score
array([2.07944154, 6.21460804])

Now, we can calculate the discriminant values for the cjets class.

>>> disc_score = calc_disc_values(
...     jets_dict=jets_dict,
...     index_dict=index_dict,
...     main_class=main_class,
...     frac_dict=frac_dict,
...     rej_class="cjets",
... )
>>> disc_score
array([-0.03536714, -0.11867153])

umami.metrics.metrics.discriminant_output_shape(input_shape: tuple) → tuple

Ensure the correct output shape of the discriminant.

Parameters:

input_shape (tuple) – Input shape that is used.

Returns:

shape – Tuple containing only the first dimension of the input shape.

Return type:

tuple
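The behaviour described above can be illustrated with a minimal sketch. The function name sketch_discriminant_output_shape is hypothetical, and the body is an assumption based only on the description of the return value, not the umami source.

```python
def sketch_discriminant_output_shape(input_shape: tuple) -> tuple:
    """Illustrative only: keep the first dimension of input_shape as a 1-tuple."""
    # The discriminant is one number per jet, so only the first
    # (jet/batch) dimension of the input shape is kept.
    return (input_shape[0],)


# A network output of shape (n_jets, n_classes) = (128, 3) maps to (128,).
shape = sketch_discriminant_output_shape((128, 3))
```

An unknown first dimension (for example None for a variable batch size) is passed through unchanged in this sketch.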

umami.metrics.metrics.get_gradients(model: object, arr: ndarray, n_jets: int)

Calculate the gradients with respect to the input variables. Note that only Keras backend functions can be used here, because the gradients are TensorFlow tensors and are not compatible with numpy.

Parameters:
  • model (object) – Loaded keras model.

  • arr (numpy.ndarray) – Track inputs of the jets.

  • n_jets (int) – Number of jets to be used.

Returns:

gradients – Gradients of the network for the given inputs.

Return type:

tensorflow.Tensor

umami.metrics.metrics.get_rejection(y_pred: ndarray, y_true: ndarray, class_labels: list, main_class: str, frac_dict: dict, target_eff: float, unique_identifier: str | None = None, subtagger: str | None = None)

Calculate the rejections at a specific working point (WP) for all provided classes except the discriminant class (main_class). A rejection cannot be calculated for the signal class itself.

Parameters:
  • y_pred (numpy.ndarray) – The prediction output of the NN. This must have the shape (n_jets, nClasses).

  • y_true (numpy.ndarray) – The true class of the jets. This must also have the shape (n_jets, nClasses) (one-hot encoded).

  • class_labels (list) – A list of the class labels which are used. This must be in the same order as the truth! See the Notes for more details.

  • main_class (str) – The main discriminant class, e.g. “bjets” for b-tagging.

  • frac_dict (dict) – A dict with the respective fractions for each class provided except main_class.

  • target_eff (float) – Working point (target efficiency of main_class) which is used for the discriminant calculation.

  • unique_identifier (str, optional) – Unique identifier of the used dataset (e.g. ttbar_r21).

  • subtagger (str, optional) – Name of the subtagger for which the rejection is calculated, in case several are involved. The provided string is added to the key in the dict, e.g. ujets_rej_<subtagger>_<file_id>.

Returns:

  • Rejection_Dict (dict) – Dict of the rejections. The keys of the dict are the provided class_labels without main_class

  • cut_value (float) – Cut value that is calculated for the given working point.

Raises:
  • ValueError – If the given y_true does not match the provided class_labels.

  • ValueError – If the shape of y_true is not supported.

Notes

The function calculates the discriminant values for the given jets with the following equation:

\[D_b = \ln \left(\frac{p_b}{f_c * p_c + f_u * p_u} \right)\]

This is done here for the special case of 3 classes where bjets is the main class (signal class) and cjets and ujets are the background classes. The values \(f_c\) and \(f_u\) are taken from the frac_dict. The key is the class name, cjets for example, and the value is a float with the value of \(f_c\).

The class_labels MUST be in the same order as the one-hot encoded truth. So when [0, 0, 1] is the y_true for one jet and the first column is for the ujets, the second for the cjets and the third for the bjets, then the class_labels list MUST be ["ujets", "cjets", "bjets"].
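How a working point turns discriminant values into a rejection is not spelled out here. The sketch below shows one common convention and is an assumption, not the umami source: the cut is the signal-discriminant percentile that keeps a fraction target_eff of the signal jets, and the rejection is the inverse of the fraction of background jets passing that cut. The name sketch_rejection and the toy arrays are hypothetical.

```python
import numpy as np


def sketch_rejection(disc_signal, disc_background, target_eff):
    """Illustrative working-point rejection; not the umami implementation."""
    # Cut value above which a fraction target_eff of the signal jets lies.
    cut_value = np.percentile(disc_signal, 100.0 * (1.0 - target_eff))
    # Number of background jets that pass the cut (mistagged jets).
    n_pass = np.count_nonzero(disc_background > cut_value)
    # Rejection is the inverse mistag rate; infinite if no background jet passes.
    rejection = len(disc_background) / n_pass if n_pass > 0 else np.inf
    return rejection, cut_value


# Toy discriminant values: ten signal jets and eight background jets.
disc_signal = np.linspace(0.0, 9.0, 10)
disc_background = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 5.0])

rejection, cut_value = sketch_rejection(disc_signal, disc_background, target_eff=0.5)
```

With these toy inputs the median of the signal values is the cut, and exactly one of the eight background jets lies above it.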

Examples

>>> import numpy as np
>>> y_pred = np.array(
...     [
...         [0.1, 0.1, 0.8],
...         [0.0, 0.1, 0.9],
...         [0.2, 0.6, 0.2],
...         [0.1, 0.8, 0.1],
...     ]
... )
>>> y_true = np.array(
...     [
...         [0, 0, 1],
...         [0, 0, 1],
...         [0, 1, 0],
...         [0, 1, 0],
...     ]
... )
>>> class_labels = ["ujets", "cjets", "bjets"]
>>> main_class = "bjets"
>>> frac_dict = {"cjets": 0.018, "ujets": 0.982}
>>> target_eff = 0.30

The following will output the rejection for the given jets based on their NN outputs.

>>> Rej_Dict, cut_value = get_rejection(
...     y_pred=y_pred,
...     y_true=y_true,
...     class_labels=class_labels,
...     main_class=main_class,
...     frac_dict=frac_dict,
...     target_eff=target_eff,
... )

umami.metrics.metrics.get_score(y_pred: ndarray, class_labels: list, main_class: str, frac_dict: dict, use_keras_backend: bool = False) → ndarray

Similar to calc_disc_values, but uses the output of the NN directly (shape: (n_jets, nClasses)) for the calculation.

Parameters:
  • y_pred (numpy.ndarray) – The prediction output of the NN.

  • class_labels (list) – A list of the class_labels which are used.

  • main_class (str) – The main discriminant class, e.g. “bjets” for b-tagging.

  • frac_dict (dict) – A dict with the respective fractions for each class provided except main_class.

  • use_keras_backend (bool) – Whether the values are calculated with the Keras backend instead of numpy (the Keras backend is needed for the saliency maps).

Returns:

disc_score – Discriminant Score for the jets provided.

Return type:

numpy.ndarray

Raises:

KeyError – If no frac_dict entry is given for one of the class labels.

Examples

>>> import numpy as np
>>> y_pred = np.array(
...     [
...         [0.1, 0.1, 0.8],
...         [0.0, 0.1, 0.9],
...         [0.2, 0.6, 0.2],
...         [0.1, 0.8, 0.1],
...     ]
... )
>>> class_labels = ["ujets", "cjets", "bjets"]
>>> main_class = "bjets"
>>> frac_dict = {"cjets": 0.018, "ujets": 0.982}

Now we can call the function which will return the discriminant values for the given jets based on their given NN outputs (y_pred).

>>> disc_scores = get_score(
...     y_pred=y_pred,
...     class_labels=class_labels,
...     main_class=main_class,
...     frac_dict=frac_dict,
... )
>>> disc_scores
array([ 2.07944154,  6.21460804, -0.03536714, -0.11867153])
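To make the calculation concrete, the numpy path can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the umami source: the name sketch_get_score is hypothetical, the column of each class is assumed to follow the order of class_labels, and no numerical safeguards are applied, so the last printed digits may differ slightly from the documented values.

```python
import numpy as np


def sketch_get_score(y_pred, class_labels, main_class, frac_dict):
    """Illustrative numpy-only discriminant; not the umami implementation."""
    # The column of each class in y_pred follows the order of class_labels.
    index_dict = {label: i for i, label in enumerate(class_labels)}

    # Weighted sum of the background probabilities (the denominator).
    denominator = np.zeros(len(y_pred))
    for label in class_labels:
        if label == main_class:
            continue
        if label not in frac_dict:
            raise KeyError(f"No fraction given for class {label!r} in frac_dict.")
        denominator += frac_dict[label] * y_pred[:, index_dict[label]]

    # Log-ratio of the main-class probability to the weighted background.
    return np.log(y_pred[:, index_dict[main_class]] / denominator)


y_pred = np.array(
    [
        [0.1, 0.1, 0.8],
        [0.0, 0.1, 0.9],
        [0.2, 0.6, 0.2],
        [0.1, 0.8, 0.1],
    ]
)
scores = sketch_get_score(
    y_pred=y_pred,
    class_labels=["ujets", "cjets", "bjets"],
    main_class="bjets",
    frac_dict={"cjets": 0.018, "ujets": 0.982},
)
```

For these inputs the sketch agrees with the documented discriminant values to about six decimal places. Because the whole y_pred array is processed at once, no per-class splitting of the jets is needed.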

Module contents