# Plotting the evaluation results#

The evaluation results can be plotted using different functions. There is the plotting_umami.py, plotting_epoch_performance and the plot_input_variables.py. Each plotting script is explained in its dedicated section.

## plotting_umami.py#

The plotting_umami.py is used to plot the results of the evaluation script. Different plots can be produced with it which are fully customizable. All plots that are defined in the plotting_umami_config_X.yaml. The X defines the tagger here but its just a name. All config files are usable with the plotting_umami.py script.

### Yaml Config File#

Important: The indentation in this .yaml is important due to the way the files are read by the script. A fully written one can be found here. The name of your freshly trained tagger, the tagger_name here in the config, is always the name of your model you have trained. The name is the value of tagger from the nn_structure.

The config file starts with the Eval_parameters. Here the Path_to_models_dir is set, where the models are saved. Also the model_name and the epoch which is to be plotted is set. A boolean parameter can be §set here to add the epoch to the end of the plot name. This is epoch_to_name. For example, this can look like this:

# Evaluation parameters
Eval_parameters:
Path_to_models_dir: <path_palce_holder>/umami/umami
model_name: dips_Loose_lr_0.001_bs_15000_epoch_200_nTrainJets_Full
epoch: 59
epoch_to_name: True


In the different available plots, there are options that are available in mostly all of them. So they will be explained next. For specific options, look at the comment in the section of the plot.

Options Explanation
Name_of_the_plot All plots start with no indentation and the name of plot. This will be the output name of the plot file and has no impact on the plot itself.
type This option specifies the plot function that is used.
data_set_name Decides which evaluated dataset (or file) is used. This data_set_name are set in the train_config yaml file which is used in the evaluation of the model. There the different files are getting their own data_set_name which needs to be the same as here!
class_labels List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important! (Possible entries are defined in the global_config.yaml)
models_to_plot In the plots, the models which are to be plotted needs to be defined in here. You can add as many models as you want. For example this can be used to plot the results of the different taggers in one plot (e.g. for score or ROC curves). The different models can be assisted with evaluation_file to point to the results file you have created with evaluate_model.py. e.g.evaluation_file: YOURMODEL/results/results-rej_per_eff-229.h5
plot_settings In this section, all optional plotting settings are defined. They don't need to be defined but you can. For the specific available options in each function, look in the corresponding section.

In plot_settings, some general options can be set which are used in all of the available plots. These are:

Parameter Type Description
title str, optional Title of the plot, by default ""
draw_errors bool, optional Draw statistical uncertainty on the lines, by default True
xmin float, optional Minimum value of the x-axis, by default None
xmax float, optional Maximum value of the x-axis, by default None
ymin float, optional Minimum value of the y-axis, by default None
ymax float, optional Maximum value of the y-axis, by default None
ymin_ratio_1 float, optional Set the lower y limit of the first ratio subplot, by default None.
ymax_ratio_1 float, optional Set the upper y limit of the first ratio subplot, by default None.
ymin_ratio_2 float, optional Set the lower y limit of the second ratio subplot, by default None.
ymax_ratio_2 float, optional Set the upper y limit of the second ratio subplot, by default None.
y_scale float, optional Scaling up the y axis, e.g. to fit the ATLAS Tag. Applied if ymax not defined, by default 1.3
xlabel str, optional Label of the x-axis, by default None
ylabel str, optional Label of the y-axis, by default None
ylabel_ratio_1 str, optional Label of the y-axis in the first ratio plot, by default "Ratio"
ylabel_ratio_2 str, optional Label of the y-axis in the second ratio plot, by default "Ratio"
label_fontsize int, optional Used fontsize in label, by default 12
fontsize int, optional Used fontsize, by default 10
n_ratio_panels int, optional Amount of ratio panels between 0 and 2, by default 0
figsize (float, float), optional Tuple of figure size (width, height) in inches, by default (8, 6)
dpi int, optional DPI used for plotting, by default 400
transparent bool, optional Specify if the background of the plot should be transparent, by default False
grid bool, optional Set the grid for the plots.
leg_fontsize int, optional Fontsize of the legend, by default 10
leg_loc str, optional Position of the legend in the plot, by default "upper right"
leg_ncol int, optional Number of legend columns, by default 1
leg_linestyle_loc str, optional Position of the linestyle legend in the plot, by default "upper center"
apply_atlas_style bool, optional Apply ATLAS style for matplotlib, by default True
use_atlas_tag bool, optional Use the ATLAS Tag in the plots, by default True
atlas_first_tag str, optional First row of the ATLAS tag (i.e. the first row is "ATLAS "), by default "Simulation Internal"
atlas_second_tag str, optional Second row of the ATLAS tag, by default ""
atlas_fontsize float, optional Fontsize of ATLAS label, by default 10
atlas_vertical_offset float, optional Vertical offset of the ATLAS tag, by default 7
atlas_horizontal_offset float, optional Horizontal offset of the ATLAS tag, by default 8
atlas_brand str, optional brand argument handed to atlasify. If you want to remove it just use an empty string or None, by default "ATLAS"
atlas_tag_outside bool, optional outside argument handed to atlasify. Decides if the ATLAS logo is plotted outside of the plot (on top), by default False
atlas_second_tag_distance float, optional Distance between the atlas_first_tag and atlas_second_tag text in units of line spacing, by default 0

For plotting, these different plots are available:

#### Confusion Matrix#

Plot a confusion matrix. For example:

confusion_matrix_Dips_ttbar:
type: "confusion_matrix"
data_set_name: "ttbar_r21"
tagger_name: "dips"
class_labels: ["ujets", "cjets", "bjets"]
plot_settings:
colorbar: True

Options Data Type Necessary/Optional Explanation
colourbar bool Optional Define, if the colourbar on the side is shown or not.

#### Probability#

Plotting the DNN probability output for a specific class. For example:

Dips_prob_pb:
type: "probability"
prob_class: "bjets"
models_to_plot:
dips_r22:
data_set_name: "ttbar_r21"
label: "DIPS"
tagger_name: "dips"
class_labels: ["ujets", "cjets", "bjets"]
plot_settings:
logy: True
bins: 50
y_scale: 1.5 # Increasing of the y axis so the plots dont collide with labels (mainly atlas_first_tag)
use_atlas_tag: True # Enable/Disable atlas_first_tag
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample"

Options Data Type Necessary/Optional Explanation
type str Necessary This gives the type of plot function used. Must be "probability" here.
prob_class str Necessary Class of the to be plotted probability.
dips_r22 None Necessary Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here.
data_set_name str Necessary Name of the dataset that is used. This is the name of the test_file which you want to use.
label str Necessary Legend label of the model.
tagger_name str Necessary Name of the tagger which is to be plotted. This is the name of the tagger either from the .h5 files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT: If you want to use a tagger from the .h5 files, you must run the evaluate_model.py script with the names of taggers in the train config in the evaluation_settings section. There you need to enter the name to the tagger list and the fraction values to the frac_values_comp dict. The key is the name of the tagger.
class_labels list Necessary List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important!

#### Scores#

Plotting the b-tagging discriminant scores for the different jet flavors. For example:

scores_Dips_ttbar:
type: "scores"
main_class: "bjets"
models_to_plot:
dips_r21:
data_set_name: "ttbar_r21"
tagger_name: "dips"
class_labels: ["ujets", "cjets", "bjets"]
label: "$t\\bar{t}$"
plot_settings:
working_points: [0.60, 0.70, 0.77, 0.85] # Set Working Point Lines in plot
bins: 50 # Number of bins
y_scale: 1.3 # Increasing of the y axis so the plots dont collide with labels (mainly atlas_first_tag)
use_atlas_tag: True # Enable/Disable atlas_first_tag
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample"

Options Data Type Necessary/Optional Explanation
type str Necessary This gives the type of plot function used. Must be "scores" here.
main_class str Class which is to be tagged.
dips_r21 None Necessary Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here.
data_set_name str Necessary Name of the dataset that is used. This is the name of the test_file which you want to use.
tagger_name str Necessary Name of the tagger which is to be plotted. This is the name of the tagger either from the .h5 files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT: If you want to use a tagger from the .h5 files, you must run the evaluate_model.py script with the names of taggers in the train config in the evaluation_settings section. There you need to enter the name to the tagger list and the fraction values to the frac_values_comp dict. The key is the name of the tagger.
class_labels list Necessary List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important!
label str Necessary Legend label of the model.
working_points list Optional The specified WPs are calculated and at the calculated b-tagging discriminant there will be a vertical line with a small label on top which prints the WP.

#### ROC Curves#

Plotting the ROC Curves of the rejection rates against the b-tagging efficiency. For example:

Dips_light_flavour_ttbar:
type: "ROC"
models_to_plot:
dips_r21_u:
data_set_name: "ttbar_r21"
label: "DIPS"
tagger_name: "dips"
rejection_class: "ujets"
plot_settings:
draw_errors: True
xmin: 0.5
ymax: 1000000
figsize: [7, 6] # [width, hight]
working_points: [0.60, 0.70, 0.77, 0.85]
use_atlas_tag: True # Enable/Disable atlas_first_tag
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ validation sample, fc=0.018"

Options Data Type Necessary/Optional Explanation
type str Necessary This gives the type of plot function used. Must be "ROC" here.
main_class str Class which is to be tagged.
dips_r21 None Necessary Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here.
data_set_name str Necessary Name of the dataset that is used. This is the name of the test_file which you want to use.
label str Necessary Legend label of the model.
tagger_name str Necessary Name of the tagger which is to be plotted. This is the name of the tagger either from the .h5 files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT: If you want to use a tagger from the .h5 files, you must run the evaluate_model.py script with the names of taggers in the train config in the evaluation_settings section. There you need to enter the name to the tagger list and the fraction values to the frac_values_comp dict. The key is the name of the tagger.
rejection_class str Necessary Class which the main flavour is plotted against.
draw_errors bool Optional Plot binomial errors to plot.
xmin float Optional Set the minimum b efficiency in the plot (which is the xmin limit).
ymax float Optional The maximum y axis.
working_points list Optional The specified WPs are calculated and at the calculated b-tagging discriminant there will be a vertical line with a small label on top which prints the WP.

You can plot two rejections at the same time with two subplots with the ratios. One for each rejection. An example for this can be seen here:

Dips_Comparison_flavour_ttbar:
type: "ROC"
models_to_plot:
dips_r21_u:
data_set_name: "ttbar_r21"
label: "DIPS"
tagger_name: "dips"
rejection_class: "ujets"
dips_r21_c:
data_set_name: "ttbar_r21"
label: "DIPS"
tagger_name: "dips"
rejection_class: "cjets"
plot_settings:
draw_errors: True
xmin: 0.5
ymax: 1000000
figsize: [9, 9] # [width, hight]
working_points: [0.60, 0.70, 0.77, 0.85]
use_atlas_tag: True # Enable/Disable atlas_first_tag
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ validation sample, fc=0.018"


#### Variable vs Efficiency#

Plot the b efficiency/c-rejection/light-rejection against the pT. For example:

Dips_pT_vs_beff:
type: "pT_vs_eff"
models_to_plot:
dips:
data_set_name: "ttbar_r21"
label: "DIPS"
tagger_name: "dips"
plot_settings:
bin_edges: [0, 20, 30, 40, 60, 85, 110, 140, 175, 250, 400, 1000]
flavour: "cjets"
variable: "pt"
class_labels: ["ujets", "cjets", "bjets"]
main_class: "bjets"
working_point: 0.77
working_point_line: True
fixed_eff_bin: False
figsize: [7, 5]
logy: False
use_atlas_tag: True
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample"
y_scale: 1.3

Options Data Type Necessary/Optional Explanation
type str Necessary This gives the type of plot function used. Must be "var_vs_eff" here.
dips None Necessary Internal naming of the model. This will not show up anywhere, but it must be unique! You can define multiple of these models. All of them will be plotted. The baseline for the ratio is the first model defined here.
data_set_name str Necessary Name of the dataset that is used. This is the name of the test_file which you want to use.
label str Necessary Legend label of the model.
tagger_name str Necessary Name of the tagger which is to be plotted. This is the name of the tagger either from the .h5 files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT: If you want to use a tagger from the .h5 files, you must run the evaluate_model.py script with the names of taggers in the train config in the evaluation_settings section. There you need to enter the name to the tagger list and the fraction values to the frac_values_comp dict. The key is the name of the tagger.
bin_edges list Necessary Setting the edges of the bins. Don't forget the first/last edge!
flavour str Necessary Flavour class rejection which is to be plotted.
variable str Necessary Variable against the efficiency/rejection is plotted against.
class_labels list Necessary List of class labels that were used in the preprocessing/training. They must be the same in all three files! Order is important!
main_class str Necessary Class which is to be tagged.
working_point float Necessary Float of the working point that will be used.
working_point_line float Optional Print a horizontal line at this value efficiency.
fixed_eff_bin bool Optional Calculate the WP cut on the discriminant per bin.

#### Saliency Plots#

To evaluate the impact of the track variables to the final b-tagging discriminant can't be found using SHAPley. To make the impact visible (for each track of the jet), so-called Saliency maps are used. These maps are calculated when evaluating the model you have trained (if it is activated). A lot of different options can be set. An example is given here:

Dips_saliency_b_WP77_passed_ttbar:
type: "saliency"
data_set_name: "ttbar_r21"
target_eff: 0.77
jet_flavour: "bjets"
PassBool: True
nFixedTrks: 8
plot_settings:
title: "Saliency map for $b$ jets from \n $t\\bar{t}$ who passed WP = 77% \n with exactly 8 tracks"
use_atlas_tag: True # Enable/Disable atlas_first_tag
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets"

Options Data Type Necessary/Optional Explanation
type str Necessary This gives the type of plot function used. Must be "saliency" here.
data_set_name str Necessary Name of the dataset that is used. This is the name of the test_file which you want to use.
target_eff float Necessary Efficiency of the target flavour you want to use (Which WP you want to use). The value is given between 0 and 1.
jet_flavour str Necessary Name of flavour you want to plot.
PassBool str Necessary Decide if the jets need to pass the working point discriminant cut or not. False would give you, for example, truth b-jets which does not pass the working point discriminant cut and are therefore not tagged a b-jets.
nFixedTrks int Necessary The saliency maps can only be calculated for jets with a fixed number of tracks. This number of tracks can be set with this parameter. For example, if this value is 8, than only jets which have exactly 8 tracks are used for the saliency maps. This value needs to be set in the train config when you run the evaluation! If you run the evaluation with, for example 5, you can't plot the saliency map for 8.

#### Fraction Contour Plot#

Plot two rejections against each other for a given working point with different fraction values.

contour_fraction_ttbar:
type: "fraction_contour"
rejections: ["ujets", "cjets"]
models_to_plot:
dips:
tagger_name: "dips"
colour: "b"
linestyle: "--"
label: "DIPS"
data_set_name: "ttbar_r21"
marker:
cjets: 0.1
ujets: 0.9
marker_style: "x"
rnnip:
tagger_name: "rnnip"
colour: "r"
linestyle: "--"
label: "RNNIP"
data_set_name: "ttbar_r21"
plot_settings:
y_scale: 1.3 # Increasing of the y axis so the plots dont collide with labels (mainly atlas_first_tag)
use_atlas_tag: True # Enable/Disable atlas_first_tag
atlas_first_tag: "Simulation Internal"
atlas_second_tag: "$\\sqrt{s}=13$ TeV, PFlow jets,\n$t\\bar{t}$ test sample, WP = 77 %"

Options Data Type Necessary/Optional Explanation
rejections list Necessary List with two items. These are the rejections that are plotted against each other. Only background classes can be plotted like this.
tagger_name str Necessary Name of the tagger which is to be plotted. This is the name of the tagger either from the .h5 files or your freshly trained tagger (look here for an explanation of the freshly trained tagger names). **IMPORTANT: If you want to use a tagger from the .h5 files, you must run the evaluate_model.py script with the names of taggers in the train config in the evaluation_settings section. There you need to enter the name to the tagger list and the fraction values to the frac_values_comp dict. The key is the name of the tagger.
colour str Optional Give a specific colour to the tagger.
linestyle str Optional Give a specific linestyle to the tagger.
label str Necessary Give a label for the tagger that will be printed to the legend.
data_set_name str Necessary The dataset to use from the dataframe as specified in evaluation.
marker dict Optional You can set a marker (a x or something like that) at a certain fraction combination if you want to. All important information for that are added here.
rejection float Necessary (if marker is used) Give two fraction values for your selected rejections. This is the position where the marker will be plotted. In the example, this is cjets and ujets.
marker_style str Optional Give a marker style that is used for the marker. Default is "x".
marker_label str Optional Give a custom marker legend label. Default is the tagger label + the fraction values.
markersize int Optional Size of the marker. Default is 15.
markeredgewidth int Optional Size of the lines of the marker. Default is 2.

### Executing the Script#

The script can be executed by using the following command:

plotting_umami.py -c ${EXAMPLES}/plotting_umami_config_dips.yaml -o dips_eval_plots  The -o option defines the name of the output directory. It will be added to the model folder where also the results are saved. Also you can set the output filetype by using the -f option. For example: plotting_umami.py -c${EXAMPLES}/plotting_umami_config_dips.yaml -o dips_eval_plots -f png


The output plots will be .png now. Standard is pdf.