# Plotting Input Variables

The input variables for different files can also be plotted using the `plot_input_variables.py` script. It is also steered by a YAML file. An example of such a file can be found here. Its structure is close to that of `plotting_umami`, with a few differences. To start plotting the input variables, run the following command:

    plot_input_vars.py -c <path/to/config> --tracks

or

    plot_input_vars.py -c <path/to/config> --jets


which will produce all plots defined with track variables or with jet variables, respectively. You can also pass the `-f` or `--format` option to choose the output format of the plots. The default is `pdf`.
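
When the script is driven from another Python program, the command line above can be assembled programmatically. The following is a minimal sketch: the helper `build_plot_command` and the config file name are hypothetical, and only the flags described above (`-c`, `--tracks`, `--jets`, `-f`) are used.

```python
from typing import List, Optional


def build_plot_command(config: str, kind: str, fmt: Optional[str] = None) -> List[str]:
    """Assemble the argument list for plot_input_vars.py (hypothetical helper)."""
    if kind not in ("tracks", "jets"):
        raise ValueError("kind must be 'tracks' or 'jets'")
    cmd = ["plot_input_vars.py", "-c", config, f"--{kind}"]
    if fmt is not None:
        # -f / --format selects the output format; the script defaults to pdf
        cmd += ["-f", fmt]
    return cmd


# "plotting_input_vars.yaml" is a placeholder name for your own config file
print(build_plot_command("plotting_input_vars.yaml", "tracks", fmt="png"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` to launch the plotting.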

### YAML File

In the following, the possible configuration parameters are listed with a brief description.

#### Number of jets

Here you can define the number of jets that are used.

Corresponding code in the example config file:

    Eval_parameters:
      # Number of jets which are used
      n_jets: 3e4
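
The value is written in scientific notation. Depending on the YAML loader, such a value may arrive in Python as a float or as a string, so a defensive conversion to an integer can be useful. The helper below is a hypothetical sketch and not necessarily how the script itself handles the value.

```python
def parse_n_jets(value) -> int:
    """Convert a config value such as 3e4, "3e4" or 30000 to an int."""
    n_jets = float(value)
    if n_jets <= 0 or n_jets != int(n_jets):
        raise ValueError(f"n_jets must be a positive whole number, got {value!r}")
    return int(n_jets)


print(parse_n_jets("3e4"))  # 30000
```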


#### Number of Tracks per Jet

The number of tracks per jet can be plotted for all different files. This can be given like this:

Corresponding code in the example config file:

    nTracks:
      variables: "tracks"
      folder_to_save: nTracks
      nTracks: True
      Datasets_to_plot:
        R21:
          files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
          label: "R21 Loose"
          tracks_name: "tracks"
        R22:
          files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
          label: "R22 Loose"
          tracks_name: "tracks"
      <<: *ttbar_cuts
      plot_settings:
        <<: *default_plot_settings
        ymin_ratio_1: 0.5
        ymax_ratio_1: 2
      class_labels: ["bjets", "cjets", "ujets"]

| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `nTracks_ttbar_loose` | str | Necessary | Name of the plot block (`nTracks` in the example above). This does not affect the plots themselves. |
| `variables` | str | Necessary | Must be set to `"tracks"` for this function. Decides which plotting functions are used. |
| `folder_to_save` | str | Necessary | Path where the plots should be saved. This is a relative path; give a folder name. |
| `nTracks` | bool | Necessary | Must be `True` here. Decides whether the number of tracks per jet or the input variables are plotted. |
| `Datasets_to_plot` | None | Necessary | Start of the section listing the datasets to plot. |
| `R21` | None | Necessary | Name of the dataset entry to plot. The name itself does not affect anything. |
| `files` | str | Necessary | Path to a file which is to be used for plotting. Wildcards are supported. The function will load as many files as needed to reach the number of jets given in `Eval_parameters`. |
| `label` | str | Necessary | Plot label for the plot legend. |
| `tracks_name` | str | Necessary | Name of the tracks inside the h5 files you want to plot. |
| `cut_vars_dict` | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Implemented as a list of dict entries: each key is the name of the variable used for the cut (e.g. `pt_btagJes`), with the sub-entries `operator` (the operator used for the cut) and `condition` (the condition used for the cut). |
| `plot_settings` | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
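
The `files` option states that files are loaded until the number of jets from `Eval_parameters` is reached. A minimal sketch of that idea, assuming the number of jets per file is already known; the helper `select_files`, the file names and the jet counts are hypothetical:

```python
from typing import Dict, List


def select_files(jets_per_file: Dict[str, int], n_jets: int) -> List[str]:
    """Pick files in sorted order until they hold at least n_jets jets."""
    selected: List[str] = []
    available = 0
    for path in sorted(jets_per_file):
        if available >= n_jets:
            break
        selected.append(path)
        available += jets_per_file[path]
    return selected


# Hypothetical file names and jet counts, for illustration only
counts = {"a.h5": 20000, "b.h5": 20000, "c.h5": 20000}
print(select_files(counts, 30000))  # ['a.h5', 'b.h5']
```

In practice the list of paths would come from expanding the wildcard, for example with `glob.glob`.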

#### Input Variables Tracks

To plot the track input variables, the following options are used.

Corresponding code in the example config file:

    tracks_input_vars:
      variables: "tracks"
      folder_to_save: tracks_input_vars
      Datasets_to_plot:
        R21:
          files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
          label: "R21 Loose"
          tracks_name: "tracks"
        R22:
          files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
          label: "R22 Loose"
          tracks_name: "tracks"
      plot_settings:
        <<: *default_plot_settings
        sorting_variable: "ptfrac"
        n_leading: [None, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        ymin_ratio_1: 0.5
        ymax_ratio_1: 1.5
      <<: *ttbar_cuts
      var_dict:
        IP3D_signed_d0_significance: 100
        IP3D_signed_z0_significance: 100
        numberOfInnermostPixelLayerHits: [0, 4, 1]
        numberOfNextToInnermostPixelLayerHits: [0, 4, 1]
        numberOfInnermostPixelLayerSharedHits: [0, 4, 1]
        numberOfInnermostPixelLayerSplitHits: [0, 4, 1]
        numberOfPixelSharedHits: [0, 4, 1]
        numberOfPixelSplitHits: [0, 9, 1]
        numberOfSCTSharedHits: [0, 4, 1]
        ptfrac: [0, 5, 0.05]
        dr: 100
        numberOfPixelHits: [0, 11, 1]
        numberOfSCTHits: [0, 19, 1]
        btagIp_d0: 100
        btagIp_z0SinTheta: 100
        number_nPix_nSCT:
          variables: ["numberOfPixelHits", "numberOfSCTHits"]
          binning: [0, 19, 1]
          operator: "+"
      class_labels: ["bjets", "cjets", "ujets"]

| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `input_vars_trks_ttbar_loose_ptfrac` | str | Necessary | Name of the plot block (`tracks_input_vars` in the example above). This does not affect the plots themselves. |
| `variables` | str | Necessary | Must be set to `"tracks"` for this function. Decides which plotting functions are used. |
| `folder_to_save` | str | Necessary | Path where the plots should be saved. This is a relative path; give a folder name. |
| `nTracks` | bool | Necessary | To plot the input variable distributions, this must be `False`. |
| `Datasets_to_plot` | None | Necessary | Start of the section listing the datasets to plot. |
| `R21` | None | Necessary | Name of the dataset entry to plot. The name itself does not affect anything. |
| `files` | str | Necessary | Path to a file which is to be used for plotting. Wildcards are supported. The function will load as many files as needed to reach the number of jets given in `Eval_parameters`. |
| `label` | str | Necessary | Plot label for the plot legend. |
| `tracks_name` | str | Necessary | Name of the tracks inside the h5 files you want to plot. |
| `plot_settings` | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
| `var_dict` | dict | Necessary | Dict of all variables to plot. The key of each entry is the name of the variable (as it is named in the files) and the value is the binning. An int gives that number of equidistant bins. A three-element list is passed to `numpy.arange`: the first element is the start, the second the stop and the third the step size (bin width); the resulting numbers are bin edges, not bins. If no value is given, the default of 100 is used. To plot a combination of variables, e.g. the sum of `numberOfPixelHits` and `numberOfSCTHits`, the entry must itself be a dict with three entries: `variables`, a list of the variables to combine; `operator`, the operation used to merge them (available are `"+"`, `"-"`, `"*"` and `"/"`); and `binning`, given as an int or a list as described above. An example is the entry `number_nPix_nSCT` in the config above. The logarithm of a single variable can be plotted by listing only that variable and setting `operator` to `"log"`. |
| `cut_vars_dict` | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Implemented as a list of dict entries: each key is the name of the variable used for the cut (e.g. `pt_btagJes`), with the sub-entries `operator` (the operator used for the cut) and `condition` (the condition used for the cut). |
| `xlabels` | dict | Optional | Dict with custom x-axis labels. |
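
The two binning forms accepted in `var_dict` can be illustrated with a small sketch. It uses plain Python as a stand-in for `numpy.arange(start, stop, step)`, whose third argument is a step size. The helper `bin_edges` is hypothetical, and taking the integer case over a given value range is an assumption about how the equidistant bins are placed.

```python
import math
from typing import List, Optional, Sequence, Union


def bin_edges(
    binning: Optional[Union[int, Sequence[float]]],
    value_range: Sequence[float],
) -> List[float]:
    """Turn a var_dict entry into a list of bin edges (illustrative only)."""
    if binning is None:
        binning = 100  # documented default when no value is given
    if isinstance(binning, int):
        # N equidistant bins over the value range need N + 1 edges
        low, high = value_range
        width = (high - low) / binning
        return [low + i * width for i in range(binning + 1)]
    # Three-element list: same semantics as numpy.arange(start, stop, step),
    # so the stop value itself is not part of the result
    start, stop, step = binning
    n_edges = max(math.ceil((stop - start) / step), 0)
    return [start + i * step for i in range(n_edges)]


print(bin_edges([0, 4, 1], (0, 10)))  # [0, 1, 2, 3]
print(bin_edges(4, (0.0, 2.0)))       # [0.0, 0.5, 1.0, 1.5, 2.0]
```

Note that `[0, 4, 1]` yields the four edges 0 to 3 and therefore three bins, since the numbers are edges and the stop value is excluded.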

#### Input Variables Jets

To plot the jet input variables, the following options are used.

Corresponding code in the example config file:

    jets_input_vars:
      variables: "jets"
      folder_to_save: jets_input_vars
      Datasets_to_plot:
        R21:
          files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
          label: "R21 Loose"
          # class_labels can also be defined for a specific dataset (the way it is done here,
          # it doesn't change anything since it's the same as the globally defined class_labels)
          class_labels: ["bjets", "cjets", "ujets"]
        R22:
          files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
          label: "R22 Loose"
          # If you want to specify the class_labels per dataset you can add it here
          # If you don't specify anything here, the overall defined class_labels will be
          # used
          # class_labels: ["bjets", "cjets", "ujets"]
      plot_settings:
        <<: *default_plot_settings
      class_labels: ["bjets", "cjets", "ujets"]
      <<: *ttbar_cuts
      special_param_jets:
        SV1_NGTinSvx:
          lim_left: 0
          lim_right: 19
        JetFitterSecondaryVertex_nTracks:
          lim_left: 0
          lim_right: 17
        JetFitter_nTracksAtVtx:
          lim_left: 0
          lim_right: 19
        JetFitter_nSingleTracks:
          lim_left: 0
          lim_right: 18
        JetFitter_nVTX:
          lim_left: 0
          lim_right: 6
        JetFitter_N2Tpair:
          lim_left: 0
          lim_right: 200
      xlabels:
        # here you can define xlabels, if a variable is not in this dict, the variable name
        # will be used (i.e. for pT this would be 'pt_btagJes')
        pt_btagJes: "$p_T$ [MeV]"
      var_dict:
        JetFitter_mass: 100
        JetFitter_energyFraction: 100
        JetFitter_significance3d: 100
        JetFitter_deltaR: 100
        JetFitter_nVTX: 7
        JetFitter_nSingleTracks: 19
        JetFitter_nTracksAtVtx: 20
        JetFitter_N2Tpair: 201
        JetFitter_isDefaults: 2
        JetFitterSecondaryVertex_minimumTrackRelativeEta: 11
        JetFitterSecondaryVertex_averageTrackRelativeEta: 11
        JetFitterSecondaryVertex_maximumTrackRelativeEta: 11
        JetFitterSecondaryVertex_maximumAllJetTrackRelativeEta: 11
        JetFitterSecondaryVertex_minimumAllJetTrackRelativeEta: 11
        JetFitterSecondaryVertex_averageAllJetTrackRelativeEta: 11
        JetFitterSecondaryVertex_displacement2d: 100
        JetFitterSecondaryVertex_displacement3d: 100
        JetFitterSecondaryVertex_mass: 100
        JetFitterSecondaryVertex_energy: 100
        JetFitterSecondaryVertex_energyFraction: 100
        JetFitterSecondaryVertex_isDefaults: 2
        JetFitterSecondaryVertex_nTracks: 18
        pt_btagJes: 100
        absEta_btagJes: 100
        SV1_Lxy: 100
        SV1_N2Tpair: 8
        SV1_NGTinSvx: 20
        SV1_masssvx: 100
        SV1_efracsvx: 100
        SV1_significance3d: 100
        SV1_deltaR: 10
        SV1_L3d: 100
        SV1_isDefaults: 2
        rnnip_pb: 50
        rnnip_pc: 50
        rnnip_pu: 50
        combined_rnnip:
          variables: ["rnnip_pc", "rnnip_pu"]
          binning: 50
          operator: "+"
      flavours:
        b: 5
        c: 4
        u: 0
        tau: 15

| Options | Data Type | Necessary/Optional | Explanation |
|---------|-----------|--------------------|-------------|
| `input_vars_trks_ttbar_loose_ptfrac` | str | Necessary | Name of the plot block (`jets_input_vars` in the example above). This does not affect the plots themselves. |
| `variables` | str | Necessary | Must be set to `"jets"` for this function. Decides which plotting functions are used. |
| `folder_to_save` | str | Necessary | Path where the plots should be saved. This is a relative path; give a folder name. |
| `Datasets_to_plot` | None | Necessary | Start of the section listing the datasets to plot. |
| `R21` | None | Necessary | Name of the dataset entry to plot. The name itself does not affect anything. |
| `files` | str | Necessary | Path to a file which is to be used for plotting. Wildcards are supported. The function will load as many files as needed to reach the number of jets given in `Eval_parameters`. |
| `label` | str | Necessary | Plot label for the plot legend. |
| `special_param_jets` | None | Necessary | Start of the special x-axis limits for individual variables. To set the x range by hand, add the variable here together with `lim_left` for xmin and `lim_right` for xmax. |
| `var_dict` | dict | Necessary | Dict of all variables to plot. The key of each entry is the name of the variable (as it is named in the files) and the value is the binning. An int gives that number of equidistant bins. A three-element list is passed to `numpy.arange`: the first element is the start, the second the stop and the third the step size (bin width); the resulting numbers are bin edges, not bins. If no value is given, the default of 100 is used. To plot a combination of variables, e.g. the sum of `rnnip_pc` and `rnnip_pu`, the entry must itself be a dict with three entries: `variables`, a list of the variables to combine; `operator`, the operation used to merge them (available are `"+"`, `"-"`, `"*"` and `"/"`); and `binning`, given as an int or a list as described above. An example is the entry `combined_rnnip` in the config above. The logarithm of a single variable can be plotted by listing only that variable and setting `operator` to `"log"`. |
| `cut_vars_dict` | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Implemented as a list of dict entries: each key is the name of the variable used for the cut (e.g. `pt_btagJes`), with the sub-entries `operator` (the operator used for the cut) and `condition` (the condition used for the cut). |
| `plot_settings` | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
| `xlabels` | dict | Optional | Dict with custom x-axis labels. |
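
A combined entry such as `combined_rnnip` can be pictured as an element-wise operation on the listed variables. The sketch below is only an illustration: the helper `combine` and the input values are made up, and using the natural logarithm for `"log"` is an assumption.

```python
import math
from typing import Dict, List, Sequence

# Operators listed in the table above; "log" is handled as a one-variable case
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def combine(
    data: Dict[str, Sequence[float]], variables: List[str], operator: str
) -> List[float]:
    """Combine per-jet columns element-wise according to a var_dict entry."""
    if operator == "log":
        if len(variables) != 1:
            raise ValueError("log expects exactly one variable")
        # Natural logarithm is an assumption made for this sketch
        return [math.log(x) for x in data[variables[0]]]
    func = BINARY_OPS[operator]
    result = list(data[variables[0]])
    for name in variables[1:]:
        result = [func(a, b) for a, b in zip(result, data[name])]
    return result


# Made-up values, for illustration only
jets = {"rnnip_pc": [0.25, 0.5], "rnnip_pu": [0.5, 0.25]}
print(combine(jets, ["rnnip_pc", "rnnip_pu"], "+"))  # [0.75, 0.75]
```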

#### Plot settings

The plot_settings section is similar for all three cases described above. In order to define some settings you want to apply to all plots, use yaml anchors as shown here:

Corresponding code in the example config file:

    .default_plot_settings: &default_plot_settings
      logy: True
      use_atlas_tag: True
      atlas_first_tag: "Simulation Internal"
      atlas_second_tag: "$\\sqrt{s}$ = 13 TeV, $t\\bar{t}$ PFlow jets \n30000 jets"
      y_scale: 2
      figsize: [7, 5]

    .ttbar_cuts: &ttbar_cuts
      cut_vars_dict:
        - pt_btagJes:
            operator: ">"
            condition: 2.0e4
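
To make the structure of `cut_vars_dict` concrete, the following sketch applies such a list of cuts to individual jets. The helper `passes_cuts`, the set of supported comparison operators and the example jets are assumptions made for illustration; only the `>` cut on `pt_btagJes` comes from the config above.

```python
import operator as op
from typing import Dict, List

# Comparison operators a cut may use; this set is an assumption
COMPARATORS = {
    ">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le, "==": op.eq, "!=": op.ne,
}


def passes_cuts(jet: Dict[str, float], cut_vars_dict: List[dict]) -> bool:
    """Return True if the jet satisfies every cut in the list."""
    for cut in cut_vars_dict:
        # Each list entry maps one variable name to its operator/condition
        for variable, spec in cut.items():
            compare = COMPARATORS[spec["operator"]]
            if not compare(jet[variable], spec["condition"]):
                return False
    return True


ttbar_cuts = [{"pt_btagJes": {"operator": ">", "condition": 2.0e4}}]
jets = [{"pt_btagJes": 1.5e4}, {"pt_btagJes": 5.0e4}]  # made-up jets
print([passes_cuts(jet, ttbar_cuts) for jet in jets])  # [False, True]
```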


Most of the plot settings are valid for all types of input variable plots (i.e. jet variables, track variables and the n_tracks plot). If a parameter is only valid for a certain type of plot, this is listed below.
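
The `<<: *default_plot_settings` merge key used in the examples behaves like a dictionary merge in which keys written explicitly in the block take precedence over the merged defaults. A rough Python analogy, with a hypothetical local override of `y_scale`:

```python
default_plot_settings = {"logy": True, "y_scale": 2, "figsize": [7, 5]}

# Equivalent of:
#   plot_settings:
#     <<: *default_plot_settings
#     ymin_ratio_1: 0.5
#     y_scale: 1.5        # hypothetical local override
local_settings = {"ymin_ratio_1": 0.5, "y_scale": 1.5}

# Keys written explicitly in the block take precedence over merged ones
plot_settings = {**default_plot_settings, **local_settings}

print(plot_settings["y_scale"])  # 1.5
print(plot_settings["logy"])     # True
```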

## Plot settings

You can specify some parameters for the plots themselves. You can use the following parameters. Note that some parameters are not supported for all types of plots.

| Options | Plot Type | Data Type | Necessary/Optional | Explanation |
|---------|-----------|-----------|--------------------|-------------|
| `xlabels` | | dict | Optional | Dict with custom x-axis labels. |
| `sorting_variable` | Track variables | str | Optional | Name of the variable by which the tracks are sorted. |
| `n_leading` | Track variables | list | Optional | List of the leading tracks to plot. `None` plots all tracks; `0` plots the leading tracks, ordered by the sorting variable. Several entries can be given, e.g. `None`, `0` and `1`, and each is plotted in its own folder with corresponding labels. This must be a list, even if only one option is given. |
| `track_origins` | Track variables and n_tracks plot | list | Optional | List of the desired track origins to plot. |
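
How `sorting_variable` and `n_leading` interact can be sketched as follows. The helper `select_leading` and the example tracks are hypothetical, and sorting in descending order (largest value first) is an assumption.

```python
from typing import Dict, List, Optional


def select_leading(
    jets: List[List[Dict[str, float]]],
    sorting_variable: str,
    n_leading: Optional[int],
) -> List[Dict[str, float]]:
    """Return the tracks that enter the plot for one n_leading value."""
    selected = []
    for tracks in jets:
        # Sort the tracks of each jet in descending order of the sorting variable
        ordered = sorted(tracks, key=lambda trk: trk[sorting_variable], reverse=True)
        if n_leading is None:
            selected.extend(ordered)  # all tracks
        elif n_leading < len(ordered):
            selected.append(ordered[n_leading])  # 0 = leading, 1 = subleading, ...
    return selected


# Two made-up jets with two tracks and one track
jets = [[{"ptfrac": 0.25}, {"ptfrac": 0.5}], [{"ptfrac": 0.75}]]
print(select_leading(jets, "ptfrac", 0))  # [{'ptfrac': 0.5}, {'ptfrac': 0.75}]
print(select_leading(jets, "ptfrac", 1))  # [{'ptfrac': 0.25}]
```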

All remaining plot settings are parameters which are handed to puma (Plotting UMami API), more specifically to the `HistogramPlot` class. Therefore, all parameters supported by the `HistogramPlot` class can be specified there.

puma documentation

### List of puma parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `discrete_vals` | list, optional | List of values if a variable only has discrete values. If `discrete_vals` is specified, only the bins containing these values are plotted. By default None. |
| `norm` | bool, optional | Specify if the histograms are normalised, meaning that the histograms are divided by the total number of counts. Therefore, the sum of the bin counts is equal to one, but NOT the area under the curve, which would be `sum(bin_counts * bin_width)`. By default True. |
| `logy` | bool, optional | Set log scale on y-axis, by default False. |
| `bin_width_in_ylabel` | bool, optional | Specify if the bin width should be added to the ylabel, by default False |
| `underoverflow` | bool, optional | Option to include under- and overflow values in outermost bins. |
| `title` | str, optional | Title of the plot, by default "" |
| `draw_errors` | bool, optional | Draw statistical uncertainty on the lines, by default True |
| `xmin` | float, optional | Minimum value of the x-axis, by default None |
| `xmax` | float, optional | Maximum value of the x-axis, by default None |
| `ymin` | float, optional | Minimum value of the y-axis, by default None |
| `ymax` | float, optional | Maximum value of the y-axis, by default None |
| `ymin_ratio_1` | float, optional | Set the lower y limit of the first ratio subplot, by default None. |
| `ymax_ratio_1` | float, optional | Set the upper y limit of the first ratio subplot, by default None. |
| `ymin_ratio_2` | float, optional | Set the lower y limit of the second ratio subplot, by default None. |
| `ymax_ratio_2` | float, optional | Set the upper y limit of the second ratio subplot, by default None. |
| `y_scale` | float, optional | Scaling up the y axis, e.g. to fit the ATLAS Tag. Applied if `ymax` is not defined, by default 1.3 |
| `xlabel` | str, optional | Label of the x-axis, by default None |
| `ylabel` | str, optional | Label of the y-axis, by default None |
| `ylabel_ratio_1` | str, optional | Label of the y-axis in the first ratio plot, by default "Ratio" |
| `ylabel_ratio_2` | str, optional | Label of the y-axis in the second ratio plot, by default "Ratio" |
| `label_fontsize` | int, optional | Used fontsize in label, by default 12 |
| `fontsize` | int, optional | Used fontsize, by default 10 |
| `n_ratio_panels` | int, optional | Number of ratio panels between 0 and 2, by default 0 |
| `figsize` | (float, float), optional | Tuple of figure size (width, height) in inches, by default (8, 6) |
| `dpi` | int, optional | DPI used for plotting, by default 400 |
| `transparent` | bool, optional | Specify if the background of the plot should be transparent, by default False |
| `grid` | bool, optional | Set the grid for the plots. |
| `leg_fontsize` | int, optional | Fontsize of the legend, by default 10 |
| `leg_loc` | str, optional | Position of the legend in the plot, by default "upper right" |
| `leg_ncol` | int, optional | Number of legend columns, by default 1 |
| `leg_linestyle_loc` | str, optional | Position of the linestyle legend in the plot, by default "upper center" |
| `apply_atlas_style` | bool, optional | Apply ATLAS style for matplotlib, by default True |
| `use_atlas_tag` | bool, optional | Use the ATLAS Tag in the plots, by default True |
| `atlas_first_tag` | str, optional | First row of the ATLAS tag (i.e. the first row is "ATLAS "), by default "Simulation Internal" |
| `atlas_second_tag` | str, optional | Second row of the ATLAS tag, by default "" |
| `atlas_fontsize` | float, optional | Fontsize of ATLAS label, by default 10 |
| `atlas_vertical_offset` | float, optional | Vertical offset of the ATLAS tag, by default 7 |
| `atlas_horizontal_offset` | float, optional | Horizontal offset of the ATLAS tag, by default 8 |
| `atlas_brand` | str, optional | `brand` argument handed to atlasify. If you want to remove it, just use an empty string or None, by default "ATLAS" |
| `atlas_tag_outside` | bool, optional | `outside` argument handed to atlasify. Decides if the ATLAS logo is plotted outside of the plot (on top), by default False |
| `atlas_second_tag_distance` | float, optional | Distance between the `atlas_first_tag` and `atlas_second_tag` text in units of line spacing, by default 0 |
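
The distinction made for `norm`, unit sum of bin counts versus unit area, can be checked with a few made-up numbers. This is a plain-Python illustration of the arithmetic, not a call into puma:

```python
counts = [1, 2, 1]                # made-up raw bin counts
bin_edges = [0.0, 0.5, 1.0, 1.5]  # three bins of width 0.5

total = sum(counts)
normalised = [c / total for c in counts]
widths = [hi - lo for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]

# Sum of the normalised bin counts
bin_sum = sum(normalised)
# Area under the histogram: sum(bin_counts * bin_width)
area = sum(n * w for n, w in zip(normalised, widths))

print(bin_sum)  # 1.0
print(area)     # 0.5
```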