
Plotting Input Variables#

The input variables for different files can also be plotted using the plot_input_variables.py script. It is also steered by a yaml file. An example for such a file can be found here. The structure is close to the one used by plotting_umami, but differs in some details. To start plotting the input variables, run the following command

plot_input_vars.py -c <path/to/config> --tracks

or

plot_input_vars.py -c <path/to/config> --jets

which will produce all plots defined with track variables or with jet variables, respectively. You can also pass the -f or --format option to choose the file format of the plots. The default is pdf.

Yaml File#

In the following, the possible configuration parameters are listed with a brief description.

Number of jets#

Here you can define the number of jets that are used.

Corresponding code in the example config file:
Eval_parameters:
  # Number of jets which are used
  n_jets: 3e4

Number of Tracks per Jet#

The number of tracks per jet can be plotted for all different files. This can be given like this:

Corresponding code in the example config file:
nTracks:
  variables: "tracks"
  folder_to_save: nTracks
  nTracks: True
  Datasets_to_plot:
    R21:
      files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
      label: "R21 Loose"
      tracks_name: "tracks"
    R22:
      files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
      label: "R22 Loose"
      tracks_name: "tracks"
  <<: *ttbar_cuts
  plot_settings:
    <<: *default_plot_settings
    ymin_ratio: [0.5]
    ymax_ratio: [2]
  class_labels: ["bjets", "cjets", "ujets"]
| Options | Data Type | Necessary/Optional | Explanation |
| --- | --- | --- | --- |
| nTracks_ttbar_loose | str | Necessary | Name of the plots (the top-level key, nTracks in the example above). This does not affect the plots themselves. |
| variables | str | Necessary | Must be set to "tracks" for this function. Decides which plotting functions are used. |
| folder_to_save | str | Necessary | Path where the plots should be saved. This is a relative path; give a folder name. |
| nTracks | bool | Necessary | Must be True here! Decides whether the number of tracks per jet is plotted or the input variables. |
| Datasets_to_plot | None | Necessary | Start of the block listing the datasets to be plotted. |
| R21 | None | Necessary | Name of the fileset which is to be plotted. Does not affect anything! |
| files | str | Necessary | Path to a file which is to be used for plotting. Wildcards are supported. The function will load as many files as needed to reach the number of jets given in Eval_parameters. |
| label | str | Necessary | Plot label for the plot legend. |
| tracks_name | str | Necessary | Name of the tracks inside the h5 files you want to plot. |
| cut_vars_dict | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Technically, this is a list of dict entries, each keyed by the name of the variable used for the cut (e.g. pt_btagJes), with the operator used for the cut (operator) and the condition used for the cut (condition) as sub-entries. |
| plot_settings | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
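
The files entry together with n_jets from Eval_parameters implies a "load until enough jets are available" behaviour. The following is a minimal sketch of that idea only, not the actual umami loader: the helper name load_until_enough is hypothetical, and plain lists stand in for the content of the h5 files that a wildcard would match.

```python
def load_until_enough(file_chunks, n_jets):
    """Collect jets file by file and stop as soon as n_jets are available.

    file_chunks is any iterable yielding the jets of one file at a time
    (hypothetical stand-in for reading the h5 files matched by a wildcard).
    """
    collected = []
    for chunk in file_chunks:
        remaining = n_jets - len(collected)
        if remaining <= 0:
            break  # enough jets loaded, later files are never touched
        collected.extend(chunk[:remaining])
    return collected


# Three stand-in "files" with four jets each
fake_files = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
jets = load_until_enough(fake_files, n_jets=6)
print(len(jets))
```

In this sketch the second file is only partially used and the third one is skipped entirely, which is why a wildcard matching many files does not necessarily slow down the plotting.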

Input Variables Tracks#

To plot the track input variables, the following options are used.

Corresponding code in the example config file:
tracks_input_vars:
  variables: "tracks"
  folder_to_save: tracks_input_vars
  Datasets_to_plot:
    R21:
      files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
      label: "R21 Loose"
      tracks_name: "tracks"
    R22:
      files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
      label: "R22 Loose"
      tracks_name: "tracks"
  plot_settings:
    <<: *default_plot_settings
    sorting_variable: "ptfrac"
    n_leading: [None, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    ymin_ratio: [0.5]
    ymax_ratio: [1.5]
  <<: *ttbar_cuts
  var_dict:
    IP3D_signed_d0_significance: 100
    IP3D_signed_z0_significance: 100
    numberOfInnermostPixelLayerHits: [0, 4, 1]
    numberOfNextToInnermostPixelLayerHits: [0, 4, 1]
    numberOfInnermostPixelLayerSharedHits: [0, 4, 1]
    numberOfInnermostPixelLayerSplitHits: [0, 4, 1]
    numberOfPixelSharedHits: [0, 4, 1]
    numberOfPixelSplitHits: [0, 9, 1]
    numberOfSCTSharedHits: [0, 4, 1]
    ptfrac: [0, 5, 0.05]
    dr: 100
    numberOfPixelHits: [0, 11, 1]
    numberOfSCTHits: [0, 19, 1]
    btagIp_d0: 100
    btagIp_z0SinTheta: 100
    number_nPix_nSCT:
      variables: ["numberOfPixelHits", "numberOfSCTHits"]
      binning: [0, 19, 1]
      operator: "+"
  class_labels: ["bjets", "cjets", "ujets"]
| Options | Data Type | Necessary/Optional | Explanation |
| --- | --- | --- | --- |
| input_vars_trks_ttbar_loose_ptfrac | str | Necessary | Name of the plots (the top-level key, tracks_input_vars in the example above). This does not affect the plots themselves. |
| variables | str | Necessary | Must be set to "tracks" for this function. Decides which plotting functions are used. |
| folder_to_save | str | Necessary | Path where the plots should be saved. This is a relative path; give a folder name. |
| nTracks | bool | Necessary | To plot the input variable distributions, this must be False. |
| Datasets_to_plot | None | Necessary | Start of the block listing the datasets to be plotted. |
| R21 | None | Necessary | Name of the fileset which is to be plotted. Does not affect anything! |
| files | str | Necessary | Path to a file which is to be used for plotting. Wildcards are supported. The function will load as many files as needed to reach the number of jets given in Eval_parameters. |
| label | str | Necessary | Plot label for the plot legend. |
| tracks_name | str | Necessary | Name of the tracks inside the h5 files you want to plot. |
| plot_settings | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
| var_dict | dict | Necessary | A dict with all the variables you want to plot. The key of each entry is the name of the variable (as it is named in the files) and the value is the binning. If you give an int, you get that number of equidistant bins. You can also give a three-element list, which is passed to numpy.arange: the first element is start, the second is stop and the third is the step, i.e. the bin width (not the number of bins). The resulting numbers are bin edges, not bins! If no value is given, the default is 100. To plot a combination of variables, for example the sum of numberOfPixelHits and numberOfSCTHits, the entry needs to be a dict itself with three entries: variables, a list of the variables to combine; operator, the operation used to merge them (available are "+", "-", "*" and "/"); and binning, which follows the same int/list rules as above. An example is given in the config above, where the variable is named number_nPix_nSCT. You can also apply the log to a single variable by listing only that variable in variables and setting operator to "log". |
| cut_vars_dict | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Technically, this is a list of dict entries, each keyed by the name of the variable used for the cut (e.g. pt_btagJes), with the operator used for the cut (operator) and the condition used for the cut (condition) as sub-entries. |
| xlabels | dict | Optional | Dict with custom xlabels |
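
The two binning styles accepted in var_dict can be illustrated with plain numpy. This is a sketch, not umami code: the helper make_bins is hypothetical, and only numpy.arange and numpy.histogram are real library calls.

```python
import numpy as np


def make_bins(binning=None):
    """Translate a var_dict binning entry into a `bins` argument for np.histogram.

    Hypothetical helper for illustration; umami's own handling may differ.
    """
    if binning is None:
        return 100  # documented default: 100 equidistant bins
    if isinstance(binning, int):
        return binning  # number of equidistant bins
    if isinstance(binning, (list, tuple)) and len(binning) == 3:
        start, stop, step = binning
        # np.arange returns bin EDGES; the stop value itself is excluded
        return np.arange(start, stop, step)
    raise ValueError(f"Unsupported binning: {binning!r}")


edges = make_bins([0, 4, 1])
counts, _ = np.histogram([0, 1, 1, 2, 3], bins=edges)
print(edges, counts)
```

Because the stop value is excluded, [0, 4, 1] yields the four edges 0, 1, 2, 3 and therefore three bins. If the last integer value should get a bin of its own, the stop value has to be chosen one step larger than one might first expect.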

Input Variables Jets#

To plot the jet input variables, the following options are used.

Corresponding code in the example config file:
jets_input_vars:
  variables: "jets"
  folder_to_save: jets_input_vars
  Datasets_to_plot:
    R21:
      files: <path_palce_holder>/user.mguth.410470.btagTraining.e6337_s3126_r10201_p3985.EMPFlow_looser-track_selection.2020-07-01-T193555-R26654_output.h5/*
      label: "R21 Loose"
      # class_labels can also be defined for a specific dataset (the way it is done here,
      # it doesn't change anything since it's the same as the globally defined class_labels)
      class_labels: ["bjets", "cjets", "ujets"]
    R22:
      files: <path_palce_holder>/user.alfroch.410470.btagTraining.e6337_s3126_r12305_r12253_r12305_p4441.EMPFlow_loose.2021-04-20-T171733-R21211_output.h5/*
      label: "R22 Loose"
      # If you want to specify the `class_labels` per dataset you can add it here
      # If you don't specify anything here, the overall defined `class_labels` will be
      # used
      # class_labels: ["bjets", "cjets", "ujets"]
  plot_settings:
    <<: *default_plot_settings
  class_labels: ["bjets", "cjets", "ujets"]
  <<: *ttbar_cuts
  special_param_jets:
    SV1_NGTinSvx:
      lim_left: 0
      lim_right: 19
    JetFitterSecondaryVertex_nTracks:
      lim_left: 0
      lim_right: 17
    JetFitter_nTracksAtVtx:
      lim_left: 0
      lim_right: 19
    JetFitter_nSingleTracks:
      lim_left: 0
      lim_right: 18
    JetFitter_nVTX:
      lim_left: 0
      lim_right: 6
    JetFitter_N2Tpair:
      lim_left: 0
      lim_right: 200
  xlabels:
    # here you can define xlabels, if a variable is not in this dict, the variable name
    # will be used (i.e. for pT this would be 'pt_btagJes')
    pt_btagJes: "$p_T$ [MeV]"
  var_dict:
    JetFitter_mass: 100
    JetFitter_energyFraction: 100
    JetFitter_significance3d: 100
    JetFitter_deltaR: 100
    JetFitter_nVTX: 7
    JetFitter_nSingleTracks: 19
    JetFitter_nTracksAtVtx: 20
    JetFitter_N2Tpair: 201
    JetFitter_isDefaults: 2
    JetFitterSecondaryVertex_minimumTrackRelativeEta: 11
    JetFitterSecondaryVertex_averageTrackRelativeEta: 11
    JetFitterSecondaryVertex_maximumTrackRelativeEta: 11
    JetFitterSecondaryVertex_maximumAllJetTrackRelativeEta: 11
    JetFitterSecondaryVertex_minimumAllJetTrackRelativeEta: 11
    JetFitterSecondaryVertex_averageAllJetTrackRelativeEta: 11
    JetFitterSecondaryVertex_displacement2d: 100
    JetFitterSecondaryVertex_displacement3d: 100
    JetFitterSecondaryVertex_mass: 100
    JetFitterSecondaryVertex_energy: 100
    JetFitterSecondaryVertex_energyFraction: 100
    JetFitterSecondaryVertex_isDefaults: 2
    JetFitterSecondaryVertex_nTracks: 18
    pt_btagJes: 100
    absEta_btagJes: 100
    SV1_Lxy: 100
    SV1_N2Tpair: 8
    SV1_NGTinSvx: 20
    SV1_masssvx: 100
    SV1_efracsvx: 100
    SV1_significance3d: 100
    SV1_deltaR: 10
    SV1_L3d: 100
    SV1_isDefaults: 2
    rnnip_pb: 50
    rnnip_pc: 50
    rnnip_pu: 50
    combined_rnnip:
      variables: ["rnnip_pc", "rnnip_pu"]
      binning: 50
      operator: "+"
  flavours:
    b: 5
    c: 4
    u: 0
    tau: 15
| Options | Data Type | Necessary/Optional | Explanation |
| --- | --- | --- | --- |
| input_vars_trks_ttbar_loose_ptfrac | str | Necessary | Name of the plots (the top-level key, jets_input_vars in the example above). This does not affect the plots themselves. |
| variables | str | Necessary | Must be set to "jets" for this function. Decides which plotting functions are used. |
| folder_to_save | str | Necessary | Path where the plots should be saved. This is a relative path; give a folder name. |
| Datasets_to_plot | None | Necessary | Start of the block listing the datasets to be plotted. |
| R21 | None | Necessary | Name of the fileset which is to be plotted. Does not affect anything! |
| files | str | Necessary | Path to a file which is to be used for plotting. Wildcards are supported. The function will load as many files as needed to reach the number of jets given in Eval_parameters. |
| label | str | Necessary | Plot label for the plot legend. |
| special_param_jets | None | Necessary | Start of the special x-axis limits for individual variables. If you want to set the x range by hand, add the variable here together with lim_left for xmin and lim_right for xmax. |
| var_dict | dict | Necessary | A dict with all the variables you want to plot. The key of each entry is the name of the variable (as it is named in the files) and the value is the binning. If you give an int, you get that number of equidistant bins. You can also give a three-element list, which is passed to numpy.arange: the first element is start, the second is stop and the third is the step, i.e. the bin width (not the number of bins). The resulting numbers are bin edges, not bins! If no value is given, the default is 100. To plot a combination of variables, for example the sum of rnnip_pc and rnnip_pu, the entry needs to be a dict itself with three entries: variables, a list of the variables to combine; operator, the operation used to merge them (available are "+", "-", "*" and "/"); and binning, which follows the same int/list rules as above. An example is given in the config above, where the variable is named combined_rnnip. You can also apply the log to a single variable by listing only that variable in variables and setting operator to "log". |
| cut_vars_dict | list | Necessary | Cuts on the jet variables that are applied when creating the input variable plots. Technically, this is a list of dict entries, each keyed by the name of the variable used for the cut (e.g. pt_btagJes), with the operator used for the cut (operator) and the condition used for the cut (condition) as sub-entries. |
| plot_settings | dict | Necessary | Start of the plot settings. See the possible parameters in the section below. |
| xlabels | dict | Optional | Dict with custom xlabels |
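
A dict-valued var_dict entry such as combined_rnnip describes an element-wise combination of existing variables. The sketch below shows one way this could work with numpy. The function combine and the OPERATIONS table are hypothetical, and the use of the natural logarithm for "log" is an assumption, since the text above does not state the base.

```python
import numpy as np

# Mapping of the documented operator strings to numpy functions (illustrative)
OPERATIONS = {
    "+": np.add,
    "-": np.subtract,
    "*": np.multiply,
    "/": np.divide,
}


def combine(data, variables, operator):
    """Combine the named variables element-wise with the given operator."""
    if operator == "log":
        if len(variables) != 1:
            raise ValueError("'log' expects exactly one variable")
        return np.log(data[variables[0]])  # assumption: natural logarithm
    result = data[variables[0]]
    for name in variables[1:]:
        result = OPERATIONS[operator](result, data[name])
    return result


# Two jets with made-up values
jets = {
    "rnnip_pc": np.array([0.2, 0.5]),
    "rnnip_pu": np.array([0.3, 0.1]),
}
combined = combine(jets, ["rnnip_pc", "rnnip_pu"], "+")
print(combined)
```

The combined array has one value per jet and can then be histogrammed with the binning entry exactly like any ordinary variable.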

Plot settings#

The plot_settings section is similar for all three cases described above. In order to define some settings you want to apply to all plots, use yaml anchors as shown here:

Corresponding code in the example config file:
.default_plot_settings: &default_plot_settings
  logy: True
  use_atlas_tag: True
  atlas_first_tag: "Simulation Internal"
  atlas_second_tag: "$\\sqrt{s}$ = 13 TeV, $t\\bar{t}$ PFlow jets \n30000 jets"
  y_scale: 2
  figsize: [7, 5]

.ttbar_cuts: &ttbar_cuts
  cut_vars_dict:
    - pt_btagJes:
        operator: ">"
        condition: 2.0e4

Most of the plot settings are valid for all types of input variable plots (i.e. jet variables, track variables and the n_tracks plot). If a parameter is only valid for a certain type of plot, this is listed below.
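
The ttbar_cuts anchor above holds a cut_vars_dict list with one cut on pt_btagJes. As a rough illustration of how such a list could be turned into a jet selection, the sketch below builds a boolean mask with numpy. The function cut_mask and the set of supported operator strings are assumptions made for this example; here, True marks jets that pass all cuts.

```python
import operator as op

import numpy as np

# Assumed set of comparison operators; umami may support a different set
OPS = {">": op.gt, ">=": op.ge, "<": op.lt, "<=": op.le, "==": op.eq, "!=": op.ne}


def cut_mask(jets, cut_vars_dict):
    """Return a boolean mask that is True for jets passing every cut."""
    n_jets = len(next(iter(jets.values())))
    mask = np.ones(n_jets, dtype=bool)
    for cut in cut_vars_dict:
        for variable, spec in cut.items():
            compare = OPS[spec["operator"]]
            mask &= compare(jets[variable], float(spec["condition"]))
    return mask


# Same structure as the ttbar_cuts anchor: pt_btagJes > 2.0e4
cuts = [{"pt_btagJes": {"operator": ">", "condition": 2.0e4}}]
jets = {"pt_btagJes": np.array([1.5e4, 2.0e4, 5.0e4])}
mask = cut_mask(jets, cuts)
print(mask)
```

Note that with the ">" operator a jet sitting exactly at the threshold does not pass. Several entries in the list are combined with a logical AND in this sketch.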

Plot settings#

The following parameters control the plots themselves. Note that some parameters are not supported for all types of plots.

| Options | Plot Type | Data Type | Necessary/Optional | Explanation |
| --- | --- | --- | --- | --- |
| xlabels | | dict | Optional | Dict with custom xlabels |
| sorting_variable | Track variables | str | Optional | Name of the variable by which the tracks are sorted. |
| n_leading | Track variables | list | Optional | List of the n-th leading tracks to plot. If None, all tracks are plotted. If 0, the leading track of each jet (sorted by sorting_variable) is plotted. You can give, for example, None, 0 and 1, and all three will be plotted, each in its own folder with corresponding labels. This must be a list, even if only one option is given. |
| track_origins | Track variables and n_tracks plot | list | Optional | List of the desired track origins when plotting. |
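
The interplay of sorting_variable and n_leading can be pictured with a small numpy example. This is a sketch under assumptions: the helper select_leading is hypothetical, tracks are assumed to be sorted in descending order of the sorting variable, and padded or missing tracks are ignored for simplicity.

```python
import numpy as np


def select_leading(values, sort_by, n_leading):
    """Pick the n-th leading track of each jet, or all tracks if n_leading is None.

    values and sort_by have shape (n_jets, n_tracks).
    """
    if n_leading is None:
        return values.ravel()  # all tracks of all jets
    # Descending order of the sorting variable within each jet (assumption)
    order = np.argsort(-sort_by, axis=1)
    sorted_values = np.take_along_axis(values, order, axis=1)
    return sorted_values[:, n_leading]


# Two jets with three tracks each, made-up numbers
ptfrac = np.array([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3]])
d0 = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

leading = select_leading(d0, ptfrac, 0)
subleading = select_leading(d0, ptfrac, 1)
print(leading, subleading)
```

Each entry of n_leading thus produces its own set of histograms: None uses every track, while 0, 1, 2, ... use exactly one track per jet.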

All remaining plot settings are parameters which are handed to puma (Plotting UMami API), more specifically to the HistogramPlot class. Therefore, all parameters supported by the HistogramPlot class can be specified there.

puma documentation

List of puma parameters#

| Parameter | Type | Description |
| --- | --- | --- |
| discrete_vals | list, optional | List of values if a variable only has discrete values. If discrete_vals is specified, only the bins containing these values are plotted. By default None. |
| norm | bool, optional | Specify if the histograms are normalised, meaning that the histograms are divided by the total number of counts. Therefore, the sum of the bin counts is equal to one, but NOT the area under the curve, which would be sum(bin_counts * bin_width). By default True. |
| logy | bool, optional | Set log scale on y-axis, by default False. |
| bin_width_in_ylabel | bool, optional | Specify if the bin width should be added to the ylabel, by default False |
| underoverflow | bool, optional | Option to include under- and overflow values in outermost bins. |
| grid | bool, optional | Set the grid for the plots, by default False |
| stacked | bool, optional | Decide if all histograms (which are not data) are stacked, by default False |
| histtype | str, optional | If stacked is used, define the type of histogram you would like to have, default is "bar" |
| title | str, optional | Title of the plot, by default "" |
| draw_errors | bool, optional | Draw statistical uncertainty on the lines, by default True |
| xmin | float, optional | Minimum value of the x-axis, by default None |
| xmax | float, optional | Maximum value of the x-axis, by default None |
| ymin | float, optional | Minimum value of the y-axis, by default None |
| ymax | float, optional | Maximum value of the y-axis, by default None |
| ymin_ratio | list, optional | Set the lower y limit of each of the ratio subplots, by default None. |
| ymax_ratio | list, optional | Set the upper y limit of each of the ratio subplots, by default None. |
| y_scale | float, optional | Scaling up the y axis, e.g. to fit the ATLAS Tag. Applied if ymax is not defined, by default 1.3 |
| xlabel | str, optional | Label of the x-axis, by default None |
| ylabel | str, optional | Label of the y-axis, by default None |
| ylabel_ratio | list, optional | List of labels for the y-axis in the ratio plots, by default "Ratio" |
| label_fontsize | int, optional | Used fontsize in label, by default 12 |
| fontsize | int, optional | Used fontsize, by default 10 |
| n_ratio_panels | int, optional | Number of ratio panels between 0 and 2, by default 0 |
| figsize | (float, float), optional | Tuple of figure size (width, height) in inches, by default (8, 6) |
| dpi | int, optional | DPI used for plotting, by default 400 |
| transparent | bool, optional | Specify if the background of the plot should be transparent, by default False |
| leg_fontsize | int, optional | Fontsize of the legend, by default 10 |
| leg_loc | str, optional | Position of the legend in the plot, by default "upper right" |
| leg_ncol | int, optional | Number of legend columns, by default 1 |
| leg_linestyle_loc | str, optional | Position of the linestyle legend in the plot, by default "upper center" |
| apply_atlas_style | bool, optional | Apply ATLAS style for matplotlib, by default True |
| use_atlas_tag | bool, optional | Use the ATLAS Tag in the plots, by default True |
| atlas_first_tag | str, optional | First row of the ATLAS tag (i.e. the first row is "ATLAS <atlas_first_tag>"), by default "Simulation Internal" |
| atlas_second_tag | str, optional | Second row of the ATLAS tag, by default "" |
| atlas_fontsize | float, optional | Fontsize of ATLAS label, by default 10 |
| atlas_vertical_offset | float, optional | Vertical offset of the ATLAS tag, by default 7 |
| atlas_horizontal_offset | float, optional | Horizontal offset of the ATLAS tag, by default 8 |
| atlas_brand | str, optional | brand argument handed to atlasify. If you want to remove it, just use an empty string or None, by default "ATLAS" |
| atlas_tag_outside | bool, optional | outside argument handed to atlasify. Decides if the ATLAS logo is plotted outside of the plot (on top), by default False |
| atlas_second_tag_distance | float, optional | Distance between the atlas_first_tag and atlas_second_tag text in units of line spacing, by default 0 |
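
The distinction made for the norm parameter, between bin counts summing to one and unit area under the curve, can be checked with plain numpy. The snippet does not call puma; it only reproduces the arithmetic described in the table, with made-up values and unequal bin widths.

```python
import numpy as np

values = np.array([0.1, 0.2, 0.7, 1.5])
counts, edges = np.histogram(values, bins=[0.0, 0.5, 1.0, 2.0])

# Normalisation as described for norm: divide by the total number of counts
normed = counts / counts.sum()

# Area under the curve: sum(bin_counts * bin_width)
area = np.sum(normed * np.diff(edges))

print(normed.sum(), area)
```

Here the normalised bin contents add up to 1.0, while the area is 0.625, because the bin widths are not all equal to one.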