
# Start Training your Model

After all files are preprocessed, we can start the training. The following sections explain the global settings and the neural network settings in more detail. If you want to train from scratch, you can simply adapt one of the example config files. The creation of the model folder is taken care of when the actual training is started!

## Global Settings

The global settings are the base settings needed for every training. Here we define general options like the model name and the preprocessing config file that was used to produce the training dataset.

```yaml
# Set modelname and path to Pflow preprocessing config file
model_name: dips_lr_0.001_bs_15000_epoch_200_nTrainJets_Full
preprocess_config: examples/preprocessing/PFlow-Preprocessing.yaml

# Add here a pretrained model to start with.
# Leave empty for a fresh start
model_file:

# Add training file
train_file: <path_place_holder>/PFlow-hybrid-preprocessed_shuffled.h5

# Defining templates for the variable cuts
.variable_cuts_ttbar: &variable_cuts_ttbar
    variable_cuts:
        - pt_btagJes:
              operator: "<="
              condition: 2.5e5

.variable_cuts_zpext: &variable_cuts_zpext
    variable_cuts:
        - pt_btagJes:
              operator: ">"
              condition: 2.5e5

# Add validation files
validation_files:
    ttbar_r21_val:
        path: <path_place_holder>/inclusive_validation_ttbar_PFlow.h5
        label: "$t\\bar{t}$ Release 21"
        <<: *variable_cuts_ttbar

    zprime_r21_val:
        path: <path_place_holder>/inclusive_validation_zprime_PFlow.h5
        label: "$Z'$ Release 21"
        <<: *variable_cuts_zpext

test_files:
    ttbar_r21:
        path: <path_place_holder>/inclusive_testing_ttbar_PFlow.h5
        <<: *variable_cuts_ttbar

    ttbar_r22:
        path: <path_place_holder>/inclusive_testing_ttbar_PFlow_r22.h5
        <<: *variable_cuts_ttbar

    zpext_r21:
        path: <path_place_holder>/inclusive_testing_zprime_PFlow.h5
        <<: *variable_cuts_zpext

    zpext_r22:
        path: <path_place_holder>/inclusive_testing_zpext_PFlow_r22.h5
        <<: *variable_cuts_zpext


exclude: null

# Tracks dataset name
tracks_name: "tracks"
```

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| model_name | str | Necessary | Name of the model you want to train. This will be the name of the folder in which all results etc. are saved. The folder is created automatically if it does not exist. |
| preprocess_config | str | Necessary | Path to the preprocessing config you used to produce the training datasets. When you start the training and the model folder is created, this file is copied to the metadata/ folder inside the model folder. The path in the train config is also changed to the new path of the preprocessing config inside the metadata/ folder. |
| model_file | str | Optional | If you already have a model and want to use its weights as a starting point (for example a R21 trained model whose weights you want to use as initial weights for your R22 training), you can give the path to that model here. It will be loaded and used instead of initialising a new one. If you don't set load_optimiser in nn_structure, the optimiser state will be reset. If you just want to continue a specific training, use continue_training and leave this option empty. |
| train_file | str | Necessary | Path to the training sample. This is produced by the preprocessing step of Umami. One can also train with a TDD file format, which is available after the resampling but before the writing step. The scaling and shifting dict is taken automatically from the path given in the preprocessing config. If you want to use the TFRecords format for training, this must be the path to the folder where the TFRecords files are saved. |
| continue_training | bool | Optional | If your training died due to time constraints of your jobs and you just want to continue the training from the latest point, set this value to True and restart the training. |
| exclude | list | Necessary | List of jet variables that are excluded from training. Only compatible with DL1 training! To include all variables, set this option to null. If you don't train DL1, also set this to null. |
| tracks_name | str | Necessary | Name of the tracks dataset to use for training and evaluation; the default is "tracks". If you are training DL1, just remove this option. This option is necessary when using tracks, but when working with old preprocessed files (before January 2022, Tag 05 or older) it has to be removed from the config file to ensure compatibility. |
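Two of these options often change together when restarting a training. The following is a minimal sketch of the two typical cases described in the table (warm start from an existing model vs. continuing an interrupted training); the file path is just a placeholder:

```yaml
# Case 1: use the weights of an existing model as a starting point (warm start).
# The optimiser state is reset unless load_optimiser is set in nn_structure.
model_file: <path_place_holder>/previously_trained_model.h5
continue_training: False

# Case 2: continue an interrupted training from the latest saved point.
# Leave model_file empty in this case:
# model_file:
# continue_training: True
```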

Two options not covered yet are validation_files and test_files. Here we simply define the files which are later used for the validation/evaluation step. It is possible to define multiple validation files using the dict structure shown in the example. Focusing on validation_files first, each entry needs a unique name. This name is the internal identifier for this specific file. The options defined in such an entry are explained here:

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| path | str | Necessary | Path to the validation/test file which is to be used. Using wildcards is possible. |
| label | str | Necessary | Label which is used for this file when plotting the validation plots. |
| variable_cuts | dict | Optional | Dict of cuts which are applied when loading the different files. Only jet variables can be cut on. In the example, these are defined as templates for the different sample types. |
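The variable_cuts templates used in the example above are just one way to write the cuts; an entry can also carry them inline. A minimal sketch of a single validation file entry with inline cuts (the path is a placeholder):

```yaml
validation_files:
    ttbar_r21_val:
        path: <path_place_holder>/inclusive_validation_ttbar_PFlow.h5
        label: "$t\\bar{t}$ Release 21"
        # Same cuts as the variable_cuts_ttbar template, written out inline
        variable_cuts:
            - pt_btagJes:
                  operator: "<="
                  condition: 2.5e5
```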

Focusing now on the test_files, the options are the same, but no label is needed, because the plotting of the evaluation results is an extra step that is covered separately.

Both options can also be added after the training is finished. For the training itself, you can leave these options blank.

## Network Settings

The next section in the train config is the nn_structure. Here we define all the information needed for building the model, like which tagger we want to use and the number of hidden layers and nodes per hidden layer. The general options are shown next, while the tagger-dependent options are shown in their respective subsections.

```yaml
nn_structure:
    # Decide, which tagger is used
    tagger: "dips"

    # NN Training parameters
    learning_rate: 0.001
    batch_size: 15000
    epochs: 200

    # Number of jets used for training
    # To use all: Fill nothing
    n_jets_train:

    # Define which classes are used for training
    # These are defined in the global_config
    class_labels: ["ujets", "cjets", "bjets"]

    # Main class which is to be tagged
    main_class: "bjets"

    # Decide if Batch Normalisation is used
    batch_normalisation: True

    # Structure of the dense layers after summing up the track outputs + respective
    # dropout rates
    dense_sizes: [100, 100, 100, 30]
    dropout_rate: [0.1, 0.1, 0.1, 0.1]

    # Options for the Learning Rate reducer
    lrr: True

    # Option if you want to use sample weights for training
    use_sample_weights: False
```

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| tagger | str | Necessary | Name of the tagger that is used/to be trained. The currently supported taggers are dips, dips_attention, cads, dl1, umami and umami_cond_att. Note: all versions of DL1* (like DL1r or DL1d) use the tagger dl1! |
| learning_rate | float | Necessary | Learning rate which is used for training. |
| batch_size | int | Necessary | Batch size which is used for training. |
| epochs | int | Necessary | Number of epochs of the training. |
| n_jets_train | int | Necessary | Number of jets used for training. Leave empty to use all. |
| class_labels | list | Necessary | List of flavours used in training. NEEDS TO BE THE SAME AS IN THE preprocess_config. Even the ordering needs to be the same! |
| main_class | str or list of str | Necessary | Main class which is to be tagged. Needs to be in class_labels. This can either be one single class (str) or multiple classes (list of str). |
| batch_normalisation | bool | Necessary | Decide if batch normalisation is used in the network (look in the model files to see where this is used for the specific models). |
| dense_sizes | list | Necessary | List of nodes per layer of the network. Every entry is one layer. The numbers need to be int! For DL1r/DL1d, this is the number of nodes per layer. For DIPS/DIPS Attention/Umami/CADS, this is the number of nodes per layer of the F network. |
| dropout_rate | list | | List of dropout rates for the layers defined via dense_sizes. Has to be of the same length as the dense_sizes list. |
| load_optimiser | bool | Optional | When loading a model (via model_file), you can load the optimiser state to continue a training (True) or initialise a new optimiser to use the model as a starting point for a fresh training (False). |
| use_sample_weights | bool | Optional | Applies the weights you calculated with the --weighting flag in the preprocessing to the training loss function. |
| nfiles_tfrecord | int | Optional | Number of files that are loaded at the same time when using TFRecords for training. |
| lrr | bool | Optional | Decide if a Learning Rate Reducer (lrr) is used or not. If yes, the following options can be added. |
| lrr_monitor | str | Optional | Quantity to be monitored. Default: "loss" |
| lrr_factor | float | Optional | Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.8 |
| lrr_patience | int | Optional | Number of epochs with no improvement after which the learning rate will be reduced. Default: 3 |
| lrr_verbose | int | Optional | 0: quiet, 1: update messages. Default: 1 |
| lrr_mode | str | Optional | One of {"auto", "min", "max"}. In "min" mode, the learning rate will be reduced when the monitored quantity has stopped decreasing; in "max" mode it will be reduced when the monitored quantity has stopped increasing; in "auto" mode, the direction is automatically inferred from the name of the monitored quantity. Default: "auto" |
| lrr_cooldown | int | Optional | Number of epochs to wait before resuming normal operation after the learning rate has been reduced. Default: 5 |
| lrr_min_learning_rate | float | Optional | Lower bound on the learning rate. Default: 0.000001 |
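If the learning rate reducer is enabled, the lrr_* options from the table can be placed right next to lrr inside nn_structure. A sketch using the defaults listed above (writing them out explicitly has no effect beyond documenting the defaults):

```yaml
nn_structure:
    # ... other options as in the example above ...
    lrr: True
    lrr_monitor: "loss"
    lrr_factor: 0.8
    lrr_patience: 3
    lrr_verbose: 1
    lrr_mode: "auto"
    lrr_cooldown: 5
    lrr_min_learning_rate: 0.000001
```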
### DIPS

```yaml
    # Structure of the dense layers for each track + respective dropout rates
    ppm_sizes: [100, 100, 128]
    dropout_rate_phi: [0, 0, 0]
```

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| ppm_sizes | list | Necessary | List of nodes per layer of the ϕ network. Every entry is one layer. The numbers need to be int! |
| dropout_rate_phi | list | Necessary | List of dropout rates in the ϕ network. Has to be of the same length as the ppm_sizes list. |
### DL1*

```yaml
    # Activations of the layers. Starting with first dense layer.
    activations: ["relu", "relu", "relu", "relu", "relu", "relu", "relu", "relu"]
```

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| activations | list | Necessary | List of activations per layer defined in dense_sizes. Every entry is the activation for one hidden layer. The entries must be str and activations supported by Keras. |
| repeat_end | list | Optional | Experimental. List of input variables that are folded into the output of the penultimate layer. This is then fed into the last layer. |
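Since the activations are given per entry of dense_sizes, the two lists are expected to line up one-to-one. A minimal DL1* sketch (the layer sizes are purely illustrative, not a recommendation):

```yaml
nn_structure:
    tagger: "dl1"
    # One hidden layer per entry; the sizes here are only illustrative
    dense_sizes: [256, 128, 60, 48, 36, 24, 12, 6]
    # One activation per entry in dense_sizes
    activations: ["relu", "relu", "relu", "relu", "relu", "relu", "relu", "relu"]
```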
### Umami

```yaml
    # DIPS structure
    dips_ppm_units: [100, 100, 128]
    dips_dense_units: [100, 100, 100, 30]

    # These are the layers that will be concatenated with the last layer of dips_dense_units
    intermediate_units: [72]

    # DL1 structure
    dl1_units: [57, 60, 48, 36, 24, 12, 6]

    # total loss = loss(umami) + dips_loss_weight * loss(dips)
    dips_loss_weight: 1.0

    # Define which classes are used for training
    # These are defined in the global_config
    class_labels: ["ujets", "cjets", "bjets"]

    # Main class which is to be tagged
    main_class: "bjets"

    # Options for the Learning Rate reducer
    lrr: True

    # Option if you want to use sample weights for training
    use_sample_weights: False
```

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| dips_ppm_units | list | Necessary | Similar to the DIPS ppm_sizes. List of nodes per layer of the ϕ network. Every entry is one layer. The numbers need to be int! |
| dips_dense_units | list | Necessary | Similar to the DIPS dense_sizes. List of nodes per layer of the F network. Every entry is one layer. The numbers need to be int! |
| intermediate_units | list | Necessary | These are the layers that will be concatenated with the last layer of dips_dense_units. Every entry is one layer. The numbers need to be int! |
| dl1_units | list | Necessary | Similar to the DL1* dense_sizes. List of nodes per layer of the DL1-like network. Every entry is one layer. The numbers need to be int! |
| dips_loss_weight | float or int | Necessary | Loss weight $w_{\text{DIPS}}$ for the DIPS loss. While training Umami, two losses are obtained: the final Umami loss and the DIPS loss. This value is the factor determining how important the DIPS loss is for the total model loss: $\text{Loss}_{\text{Total}} = \text{Loss}_{\text{Umami}} + w_{\text{DIPS}} \cdot \text{Loss}_{\text{DIPS}}$. |
### DIPS Attention/CADS

The difference between the two taggers here is the *_condition options: if all *_condition options are False, the tagger to use is dips_attention, while if at least one *_condition option is True, the tagger to use is cads. A sketch contrasting the two is shown after the table below.

```yaml
    ppm_sizes: [100, 100, 128]

    # Decide, if the pT and eta info is folded into the deep sets input
    ppm_condition: True

    # Structure of the dense layers after summing up the track outputs
    dense_sizes: [100, 100, 100, 30]

    # Decide, if the pT and eta info is folded into the F network input
    dense_condition: False

    # Number of conditions for conditional deep sets
    n_conditions: 2

    # Decide which pooling should be used
    pooling: "attention"

    # Number of attention nodes
    attention_sizes: [128, 128]

    # Decide, if the pT and eta info is folded into the attention network input
    attention_condition: True

    # Options for the Learning Rate reducer
    lrr: True

    # Option if you want to use sample weights for training
    use_sample_weights: False
```

| Options | Data Type | Necessary/Optional | Explanation |
| ------- | --------- | ------------------ | ----------- |
| ppm_sizes | list | Necessary | Similar to the DIPS ppm_sizes. List of nodes per layer of the ϕ network. Every entry is one layer. The numbers need to be int! |
| ppm_condition | bool | Necessary | Whether the conditional information is used/folded into the input of the ϕ network. |
| dense_sizes | list | Necessary | Similar to the DIPS dense_sizes. List of nodes per layer of the F network. Every entry is one layer. The numbers need to be int! |
| dense_condition | bool | Necessary | Whether the conditional information is used/folded into the input of the F network. |
| n_conditions | int | Necessary | Number of conditional jet input variables to use for CADS. |
| pooling | str | Necessary | Pooling method that is used to pool the output of the ϕ and A networks. |
| attention_sizes | list | Necessary | Similar to ppm_sizes. List of nodes per layer of the A network. Every entry is one layer. The numbers need to be int! |
| attention_condition | bool | Necessary | Whether the conditional information is used/folded into the input of the A network. |
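Putting the rule from above into config form, the fragments of nn_structure that distinguish the two taggers might look roughly as follows (values are illustrative; all other options stay as in the example above). For DIPS Attention, all *_condition options are off:

```yaml
    tagger: "dips_attention"
    ppm_condition: False
    dense_condition: False
    attention_condition: False
```

For CADS, at least one *_condition option is switched on and n_conditions conditional jet input variables are used:

```yaml
    tagger: "cads"
    n_conditions: 2
    ppm_condition: True
    dense_condition: False
    attention_condition: True
```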

## Running the Training

After the global and network settings are prepared, you can start training your model. To start the training, switch to the umami/umami folder and run the following command:

```bash
train.py -c <path to train config file> --prepare
```

This command will not directly start the training, but will prepare the model folder with all needed configs/scale dicts etc. Before starting the actual training, you should check all config files again. The new folder will have the name given in model_name. Inside it is a folder called metadata/ into which all configs/scale dicts etc. are copied. All paths in the config are also adapted to the metadata/ folder, e.g. the path to the preprocessing config. After you have checked everything, you can run the actual training via the command:

```bash
train.py -c <path to train config file>
```

This will start the actual training. Another command line argument available for train.py is -o, which overwrites the configs/dicts in metadata/ when you run the training.

Note: when training, the callback methods of the different taggers validate the training on the fly, which can lead to memory issues. To deactivate the on-the-fly validation (which we recommend), set the n_jets option in the validation_settings section of the train config to 0, as sketched below.
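Following that recommendation, the corresponding entry in the train config would look roughly like this (only the n_jets option is shown; any other validation settings are left untouched):

```yaml
validation_settings:
    # Deactivate the on-the-fly validation during training
    n_jets: 0
```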
