# Instructions to train a Graph-Neural-Network tagger with the umami framework

The following instructions are meant to give a guideline on how to train and evaluate the Graph-Neural-Network (GNN) tagger. They focus on the PFlow training. The repository for the tagger is here, and dedicated docs are available here.

Further information on the GNN tagger is provided in the algorithms documentation here (access restricted to members of the ATLAS collaboration).

## Sample Preparation

The first step is to obtain the samples for the training. All samples are listed in MC-Samples.md. For the PFlow training, only the ttbar and extended Z' samples from the MC16d campaign (corresponding to the 2017 data-taking period) were used.

The training ntuples are produced with the training-dataset-dumper, which dumps the jets from the PHYSVAL derivations directly into HDF5 files. The processed ntuples that can be used for training are also listed in the table in MC-Samples.md. If you want to dump your own samples, make sure they contain the information used in the GNN config.
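To get a feel for the dumper output format, the sketch below writes and reads a tiny HDF5 file with a structured `jets` dataset using `h5py`. The dataset and variable names (`jets`, `pt_btagJes`, `HadronConeExclTruthLabelID`) follow a typical dumper layout but are assumptions here; check your dumper config for the names actually written to your files.

```python
import h5py
import numpy as np

# Structured dtype mimicking a few jet variables from the dumper output
# (variable names are illustrative, not guaranteed to match your config).
jet_dtype = np.dtype([
    ("pt_btagJes", np.float32),
    ("eta_btagJes", np.float32),
    ("HadronConeExclTruthLabelID", np.int32),
])

# Write a tiny dummy ntuple so the example is self-contained.
rng = np.random.default_rng(42)
jets = np.zeros(100, dtype=jet_dtype)
jets["pt_btagJes"] = rng.uniform(20e3, 250e3, size=100)
jets["eta_btagJes"] = rng.uniform(-2.5, 2.5, size=100)
jets["HadronConeExclTruthLabelID"] = rng.choice([0, 4, 5], size=100)

with h5py.File("dummy_ntuple.h5", "w") as f:
    f.create_dataset("jets", data=jets)

# Read it back and select b-jets (truth label 5).
with h5py.File("dummy_ntuple.h5", "r") as f:
    loaded = f["jets"][:]
b_jets = loaded[loaded["HadronConeExclTruthLabelID"] == 5]
print(f"{len(b_jets)} b-jets out of {len(loaded)} jets")
```

Inspecting a real ntuple the same way (`f["jets"].dtype.names`) is a quick check that all variables required by the GNN config are present.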

### Ntuple preparation

After the previous step, the ntuples need to be further processed. Different resampling approaches can be used to obtain the same pT and eta distributions for all of the jet flavour categories used in the training.
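The idea behind resampling can be sketched with a toy example: bin the jets in pT and, in each bin, keep at most as many jets per flavour as the rarest flavour provides, so that all flavours end up with identical pT spectra. This mimics a simple count-based undersampling; the spectra, bin edges, and function names below are illustrative, not umami's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy jets: two flavours with deliberately different pT spectra (in GeV).
pt_b = rng.exponential(scale=60.0, size=5000) + 20.0    # "b-jets"
pt_l = rng.exponential(scale=90.0, size=20000) + 20.0   # "light jets"

bins = np.linspace(20.0, 500.0, 25)  # 24 pT bins

def resample_to_target(pt, target_counts, bins, rng):
    """Randomly keep at most target_counts[i] jets in each pT bin."""
    idx = np.digitize(pt, bins) - 1  # bin index per jet
    keep = []
    for i, n_target in enumerate(target_counts):
        in_bin = np.flatnonzero(idx == i)
        n_keep = min(len(in_bin), int(n_target))
        keep.append(rng.choice(in_bin, size=n_keep, replace=False))
    return np.concatenate(keep)

# Per-bin target: the minimum count over the flavours.
counts_b, _ = np.histogram(pt_b, bins)
counts_l, _ = np.histogram(pt_l, bins)
target = np.minimum(counts_b, counts_l)

kept_b = resample_to_target(pt_b, target, bins, rng)
kept_l = resample_to_target(pt_l, target, bins, rng)
# After resampling, both flavours have identical per-bin pT counts.
```

In a real training the same matching is done simultaneously in pT and eta, and umami offers several resampling methods beyond this simple count-based one.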

This processing can be done using the preprocessing capabilities of Umami via the preprocessing.py script.
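In practice the preprocessing runs in several stages, typically invoked one after the other with the same config file. The sequence below is a sketch; the flag names and stage order should be checked against the documentation of the umami version you are using, and the config path is a placeholder.

```shell
# Illustrative umami preprocessing sequence (verify flags for your version).
preprocessing.py --config <config-path> --prepare     # prepare flavour-split samples
preprocessing.py --config <config-path> --resampling  # resample pT/eta distributions
preprocessing.py --config <config-path> --scaling     # compute scaling and shifting
preprocessing.py --config <config-path> --write       # write the final training file
```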

For the GNN, we use the PFlow-Preprocessing-GNN.yaml config file, found here.