DAS 3.0
Das Analysis System
Producing n-tuples

This package contains the generic configuration and commands to generate n-tuples. The n-tuple format is described in "Objects" (../Objects/README.md). To run the n-tupliser, the input datasets must be stored on Tier2 servers (T2_*) in MINIAOD format.

We first provide technical details on the datasets, then on the concept of campaign, and finally on running the n-tupliser itself.

Data sets

Certificate

As soon as you want to deal with CMS datasets, you need a valid grid certificate; check with the computing admins at your institute how to obtain one.

Then use the following command to activate your grid certificate:

voms-proxy-init -voms cms

By default, your proxy will be valid for 24h; to run longer, use -rfc -valid 192:00. To check whether a proxy has already been activated:

voms-proxy-info
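
For instance, combining the options mentioned above to obtain a proxy valid for 192 hours:

voms-proxy-init -voms cms -rfc -valid 192:00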

Note: activating your grid proxy is only necessary to run commands that deal with CMS datasets: once the n-tuples are being or have been produced, you don't need it anymore. In other words, there is no point renewing it each time you source the environment for daily analysis.

Browsing

A nice tool to find Monte Carlo datasets is the Grasp website. It has an intuitive search interface that lets you filter by data-taking era, dataset size, or MiniAOD version. The full path of a dataset can be found by clicking on the "DAS" link in the results page (note that in this context, DAS means "Data Aggregation System").

To work from the shell, one standard command is dasgoclient, reachable in any CMSSW environment (if you are running from an EL9 node, you need to prefix all upcoming commands with das-cmssw el8 or similar). The most basic type of call goes as follows:

$ dasgoclient -query "/JetHT/*/MINIAOD" # on EL8
$ das-cmssw el8 dasgoclient -query "/JetHT/*/MINIAOD" # on EL9

to see all existing JetHT datasets in MiniAOD (beware: the list may be quite long). More advanced queries may be run, for example to list the files belonging to a given dataset; for instance:

$ dasgoclient -query "file dataset=/JetHT/Run2016G-07Aug17-v1/AOD"
Showing 1-10 out of 36115 results, for more results use --idx/--limit options
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/80D2B128-EE8C-E711-BBB7-001E673972AB.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/80CAE826-EE8C-E711-9733-001E677925A0.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/80A9E9AE-A88D-E711-8D2E-001E67F67372.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/807C7D29-818D-E711-9472-001E67792514.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/8031D892-268D-E711-90D9-002590200934.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/802D756F-298D-E711-B242-002590200984.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/7EECEA90-348D-E711-B111-002590200934.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/7EC9C4E9-C98D-E711-B761-002590200B34.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/7E5EC79B-998D-E711-A8D3-001E67397D05.root
/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50000/7E58EC0D-598D-E711-9CA4-002590200868.root

or

$ dasgoclient -query "file dataset=/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/RunIISummer16DR80Premix-PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/AODSIM"
Showing 1-10 out of 3436 results, for more results use --idx/--limit options
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FED18A1B-21AD-E611-8736-0CC47A7C3572.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FEC35242-63AD-E611-B8B9-0025905B8612.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FE951C01-94AE-E611-9266-0CC47A4C8E2A.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FE715E49-9CAE-E611-BFBF-0025905A60D2.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FE6C3D21-A0AD-E611-AD0A-0025905B8574.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FE40B6C4-0FAD-E611-ACCC-0025905A605E.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FE26834B-A2AE-E611-B24F-0CC47A7C351E.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FE0FAD2B-A0AE-E611-AA91-0025905A6090.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FCAAF7EF-6CAD-E611-914A-0025905A608E.root
/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/F8FF7BE8-9DAE-E611-B438-0025905A60B8.root

Run dasgoclient -h to get some help, and dasgoclient -examples to see an extensive list of example queries.
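
DAS queries can also tell you on which sites a dataset is hosted; for instance, with the site query key (one of the standard DAS keys):

$ dasgoclient -query "site dataset=/JetHT/Run2016G-07Aug17-v1/AOD"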

An alternative way to get the location of a file is to use edmFileUtil, also reachable within any CMSSW environment:

$ edmFileUtil -d /store/data/Run2016G/JetHT/AOD/07Aug17-v1/50003/56319AEC-1A7F-E711-A64F-001E677923E6.root
dcap://dcache-cms-dcap.desy.de//pnfs/desy.de/cms/tier2/store/data/Run2016G/JetHT/AOD/07Aug17-v1/50003/56319AEC-1A7F-E711-A64F-001E677923E6.root

or

$ edmFileUtil -d /store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FED18A1B-21AD-E611-8736-0CC47A7C3572.root
dcap://dcache-cms-dcap.desy.de//pnfs/desy.de/cms/tier2/store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FED18A1B-21AD-E611-8736-0CC47A7C3572.root
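
edmFileUtil can also be pointed at a full file path to check that the file is actually readable; it then prints a short summary of its content (runs, lumi sections, events). A sketch, reusing the file above via XRootD:

$ edmFileUtil root://xrootd-cms.infn.it//store/data/Run2016G/JetHT/AOD/07Aug17-v1/50003/56319AEC-1A7F-E711-A64F-001E677923E6.root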

To investigate the content of a file, use edmDumpEventContent:

$ edmDumpEventContent root://xrootd-cms.infn.it//store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FED18A1B-21AD-E611-8736-0CC47A7C3572.root

(output may be quite long... just use grep to focus on what you're looking for).
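
For instance, to focus on the jet collections only:

$ edmDumpEventContent root://xrootd-cms.infn.it//store/mc/RunIISummer16DR80Premix/QCD_Pt_15to30_TuneCUETP8M1_13TeV_pythia8/AODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/70000/FED18A1B-21AD-E611-8736-0CC47A7C3572.root | grep -i jet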

Rucio request

If the datasets are not available on a Tier2 server, but exist on a Tier1 server or on tape, you may need to make a Rucio request.

A few commands have to be run. First, you need to set up your Rucio account:

export RUCIO_ACCOUNT=$USER

where $USER is your CERN username. Note that DAS has already tried to set this variable for you, assuming that your username on the running machine is identical to your CERN username. Then you can "submit rules", e.g.:

rucio add-rule --ask-approval --lifetime 2592000 --activity "User AutoApprove" --comment "Details for use, ticket reference if any" cms:/SingleMuon/Run2016G-TkAlMuonIsolated-21Feb2020_UL2016-v1/ALCARECO#5ca78b2c-101d-4e20-8bcd-8b509e7eaf28 1 T2_CH_CERN

where

  • you should adapt the dataset (/*/*/*),
  • the hash of the lumi block (after the #) is not compulsory,
  • and you should adapt the destination Tier2 server.

Each call of this command returns an ID: save it somewhere. You can add several rules; each one will have its own ID. To check the status of your requests:

rucio list-rules --account $USER
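
To inspect a single rule in more detail, pass the saved ID to the standard rule-info subcommand:

rucio rule-info [rule ID]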

Running the n-tupliser

The use of campaigns is not mandatory to execute any of the commands producing the n-tuples, but it is recommended to ensure reproducibility and documentation of an analysis.

In general, to avoid reinventing the wheel, we resort to CRAB commands whenever possible. Only the submission is partly re-written.

At DESY, the n-tuples will be stored on pnfs. This LFS disk may only be used to store the n-tuples produced with CRAB. See here for more information.

CMSSW config file (TWiki: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideAboutPythonConfigFile)

The steering files in CMSSW are mostly written in Python. In particular, the steering file of the n-tupliser is python/Ntupliser_cfg.py. It contains options like jet size, triggers, datasets, etc. which may vary from time to time. Most of them are guessed automatically from the year or from the type of data (simulated vs real); other options are provided with the help of a configuration file in JSON format (see below).

To test the n-tupliser directly, you can use the standard CMSSW command:

cmsRun Ntupliser_cfg.py # on EL8
das-cmssw el8 cmsRun Ntupliser_cfg.py # on EL9

where the prefix das-cmssw el8 is expected if you are running from EL9 (this will be implicit in the following) or

cmsRun Ntupliser_cfg.py inputFiles=root://cms-xrd-global.cern.ch/[path] maxEvents=100

where [path] corresponds to the path of a dataset file (which you can obtain with dasgoclient from a generic dataset name, as explained "here" (../README.md)). In general, CMSSW provides a command-line parser for cmsRun; in this framework, we only use it to provide a config file in JSON format (see ../test/example.json).
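
Putting the pieces together, a minimal test could look as follows (a sketch only: the dataset is the JetHT example from above, and head -1 simply picks the first file returned by dasgoclient):

file=$(dasgoclient -query "file dataset=/JetHT/Run2016G-07Aug17-v1/AOD" | head -1)
cmsRun Ntupliser_cfg.py inputFiles=root://cms-xrd-global.cern.ch/$file maxEvents=100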

If the datasets are reachable locally, it is technically possible to run the n-tupliser locally too, using the local farm; but for better reproducibility and greater generality, it is better to rely on CRAB. The essential commands to run CRAB are explained in the "CRAB3" section below; in practice, a custom command is available to submit several datasets at a time. All other actions rely on standard CRAB commands (e.g. resubmit, kill, etc.).


mkNtuples

The source script may be found in scripts; after the installation of the software, it is available directly as a command in the prompt. It should be run from the workarea or any large file storage (LFS) area. It takes a JSON config file as input; examples of config files may be found in test.
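
A typical invocation might then look as follows (an assumption for illustration, using the example config from test; mkNtuples -h gives the actual interface):

mkNtuples test/example.json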

The script may be run from either EL8 or EL9, but its behaviour differs slightly; for instance, the local mode only works from an EL9 node. Use mkNtuples -h to know more.

CRAB3

Here, we only present the essential commands. Many TWikis provide additional information.

To initialise CRAB, you may have to use the following (although in principle, DAS sources it automatically for you):

source /cvmfs/cms.cern.ch/crab3/crab.sh

This sourcing is done in the default initialisation script of the framework, in case you use it. All CRAB commands start with crab (assuming that you run from EL8); to get a list of the available commands, enter crab help.

Submit jobs

To submit, one may in principle use crab submit config.py, providing a CRAB configuration in Python format; in practice, we rather rely here on a custom command, reachable anywhere in the shell after sourcing CMSSW. It makes use of the CRAB API.

One submission per dataset (/*/*/*) is required (this is transparent when using the custom command). A new directory with local details is created for each submission: think twice before deleting it.

Babysitting

Check the status:

crab status [path/to/dir]

This will show you whether the submission has succeeded and whether it is still running. You may have to resubmit failed jobs:

crab resubmit [path/to/dir]

If the jobs keep failing:

  • extend the running time (see the command options; an example follows this list);
  • check the logs (in the job directory);
  • or contact the Computing group.
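
For instance, to extend the allowed running time upon resubmission (assuming the standard CRAB3 option --maxjobruntime, expressed in minutes):

crab resubmit --maxjobruntime 2750 [path/to/dir]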

For real data, it is absolutely essential to reach 100% of the dataset. For MC, instead, it is acceptable to reach only 95% or 99%: the events are distributed uniformly, so the whole phase space covered by real data is still covered by MC with decent statistics.

If you are running from EL9, add das-cmssw el8 in front of each command.

After the run

The output of the CRAB job may be needed, for instance to calculate the luminosity:

crab report [path/to/dir]

Calculate luminosity

First produce the lumi files from the CRAB jobs:

crab report CRAB/[dir]

It will produce the JSON files in CRAB/[dir]/results.

Then take inspiration from the following script:

files=CRAB/darwin_*_JetHT_Run2017*

# collect the lumi JSON files produced by crab report
jsonFiles=
for f in $files
do
    json=$(echo $f/results/crab*.json)
    jsonFiles="$jsonFiles $json"
done
# prefix the paths with the remote host, e.g. to copy the files with scp
allFiles=$(echo $jsonFiles | sed "s|/afs|$USER@naf-cms.desy.de:/afs|g")

# run brilcalc on each JSON file and extract the recorded luminosity
for f in files/*.json
do
    echo $f
    brilcalc lumi --normtag /cvmfs/cms-bril.cern.ch/cms-lumi-pog/Normtags/normtag_PHYSICS.json -u /fb -i $f | grep -A2 totrecorded | tail -1 | awk '{print $12}'
done

Output for data17 lumis [fb-1]:

files/crab_das1_JetHT_Run2017B-31Mar2018-v1.json 4.794
files/crab_das1_JetHT_Run2017C-31Mar2018-v1.json 9.617
files/crab_das1_JetHT_Run2017D-31Mar2018-v1.json 4.248
files/crab_das1_JetHT_Run2017E-31Mar2018-v1.json 9.314
files/crab_das1_JetHT_Run2017F-31Mar2018-v1.json 13.535