# Das Analysis System (DAS) 3.0
Das Analysis System is a general project involving modular tools for physics analysis with high-level objects such as jets, leptons, and photons. The principle is to perform physics directly from the shell in atomic steps, applying one correction at a time and treating all systematic uncertainties simultaneously.
The GitLab group is divided into several repositories:

- The `Installer` repo contains the necessary scripts to install the suite.
- The `Core` repo contains most of the code for the analysis of CMS data.
- The `Tables` repo contains the calibration of the high-level objects.
- The `Darwin` repo is a mirror of a general toolkit for physics analysis.
- `PlottingHelper` is a useful library originally made by Radek Žlebčík (Charles University) to help make plots with ROOT.

Note: in the `Core` repo, the `Ntupliser` module requires a working container that can only be installed if `/cvmfs` is available. See the dedicated section on containers for more details.
The `Campaigns` subgroup contains actual repos corresponding to different analyses, with configs, plotting macros, and possibly CRAB outputs, or anything relevant to the reproducibility of an analysis. A template repo is provided with suggestions and guidelines.
In the following, we provide two alternative methods to install the full framework, including the n-tupliser, on machines with access to `/cvmfs`.
In general, it is recommended to install the software on a fast disk (e.g. AFS) to speed up compilation, but to process the heavy n-tuples on a dedicated storage area (e.g. NFS at DESY, EOS at CERN). Keeping the software neat and clean is important for reproducibility.
Clone the `Installer` in a directory (we propose `DasAnalysisSystem`). Then source the minimal environment for CMSSW, CRAB, and RUCIO from that directory.

The instructions below assume that you have an environment with all the required dependencies, or that you know how to install them if CMake complains that they are missing. Recent LCG environments do not require additional setup. These instructions build the software without any special flags, so the compiler will apply no optimization and include basic debugging information; you can pass additional settings using the `CMAKE_BUILD_TYPE` variable.
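For example, to request an optimized build that still carries debug information, one of the standard CMake build types can be passed at configure time (the source and build paths here are placeholders):

```shell
# Standard CMake usage: select a build type instead of the default (no flags).
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -S <source-dir> -B <build-dir>
```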
We will install the packages in order, starting with Darwin, then Core. First, we create two folders, one that will contain the build artifacts and one that will contain the installed software. For convenience, we store their location in variables:
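A minimal sketch of this step (the folder names and the variable names `BUILD_DIR` and `INSTALL_DIR` are our assumptions; adapt them to your setup):

```shell
# Create one folder for build artifacts and one for the installed software,
# and remember their locations for the following steps.
export BUILD_DIR="$PWD/build"
export INSTALL_DIR="$PWD/install"
mkdir -p "$BUILD_DIR" "$INSTALL_DIR"
```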
The first step is then to download, compile, and install Darwin:
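This step could look like the following sketch, assuming a standard out-of-source CMake workflow; the repository URL is a placeholder (take the actual one from the GitLab group), and `BUILD_DIR`/`INSTALL_DIR` are assumed to hold the build and install locations:

```shell
# Download, configure, build, and install Darwin (URL is a placeholder).
git clone <url-of-Darwin-repo> Darwin
cmake -S Darwin -B "$BUILD_DIR/Darwin" -DCMAKE_INSTALL_PREFIX="$INSTALL_DIR"
cmake --build "$BUILD_DIR/Darwin" -j "$(nproc)"
cmake --install "$BUILD_DIR/Darwin"
```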
We then do the same for Core:
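Under the same assumptions (placeholder URL, standard CMake workflow), the Core step mirrors the Darwin one:

```shell
# Same workflow for Core (URL is a placeholder).
git clone <url-of-Core-repo> Core
cmake -S Core -B "$BUILD_DIR/Core" -DCMAKE_INSTALL_PREFIX="$INSTALL_DIR"
cmake --build "$BUILD_DIR/Core" -j "$(nproc)"
cmake --install "$BUILD_DIR/Core"
```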
Finally, we clone the repository containing scale factors and other corrections:
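For instance (the URL is a placeholder; the folder name `tables` matches the `DARWIN_TABLES` variable recommended below):

```shell
# Clone the calibration tables into a "tables" folder (URL is a placeholder).
git clone <url-of-Tables-repo> tables
```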
This gives us an installation of both Darwin and Core in the install folder. You need to set a few environment variables before you can use them:
| Variable | Value |
|---|---|
| `PATH` | `$BUILD_DIR/bin:$PATH` |
| `LD_LIBRARY_PATH` | `$BUILD_DIR/lib64:$LD_LIBRARY_PATH` (note: can be different on some systems) |
| `PYTHONPATH` | `$BUILD_DIR/python:$PYTHONPATH` |
| `DARWIN_FIRE_AND_FORGET` | `$BUILD_DIR/lib64` (note: can be different on some systems) |
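The table above translates into the following shell exports (a sketch; replace `lib64` with `lib` where appropriate on your system):

```shell
# Default BUILD_DIR if not already set (adapt to your actual build folder).
: "${BUILD_DIR:=$PWD/build}"
export PATH="$BUILD_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$BUILD_DIR/lib64:$LD_LIBRARY_PATH"
export PYTHONPATH="$BUILD_DIR/python:$PYTHONPATH"
export DARWIN_FIRE_AND_FORGET="$BUILD_DIR/lib64"
```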
In addition, we recommend setting variables for the location of the various repositories:
| Variable | Value |
|---|---|
| `DAS_BASE` | `$PWD` |
| `DARWIN_BASE` | `$PWD/Darwin` |
| `CORE_BASE` | `$PWD/Core` |
| `DARWIN_TABLES` | `$PWD/tables` |
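Run from the root directory of the installation, these could be set as:

```shell
# Set from the DasAnalysisSystem root directory.
export DAS_BASE="$PWD"
export DARWIN_BASE="$PWD/Darwin"
export CORE_BASE="$PWD/Core"
export DARWIN_TABLES="$PWD/tables"
```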
Follow this method to install the software on any other machine (e.g. your private laptop). In this case, you will not be able to run the n-tupliser.
If `micromamba` is not yet available on your machine, you should install it.

Troubleshooting: if you get an error message related to HTTP2, you can switch back to HTTP/1.1 as follows:
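A sketch of both steps: the first line is the official micromamba bootstrap one-liner, and the HTTP/1.1 workaround assumes the error comes from git downloads (adapt if your error comes from another tool):

```shell
# Install micromamba via the official bootstrap script (bash/zsh syntax).
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)

# Troubleshooting: force HTTP/1.1 if HTTP2 downloads fail.
git config --global http.version HTTP/1.1
```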
where `DAS` here is just a name, which you can adapt to anything.

Troubleshooting: on certain operating systems (e.g. macOS), `gcc` seems not to be available. In that case, you can switch to `clang` by hand in `prerequisites.yml`.

Troubleshooting: on certain operating systems (e.g. macOS), you may need to pass `-DCMAKE_OSX_ARCHITECTURES=arm64` as an option to the script.
In both cases, first go to the root directory of `DasAnalysisSystem` and run the following:
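Assuming the setup script is the `tools/setup.sh` mentioned in the text below, this amounts to:

```shell
# From the root directory of the installation.
cd DasAnalysisSystem
source tools/setup.sh
```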
In an LCG environment, you may also want to run CRAB jobs or RUCIO requests:
`tools/setup.sh` tries to guess your RUCIO username from your local username (unless it was already set). This is not guaranteed to work, and you may have to define your RUCIO username before sourcing the DAS environment.

CMSSW is necessary to produce n-tuples containing CMS data. If `/cvmfs` and `apptainer` are available, the installer should have installed a version of CMSSW on your system. Several commands are provided to make use of the CMSSW environment.
The CMSSW environment is based on versions of Linux that are no longer maintained, such as Enterprise Linux 8, whereas most clusters use a more recent operating system, such as AlmaLinux 9. A compatibility layer is thus needed to run CMS software, which is provided by means of "container images". For instance, CMSSW 10 requires CentOS 7 (used for UL production), whereas CMSSW 12 requires EL8 (used to compile the n-tupliser). DAS provides commands to start containers, called `cc7` and `el8`, which take no arguments. After running `el8`, you start a shell in the image; you can then source the CMSSW environment as follows:
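A sketch of this step, assuming the standard CMS conventions (`cmsset_default.sh` on `/cvmfs` and the `cmsenv` alias); the release area is an example, use your actual CMSSW directory:

```shell
# Source the common CMS environment from /cvmfs, then activate your release.
source /cvmfs/cms.cern.ch/cmsset_default.sh
cd CMSSW_12_4_0/src   # example release area; adjust to yours
cmsenv
```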
then you should be able to use CMSSW as usual. A similar command, `cc7`, is also available, should you need to install an earlier version of CMSSW.
To run single commands, e.g. `scram b`, `dasgoclient`, or `cmsRun`, the prefix command `das-cmssw` is also provided: like all prefix commands, it is prepended to the command that you want to execute (e.g. `das-cmssw el8 scram b runtests`). The Darwin and DAS commands are not available in the image; only commands provided by a vanilla CMSSW and by the `Core/Ntupliser` module are available.
The source code of all these commands may be found in `$DAS_BASE/CMS/scripts` (note that `cc7` is only a symlink to `el8`).
Note: containers require a certain amount of memory (roughly 2 GB). Make sure that your machine has the necessary resources, otherwise certain commands will take forever (or simply never end).