DAS 3.0
Das Analysis System

Das Analysis System is a general project involving modular tools for physics analysis with high-level objects such as jets, leptons, and photons. The principle is to perform physics directly from the shell in atomic steps, applying one correction at a time and treating all systematic uncertainties simultaneously.

The GitLab group is divided into several repositories:

  • The Installer repo contains the necessary scripts to install the suite.
  • The Core repo contains most of the code for the analysis of CMS data.
  • The Tables repo contains the calibration of the high-level objects.
  • The Darwin repo is a mirror of a general toolkit for physics analysis.
  • PlottingHelper is a useful library originally made by Radek Žlebčík (Charles University) to help make plots with ROOT.

Note: in the Core repo, the Ntupliser module requires a working container, which can only be installed if /cvmfs is available. See the dedicated section on containers for more details.

The Campaigns subgroup contains actual repos corresponding to different analyses with configs, plotting macros, and possibly CRAB outputs, or anything relevant to the reproducibility of an analysis. A template repo is provided with suggestions and guidelines.

Installation

In an LCG environment

In the following, we provide two alternative methods to install the full framework, including the n-tupliser, on machines with access to /cvmfs.

Method #1: From scratch with the default installer (recommended)

In general, it is recommended that you install the software on a fast disk to ensure fast compilation (e.g. AFS), but that you process the heavy n-tuples on a dedicated area (e.g. NFS at DESY, EOS at CERN). Keeping the software neat and clean is important for reproducibility.

  1. Clone the Installer into a directory (we propose DasAnalysisSystem) and enter it:
    git clone https://gitlab.cern.ch/cms-analysis/general/DasAnalysisSystem/gitlab-profile.git DasAnalysisSystem
    cd DasAnalysisSystem
  2. Source the minimal environment for CMSSW, CRAB, and RUCIO, then run the installation:
    source /cvmfs/sft.cern.ch/lcg/views/LCG_104/x86_64-el9-gcc13-opt/setup.sh
    ./install.sh
  3. After a few minutes, you get back to the prompt. You only have to set up the newly compiled environment:
    source tools/setup.sh

Method #2: By hand, step by step (for experts)

The instructions below assume that you have an environment with all the required dependencies, or that you know how to install them if CMake complains that they are missing; recent LCG environments do not require additional setup. The instructions build the software without any special flags, so the compiler will use no optimization and include basic debugging information. You can pass additional settings using the CMAKE_BUILD_TYPE variable.

We will install the packages in order, starting with Darwin then following with Core. First, we create two folders, one that will contain the build artifacts and one that will contain the installed software. For convenience, we store their location in variables:

# readlink makes the paths absolute
BUILD_DIR=$(readlink -f ./build)
INSTALL_DIR=$(readlink -f ./install)
mkdir -p $BUILD_DIR $INSTALL_DIR

The first step is then to download, compile, and install Darwin:

git clone https://gitlab.cern.ch/Proto/Darwin.git # download
cmake -B $BUILD_DIR/Darwin -S Darwin -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR # configure
cmake --build $BUILD_DIR/Darwin --target install -j$(nproc) # build and install
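For instance, to build with optimizations and debug symbols instead of the unoptimized default, the configure step above can be repeated with an explicit build type (RelWithDebInfo is one of the standard CMake build types):

```shell
# Re-configure Darwin with -O2 and debug symbols instead of the default,
# unoptimized build; the same flag applies to the Core configure step below.
cmake -B $BUILD_DIR/Darwin -S Darwin \
      -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR \
      -DCMAKE_BUILD_TYPE=RelWithDebInfo
```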

We then do the same for Core:

git clone https://gitlab.cern.ch/cms-analysis/general/DasAnalysisSystem/Core.git # download
cmake -B $BUILD_DIR/Core -S Core -DCMAKE_INSTALL_PREFIX=$INSTALL_DIR # configure
cmake --build $BUILD_DIR/Core --target install -j$(nproc) # build and install

Finally, we clone the repository containing scale factors and other corrections:

git clone --recursive https://gitlab.cern.ch/cms-analysis/general/DasAnalysisSystem/tables.git

This gives us an installation of both Darwin and Core in the install folder. You need to set a few environment variables before you can use them:

Variable                 Value
PATH                     $INSTALL_DIR/bin:$PATH
LD_LIBRARY_PATH          $INSTALL_DIR/lib64:$LD_LIBRARY_PATH (note: lib64 can be different on some systems)
PYTHONPATH               $INSTALL_DIR/python:$PYTHONPATH
DARWIN_FIRE_AND_FORGET   $INSTALL_DIR/lib64 (note: lib64 can be different on some systems)
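Since the software was installed under the CMAKE_INSTALL_PREFIX chosen above, these variables can be set in a POSIX shell as follows (a sketch; the library subdirectory may be lib instead of lib64 on some systems):

```shell
# Sketch: expose the freshly installed Darwin and Core to the shell.
INSTALL_DIR=$(readlink -f ./install)   # the CMAKE_INSTALL_PREFIX used above
export PATH="$INSTALL_DIR/bin:$PATH"
export LD_LIBRARY_PATH="$INSTALL_DIR/lib64:$LD_LIBRARY_PATH"   # lib64 may be lib on some systems
export PYTHONPATH="$INSTALL_DIR/python:$PYTHONPATH"
export DARWIN_FIRE_AND_FORGET="$INSTALL_DIR/lib64"             # same caveat as above
```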

In addition, we recommend setting variables for the location of the various repositories:

Variable        Value
DAS_BASE        $PWD
DARWIN_BASE     $PWD/Darwin
CORE_BASE       $PWD/Core
DARWIN_TABLES   $PWD/tables
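These can likewise be exported from the directory in which the repositories were cloned:

```shell
# Sketch: record the location of the cloned repositories, assuming the
# current directory is the one where Darwin, Core, and tables were cloned.
export DAS_BASE="$PWD"
export DARWIN_BASE="$PWD/Darwin"
export CORE_BASE="$PWD/Core"
export DARWIN_TABLES="$PWD/tables"
```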

With micromamba

Follow this method to install the software on any other machine (e.g. your private laptop). In this case, you will not be able to run the n-tupliser.

  1. If micromamba is not yet available on your machine, you should install it:
    "${SHELL}" <(curl -L micro.mamba.pm/install.sh)

Troubleshooting: if you get an error message related to HTTP/2, you can switch back to HTTP/1.1 as follows:

    "${SHELL}" <(curl --http1.1 -L micro.mamba.pm/install.sh)
  2. Then create a new environment with all prerequisites:
    micromamba create -f prerequisites.yml -y
    micromamba activate DAS

where DAS is just a name, which you can change to anything you like. Troubleshooting: on certain operating systems (e.g. macOS), gcc seems not to be available. In that case, you can switch to clang by hand in prerequisites.yml.

  3. Run the installation:
    ./install.sh

Troubleshooting: on certain operating systems (e.g. macOS), you may need to pass -DCMAKE_OSX_ARCHITECTURES=arm64 as an option to the script.

  4. After a few minutes, you get back to the prompt. You only have to set up the newly compiled environment:
    source tools/setup.sh

Loading the environment in a new session

In both cases, first go to the root directory of DasAnalysisSystem, and run the following:

source tools/setup.sh

In an LCG environment, you may also want to run CRAB jobs or make RUCIO requests:

  • To run CRAB jobs, you will also need to set up a valid grid certificate. If you have already done so, this will enable it:
    voms-proxy-init --rfc --voms cms --valid 192:00
  • To make RUCIO requests, tools/setup.sh tries to guess your RUCIO username from your local username (unless it was already set). This is not guaranteed to work, and you may have to define your RUCIO username yourself before sourcing the DAS environment.
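For instance, assuming the setup script honours the RUCIO_ACCOUNT variable used by the RUCIO client (an assumption; check tools/setup.sh for the exact variable name), you could set it before sourcing the environment:

```shell
# Hypothetical example: define the RUCIO username explicitly, so that
# tools/setup.sh does not have to guess it from the local username.
# "jdoe" is a placeholder for your actual CERN username.
export RUCIO_ACCOUNT=jdoe
# ...then source tools/setup.sh as usual.
```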

Working with containers

CMSSW is necessary to produce n-tuples containing CMS data. If /cvmfs and apptainer are available, the installer should have set up a version of CMSSW on your system. Several commands are provided to make use of the CMSSW environment.

The CMSSW environment is based on versions of Linux that are no longer maintained, such as Enterprise Linux 8, whereas most clusters run a more recent operating system, AlmaLinux 9. A compatibility layer is thus needed to run CMS software, which is provided by means of "container images". For instance, CMSSW 10 requires CentOS 7 (used for UL production), whereas CMSSW 12 requires EL8 (used to compile the n-tupliser). DAS provides commands to start these containers, called cc7 and el8, which take no arguments. After running el8, you get a shell in the image, in which you can source the CMSSW environment as follows:

cd $DAS_BASE/CMSSW_12_4_0
cmsenv

then you should be able to use CMSSW as usual. A similar command, cc7, is also available in case you need to install an earlier version of CMSSW.

To run single commands, e.g. scram b, dasgoclient, cmsRun, the prefix command das-cmssw is also provided: like all prefix commands, it is added to the beginning of the command that you want to execute (e.g. das-cmssw el8 scram b runtests). The Darwin and DAS commands are not available in the image; only commands provided by a vanilla CMSSW and by the Core/Ntupliser module are available.
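As further sketches of the prefix command (config.py below is a hypothetical CMSSW configuration file):

```shell
# Run single CMSSW commands from the host shell through the el8 image.
das-cmssw el8 scram b -j4        # compile, e.g. the n-tupliser
das-cmssw el8 cmsRun config.py   # run a hypothetical CMSSW configuration
```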

The source code of all these commands may be found in $DAS_BASE/CMS/scripts (note that cc7 is only a symlink to el8).

Note: containers require a certain amount of memory (roughly 2 GB). Make sure that your machine has the necessary resources, otherwise certain commands will take forever (or simply never end).