DAS  3.0
Das Analysis System
Contributing to Darwin

Outline

  • Document your code à la Doxygen.
  • Try to stick to the libraries already included, namely STL, ROOT, and Boost.
  • Always use the command line parser and the meta-information.
  • Use the examples as a baseline (test/example0?.cc).
  • Always include tests of your code (test/*.cc).

Detailed guidelines

Writing a new executable

Most headers may be included via darwin.h.

The main function of every executable should exclusively contain:

  • the parsing of the options with the help of Darwin::Tools::Options;
  • the catching of exceptions with the help of boost::exception.

A typical main function will look like this:

namespace DT = Darwin::Tools;
namespace DE = Darwin::Exceptions;
int main (int argc, char * argv[])
{
    try {
        vector<fs::path> inputs; // internally using `ls` (use quotation marks!)
        fs::path output; // note: the number of inputs and outputs is not fixed
        auto options = DT::Options("Description.", /* TODO: settings */);
        options.inputs("inputs", &inputs, "input ROOT file(s)")
               .output("output", &output, "output ROOT file")
               .arg</* TODO: type */>("myArg", "path.to.myArg" /* in config */, "the description")
               // add here any other input, output, or argument
               ;
        const auto& config = options(argc, argv);
        const auto& slice = options.slice(); // if `DT::split` is given in the settings
        const int steering = options.steering();
        myFunction(inputs, output, config, steering, slice);
    }
    catch (boost::exception& e) {
        DE::Diagnostic(e); // this function belongs to `Darwin::Exceptions`
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

Here, myFunction would typically be a function from the Darwin::Physics namespace (but defined in the same file and, by convention, with the same name as the final executable). It contains the modification of the meta-information, the loading of the dedicated corrections, and the event loop (if applicable):

void myFunction (const vector<fs::path>& inputs,
                 const fs::path& output,
                 const pt::ptree& config,
                 const int steering,
                 const DT::Slice slice = {1,0}
                )
{
    cout << __func__ << ' ' << slice << " start" << endl;
    DT::Flow flow(steering, inputs);
    auto tIn = flow.GetInputTree(slice);
    auto [fOut, tOut] = flow.GetOutput(output);
    DT::MetaInfo metainfo(tOut); // use this constructor only if the MetaInfo already exists
    metainfo.Check(config); // this only checks possible inconsistencies between the history and the config
    // TODO: retrieve flags such as `isMC` or `R`
    // TODO: set branch addresses here
    auto var = flow.GetBranchReadWrite("varname");
    // TODO: update the metainfo with settings picked in `config` (typically new corrections and variations)
    // event loop (keep only if input is a n-tuple!)
    for (DT::Looper looper(tIn); looper(); ++looper) {
        [[ maybe_unused ]] // trick to only activate `cout` in the loop if option `-v` has been given
        static auto& cout = steering & DT::verbose ? ::cout : DT::dev_null;
        // TODO: whatever changes you have to perform
        if (steering & DT::fill) tOut->Fill(); // fill the tree only if `DT::fill` has been given
    }
    metainfo.Set<bool>("git", "complete", true); // by default, this entry is always `false`
    cout << __func__ << ' ' << slice << " end" << endl;
}

DT::Flow takes care of setting up the branches in the input and output n-tuples. DT::Looper takes care of finding the right interval of events in the input n-tuple. In principle, an exception may be thrown from anywhere without an explicit try ... catch at that point (see the dedicated section below). Whenever possible, follow this structure; deviations are allowed if there is a good reason.
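
To illustrate what finding the right interval of events could look like, here is a minimal sketch that splits a number of entries into contiguous slices. It is not the actual implementation of DT::Looper: the helper eventInterval and the even-splitting rule are assumptions made for illustration. The pair is read as (total number of slices, current slice index), which is consistent with the default {1,0} in the template above but is also an assumption here.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for `DT::Slice`: {total number of slices, current slice index}.
using Slice = std::pair<int, int>;

// Hypothetical helper: returns the half-open interval [first, last) of entries
// that the given slice should process, assuming contiguous slices of nearly equal size.
std::pair<long long, long long> eventInterval (long long nEntries, const Slice slice = {1,0})
{
    const int nSplit = slice.first,
              nNow   = slice.second;
    assert(nSplit > 0 && nNow >= 0 && nNow < nSplit);
    const long long first = nEntries *  nNow      / nSplit,
                    last  = nEntries * (nNow + 1) / nSplit;
    return {first, last};
}
```

With 10 entries and 3 slices, the three slices cover [0,3), [3,6), and [6,10), so every entry is processed exactly once; with the default slice, the whole range [0,10) is returned.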

Config files

Config files may be used to provide options, typically to avoid lengthy command lines, but also to improve reproducibility. The basic ideas behind these config files are:

  • allow hierarchical options;
  • stick to standard formats, which can easily be interfaced to other tools (here we may use XML, JSON, or INFO; see also the Boost documentation);
  • define loose rules, i.e. a few keywords are used by default (e.g. flags, corrections, preseed, git), but the user is free to add new sections;
  • each executable fetches the options that it needs (it may compare its history with other options, but will only issue a warning in case of inconsistency, without failing);
  • each n-tuple should contain its own history, which may be translated into a config file that can be used to reproduce the exact same file (see also next section on the matter of reproducibility).

Reproducibility

Use Darwin::Tools::Options and Darwin::Tools::MetaInfo to respectively parse the command line and store generic information in the ROOT files. The usage of these classes is explained in their dedicated pages and not repeated here.

Whenever creating a ROOT file, use the following syntax:

unique_ptr<TFile> fOut(TFile::Open(Form("%s?reproducible=%s", output.c_str(), __func__), "RECREATE"));

where output should correspond to the path to the destination file. This approach allows two ROOT files to be compared directly with one another (e.g. with diff, even though the ROOT file format is not human-readable). After the event loop, don't forget to set the following flag to true:

metainfo.Set<bool>("git", "complete", true);

This safeguard is intended to detect early interruptions of the event loop.
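
The idea behind this flag can be sketched without ROOT or Darwin. In the toy below, MiniMeta is a hypothetical stand-in for the metainfo (a plain map), and processEvents is a made-up event loop that may throw; neither reflects the actual Darwin classes. The flag starts as false and only becomes true if the loop runs to the end.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the metainfo: a flat map of Boolean entries.
struct MiniMeta {
    std::map<std::string, bool> entries { {"git.complete", false} }; // `false` by default
};

// Made-up event loop: throws if asked to fail at a given event (-1 means never).
void processEvents (MiniMeta& metainfo, int nEvents, int failAt = -1)
{
    assert(nEvents >= 0);
    for (int i = 0; i < nEvents; ++i)
        if (i == failAt)
            throw std::runtime_error("interrupted at event " + std::to_string(i));
    // only reached if the loop has not been interrupted
    metainfo.entries["git.complete"] = true;
}

// Returns the value of the flag after attempting to run the loop.
bool completedAfterRun (int nEvents, int failAt = -1)
{
    MiniMeta metainfo;
    try { processEvents(metainfo, nEvents, failAt); }
    catch (const std::runtime_error&) { /* the flag keeps its default value */ }
    return metainfo.entries.at("git.complete");
}
```

A file whose flag is still false can thus be recognised as the product of an interrupted job, whatever the reason for the interruption.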

To get an element from the metainfo, use

const auto myVar = metainfo.Get</* TODO: type */>("path", "to", "options", "myOption");

To set an element in the metainfo, use

metainfo.Set</* TODO: type */>("path", "to", "options", "myOption", myVar);

Exception handling

  • Whenever possible, use BOOST_THROW_EXCEPTION (instead of throw) to throw an exception: this provides additional information on the origin of the exception.
  • If you use a library, rely primarily on its dedicated classes to describe errors (e.g. std::filesystem::filesystem_error).
  • In particular, whenever you have to throw exceptions related to the processing of the data (e.g. example0?), use Darwin::Exceptions::BadInput for problematic input (e.g. bad ROOT files or bad config) and Darwin::Exceptions::AnomalousEvent for issues within the event loop.

Anywhere in the code, one may write, for instance:

bool everythingWorks = false;
if (!everythingWorks)
    BOOST_THROW_EXCEPTION( AnomalousEvent(/* TODO */) );

then this will be caught in the main function (see above for minimal template).

Coding style

Whenever the code is pushed to the GitLab repository, the Doxygen documentation is produced and uploaded to GitLab Pages. Follow the example of existing executables and adopt the same style, e.g.

  • if you write a library, put the comments in the header files;
  • use short comments (//!< blah) to comment the arguments;
  • stick to a global description, and describe private methods and members as well;
  • keep using full names (i.e. including namespaces) in declarations, otherwise the doxygen parsers will not connect the prototype in the header and the definition;
  • use \return to describe the output of the function or method, \todo for suggestions of improvements, and \note for important things to know.
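
As a sketch of what these conventions may look like in practice, here is a small documented function. The namespace Toy, the function applyScaleFactor, and its parameters are invented for the example and do not exist in Darwin; only the Doxygen commands (//!<, \return, \note, \todo) are real.

```cpp
#include <stdexcept>

namespace Toy { // made-up namespace for the example

////////////////////////////////////////////////////////////////////////////////
/// Applies a multiplicative scale factor to a transverse momentum.
///
/// \return the corrected transverse momentum
///
/// \note a non-positive scale factor is treated as an error
///
/// \todo support uncertainties on the scale factor
double applyScaleFactor (double pt, //!< transverse momentum before correction
                         double sf  //!< multiplicative scale factor (must be > 0)
                        );

} // end of namespace Toy

// The definition uses the fully qualified name, in the spirit of the
// guideline on full names, so that it can be matched with the prototype.
double Toy::applyScaleFactor (double pt, double sf)
{
    if (sf <= 0)
        throw std::invalid_argument("the scale factor must be positive");
    return pt * sf;
}
```

The comments sit on the declaration, as they would in a header file, while the definition is left uncommented.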

Run the following command to build the documentation locally, including your local changes:

cmake --build build --target doxygen

To preview the documentation, open build/doc/html/index.html in your browser.

Unit tests

Tests are essential for long-term development. They improve the robustness of the code in the long run. They force you to test not only the desired but also the undesired behaviour. Finally, they simplify debugging, since they provide many simple examples that are tested very regularly.

Each library has its own unit test using Boost, which is run at each compilation. The executables share one common unit test using Boost (test.cc), which tests the whole sequence of executables and the closure. In addition, the consecutive executables are also called directly from the shell, which likewise tests closure and reproducibility.

Boost tests are implemented and compiled separately. They consist of a series of short pieces of code (a "test"), possibly gathered in a series (a "suite"):

#ifndef DOXYGEN_SHOULD_SKIP_THIS
#define DOXYGEN_SHOULD_SKIP_THIS
#define BOOST_TEST_MODULE testExecName
#include <boost/test/included/unit_test.hpp>
BOOST_AUTO_TEST_SUITE( a_name )
    BOOST_AUTO_TEST_CASE( a_test )
    {
        // TODO: BOOST_REQUIRE_NO_THROW, BOOST_REQUIRE_THROW, BOOST_TEST, etc.
    }
    BOOST_AUTO_TEST_CASE( another_test )
    {
        // TODO: BOOST_REQUIRE_NO_THROW, BOOST_REQUIRE_THROW, BOOST_TEST, etc.
    }
BOOST_AUTO_TEST_SUITE_END()
#endif

(Note: there may be several test suites in a single file.) One should not define any main function in a test unit; Boost generates one automatically, including a powerful command-line interface. A typical call resembles ./testExecName -l all.

The don'ts

  • Don't overengineer the code.
  • Don't perform more than one or two operations at a time.
  • Don't write code longer than a few hundred lines.
  • Don't multiply layers of scripts and of classes.
  • Don't mix plotting and advanced (slow) operations.

CMake build system

We use CMake to build Darwin. This involves multiple steps:

  1. Configuring the build. This is done by running cmake -B build -DCMAKE_INSTALL_PREFIX=install. At this stage, CMake finds all the needed libraries and tools and writes the information it found to a "cache" file (build/CMakeCache.txt). One can also customize details of the build such as the optimization level; refer to the CMake documentation for more information.
  2. Building the code. This can be done with cmake --build build. This compiles all the libraries and executables. You may use -j to run in parallel.
  3. Running tests. There are multiple ways, but the easiest is probably to run cmake --build build --target test.
  4. Installing the build artifacts. After checking that the tests pass, Darwin can be installed to the location determined at configure time (in the first step) by issuing cmake --build build --target install.

Note that on certain systems, the proper CMake command is cmake3 and not cmake.
