tdaq-09-00-00

Introduction

Doxygen Javadoc

The ATLAS TDAQ software version tdaq-09-00-00 has been released on 26th March 2020.

Please note that tdaq-09-00-01 is also available. It is not a newer version of tdaq-09-00-00 but its Python 3 variant! It is to be used for the migration to Python 3 by detectors and off-line software.

Availability and Installation

Outside of Point 1 the software should be used via CVMFS. Its official location is

/cvmfs/atlas.cern.ch/repo/sw/tdaq/tdaq/tdaq-09-00-00/

At Point 1 the software is as usual available at

/sw/atlas/tdaq/tdaq-09-00-00/

The software can also be installed locally via ayum.

git clone https://gitlab.cern.ch/atlas-sit/ayum.git
source ayum/setup.sh

Modify the prefix entries in the yum repository files in ayum/etc/yum.repos.d/*.repo to point to the desired destination.

ayum install tdaq-09-00-00_x86_64-centos7-gcc8-opt

Configurations

The release is available for the following configurations:

  • x86_64-centos7-gcc8-opt
  • x86_64-centos7-gcc8-dbg
  • x86_64-centos7-gcc9-opt
  • x86_64-centos7-gcc9-dbg

External Software

LCG_97

The version of the external LCG software is LCG_97.

TDAQ Specific External Software

Package Version
cmzq 3.0.2
zyre 1.1.0
libfabric 1.6.2
nlohmann/json 2.1.1
pugixml 1.9
ipbus-software 2.7.2
microhttpd 0.9.59
mailinglogger 5.0.0
netifaces 0.10.9
paramiko[gssapi] 2.7.1

Removed Packages

The following packages have been removed since tdaq-08-03-01:

CES

Package: CES
Jira: ATDAQCCCES

Live documentation about recoveries and procedures can be found here.

Changes in recoveries and/or procedures:

  • Stopless recovery: handle failures on the detector side (ATDAQCCCES-149):

    In case no channels can be recovered, the rc::HardwareRecovered message can be sent with an empty list of recovered channels. In that case, CHIP will notify of the failure in recovery but the trigger will be released.

  • Stop-less removal and recovery adapted to the new SWROD (ATDAQCCCES-137);

  • HLT recoveries adapted to the new HLT framework (ATDAQCCCES-138).

Internal changes:

  • The ESPER CEP engine has been updated to version 8 (ATDAQCCCES-133);
  • Fixed scalability issue observed after the introduction of ESPER 8 (ATDAQCCCES-150);
  • Publishing information about the status of the Java Virtual Machine (ATDAQCCCES-144);
  • Proper handling of ERS fatal messages (ATDAQCCCES-147);
  • Fixing an issue with the BAD_HOST problem not being cleared (ATDAQCCCES-143);
  • Executing tests in case of transition timeout (ATDAQCCCES-140);
  • Adapting to changes in DAL (new implementation of template applications and segments);
  • Adapting to changes in DVS/TestManager (new implementations).

CMake Changes

The main TDAQ.cmake file is compatible with CMake 3.16.3 and also still works with CMake 3.14.3.

This version of CMake complains loudly if there is no top-level project() command in the main CMakeLists.txt file. This was always required but so far silently ignored. To make CMake happy the following changes should be applied:

Work Area

For a user work area there is a small addition in the top-level CMakeLists.txt file needed. It should look like this:

cmake_minimum_required(VERSION 3.14.0)
project(work)   # the new line
find_package(TDAQ)
include(CTest)
tdaq_work_area()

Otherwise you will get a large warning with CMake 3.16, though things will still work.

Downstream Projects

Downstream projects should have a top-level CMakeLists.txt file that looks as follows:

cmake_minimum_required(VERSION 3.14.0)
project(myproject VERSION 1.0.0)
find_package(TDAQ)
set(TDAQ_DB_PROJECT myproject)
tdaq_project(myproject 1.0.0 USES tdaq 9.0.0)

Jers - Java ERS

tdaq-09-00-01

TDAQ_ERS_TIMESTAMP_FORMAT (default value "yyyy-MMM-dd HH:mm:ss,SSS z") can now be used to define the timestamp format as in ERS, including milliseconds. The format used before was "DDD MMM dd HH:mm:ss z yyyy", e.g. Mon Mar 30 15:06:28 CEST 2020.

tdaq-09-00-00

ProcessManager

Package: ProcessManager

The ProcessManager twiki can be found here.

Changes in public APIs:

  • Added information about the availability of process out/err files:

    The PMGProcessStatusInfo structure has been extended with two additional attributes (out_is_available and err_is_available) indicating whether the concerned application has produced any out or err log.

  • Using smart pointers in client library:

    All the references to Process instances are encapsulated in shared pointers.

Internal changes:

  • Not deleting or changing permissions of err/out files if they are not regular files (this covers the case in which the err and/or out streams are sent to /dev/null).
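
The guard described above can be sketched with std::filesystem (an illustrative stand-alone sketch, not the actual ProcessManager code; the function name is invented):

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Only a regular file should be deleted or have its permissions changed;
// e.g. a stream redirected to /dev/null is a character device and must be
// left untouched.
bool safe_to_clean(const std::string& path)
{
    std::error_code ec;
    return std::filesystem::is_regular_file(path, ec) && !ec;
}
```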

RCUtils

Package: RCUtils

  • rc_error_generator: added the generation of ERS issues related to the stop-less removal/recovery for SWROD;
  • Added a utility class to monitor several metrics of the Java Virtual Machine: the new class (JVMMonitor) is defined in the RCUtils.tools package. The RCUtils.jar file should be added to the classpath;
  • Several improvements to the DAQ efficiency tools.

RunControl

Package: RunControl
Jira: ATDAQCCRC

This is the link to the main RunControl twiki.

Enhancements:

  • The Java library has been extended with the possibility to create Java Run Control applications (ATDAQCCRC-40). It is now possible to have Run Control applications in the three main languages used in TDAQ: C++, Java and Python.

    An example application and the related script to start it are provided.

    Additional details can be found here.

Changes in utility and/or tools:

  • rc_sender (see the online help for details about the syntax to be used):

    The STARTAPP/STOPAPP/RESTARTAP/TESTAPP/ENABLE/DISABLE commands support regular expressions;

    The same command can be sent to multiple applications using a regular expression for the application name.

Public changes in APIs:

  • Regular expressions enabled for the STARTAPP/STOPAPP/RESTARTAP/TESTAPP/ENABLE/DISABLE commands:

    In the concerned command objects, application or component names can be replaced by regular expressions. In order to be treated as regular expressions, the names must end with the //r characters.

  • The Resynch command has been extended to include the extended L1ID (and not only the ECR counter);

  • Added initAction to user routines (ATDAQCCRC-191).
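
The //r naming convention described above can be conveyed by a small stand-alone sketch (illustrative only, not the actual RunControl implementation; the helper name is invented):

```cpp
#include <regex>
#include <string>

// A name ending in "//r" is treated as a regular expression;
// otherwise an exact match is required.
bool name_matches(const std::string& pattern, const std::string& app_name)
{
    static const std::string suffix = "//r";
    if (pattern.size() > suffix.size() &&
        pattern.compare(pattern.size() - suffix.size(), suffix.size(), suffix) == 0)
    {
        std::regex re(pattern.substr(0, pattern.size() - suffix.size()));
        return std::regex_match(app_name, re);
    }
    return pattern == app_name;   // no suffix: plain name comparison
}
```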

Internal changes:

  • Adding the SWROD to the computation of the detector mask (ATDAQCCRC-46);
  • Adapting to changes in the PMG library (smart pointers for daq::pmg::Process instances);
  • After a reload, tests for apps out of membership and not up are reset;
  • Adapting to changes in DAL (new implementation of template applications and segments);
  • Adapting to changes in DVS/TestManager (new implementations);
  • Boost shared mutex replaced with std version.

SFOng

  • replaced deprecated tbb::tbb_hasher with std::hash
  • replaced deprecated tbb::atomic with std::atomic
  • fixed: starting a new run without going to shutdown (stop/start) could produce Late events at the beginning of the new run for a transition period

Test Manager

tdaq-09-00-00

A complete re-implementation of the former Test Manager package used in Run 2, addressing scalability (threading), performance and code maintenance issues. The new design was presented here: https://indico.cern.ch/event/780731/contributions/3250021/attachments/1769325/2880324/DVS_TM_LS2.pdf

The schema for describing Tests (Test4Class, Test4Object) did not change; in the schema for TestPolicies there are a few changes in the TestFailureAction configuration, as described here: https://indico.cern.ch/event/800655/contributions/3327341/attachments/1801247/2938116/TM_CC_25Feb2019.pdf

Each Failure may have a (generic) TestFailureAction associated with it; the action and its parameters are attributes of the TestFailureAction class. There are no more specialized classes like RebootAction: all types of actions (e.g. 'reboot', 'execute', 'test') may be specified as a string (enumeration) in the TestFailureAction::action attribute, i.e. it is possible to add new actions when needed. The parameters of an action are configured as a JSON string like { param1: value1 } in the TestFailureAction::parameters attribute (e.g. for the 'reboot' action), or taken from an associated Executable object (e.g. for the 'exec' action). After substitution of parameters (e.g. #this.RunsOn.UID), the action and a JSON string with its parameters are passed as part of the TestResult to the client (DVS, RC or CHIP), which is responsible for the execution of the action. An example of an action ('test'):

<obj class="TestFailureAction" id="TestExtraComputer">
 <attr name="action" type="enum">"test"</attr>
 <attr name="parameters" type="string">"{ Component: #this.RunsOn.UID; Scope: diagnosis }"</attr>
   ...
</obj>
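
The parameter substitution mentioned above (e.g. #this.RunsOn.UID) can be illustrated by a minimal stand-alone sketch (the real TestManager resolves placeholders from the configuration database; this helper is invented for illustration):

```cpp
#include <string>

// Replace every occurrence of a placeholder in the parameter string.
std::string substitute(std::string text,
                       const std::string& placeholder,
                       const std::string& value)
{
    std::string::size_type pos = 0;
    while ((pos = text.find(placeholder, pos)) != std::string::npos) {
        text.replace(pos, placeholder.size(), value);
        pos += value.size();   // skip past the inserted value
    }
    return text;
}
```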

Performance of the new implementation was studied in this presentation: https://indico.cern.ch/event/817692/contributions/3413542/attachments/1838805/3013693/TMperfCC060519.pdf

TriggerCommander

Package: TriggerCommander

Clients

A backward compatible change (for clients) has been introduced in the TriggerCommander/MasterTrigger.h interface. The hold() method has been extended with a boolean flag extended_lvl1id which defaults to false.

The method will return the last ECR as it did in the past. When the new flag is set to true it will instead return the full Level 1 ID (from which the ECR can be extracted as the uppermost 8 bits).
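
The relation between the full Level 1 ID and the ECR described above can be sketched in a few lines (a stand-alone illustration; the helper names are invented):

```cpp
#include <cstdint>

// Split a full (extended) Level 1 ID into its two parts: the ECR counter
// in the uppermost 8 bits and the per-ECR event counter in the lower 24.
inline std::uint32_t ecr_of(std::uint32_t extended_lvl1id)
{
    return extended_lvl1id >> 24;
}

inline std::uint32_t counter_of(std::uint32_t extended_lvl1id)
{
    return extended_lvl1id & 0x00FFFFFFu;
}
```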

Implementations of the MasterTrigger interface

An implementation of the MasterTrigger interface has to modify its overriding method accordingly:

class X : public MasterTrigger {
  ...

  uint32_t hold(const std::string& dm, bool extended_lvl1id = false) override;
  ...
};

config

tdaq-09-00-00

Available since tdaq-08-03-01

Remove obsolete get_class_info() method using MetaDataType enum

Jira: ADTCC-184

Use the well-structured method instead:

const daq::config::class_t& get_class_info(const std::string& class_name, bool direct_only = false);

dal

Changes since last release:

SW repository generation utility was removed

dal_create_sw_repository has been removed; create_repo.py has to be used instead. See the CMake TWiki for more information.

tdaq-09-00-00

Available since tdaq-08-03-00

New template-based segments and applications config

Jira: ADTCC-177

The generated DAL classes BaseApplication and Segment replaced the old AppConfig and SegConfig ones.

Updated partition algorithms:

std::vector<const BaseApplication *> Partition::get_all_applications(std::set<std::string> * app_types = nullptr,
  std::set<std::string> * use_segments = nullptr, std::set<const Computer *> * use_hosts = nullptr) const;

const Segment * Partition::get_segment(const std::string& name) const;

The segment algorithms:

std::vector<const BaseApplication *> Segment::get_all_applications(std::set<std::string> * app_types = nullptr,
  std::set<std::string> * use_segments = nullptr, std::set<const Computer *> * use_hosts = nullptr) const;

const BaseApplication * Segment::get_controller() const;

const std::vector<const BaseApplication *>& Segment::get_infrastructure() const;

const std::vector<const BaseApplication *>& Segment::get_applications() const;

const std::vector<const Segment*>& Segment::get_nested_segments() const;

const std::vector<const Computer*>& Segment::get_hosts() const;

const Segment * Segment::get_base_segment() const;

bool Segment::is_disabled() const;

bool Segment::is_templated() const;

void Segment::get_timeouts(int & actionTimeout, int & shortActionTimeout) const;

The application algorithms:

const Computer * BaseApplication::get_host() const;

const daq::core::Segment * BaseApplication::get_segment() const;

std::vector<const daq::core::Computer *> BaseApplication::get_backup_hosts() const;

const daq::core::BaseApplication * BaseApplication::get_base_app() const;

std::vector<const daq::core::BaseApplication *>
BaseApplication::get_initialization_depends_from(const std::vector<const daq::core::BaseApplication *>& all_apps) const;

std::vector<const daq::core::BaseApplication *>
BaseApplication::get_shutdown_depends_from(const std::vector<const daq::core::BaseApplication *>& all_apps) const;

The algorithms on Segment and BaseApplication objects may only be called if these objects were in turn instantiated by the get_all_applications() or the partition's get_segment() DAL algorithms.

Algorithms changes

Jira: ADTCC-209

Several algorithms were never used; they have been deleted or reorganized.

Modified algorithms

  • BaseApplication::get_output_error_directory() has been renamed to Partition::get_log_directory(), because the log directory does not depend on the application
  • BaseApplication::get_info(): the partition, segment and host parameters have been removed
  • Segment::get_timeouts(): the partition and database parameters have been removed
  • SubstituteVariables: the database parameter has been removed from the constructor

Removed algorithms

Several algorithms were obsolete and have been deleted, or were not used by user code and have been removed from the public API:

  • ComputerProgram::get_parameters()
  • BaseApplication::get_application()
  • BaseApplication::get_some_info()
  • ResourceBase::get_applications()
  • Variable::get_value(tag)

dbe

Package: dbe
Jira: ATLASDBE

  • Improved filtering and searching ATLASDBE-144:

    The filter (or the auto-completion) in drop-down menus now works on tokens (i.e., if Computers have names like pc-tdq-onl-* or pc-tdq-tpu-*, in order to get only tpu-like nodes, typing tpu is enough).

  • Fixing problem building table after the move to Qt5 ATLASDBE-241;

  • Improved sorting for table ATLASDBE-246;
  • Several improvements to the schema editor ATLASDBE-239.
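
The token-based filtering described in the first item above can be sketched in a self-contained way (illustrative only, not the dbe code; the function name is invented):

```cpp
#include <sstream>
#include <string>

// The query matches if any '-'-separated token of the name starts with it,
// so typing "tpu" is enough to select pc-tdq-tpu-* nodes.
bool token_match(const std::string& name, const std::string& query)
{
    std::istringstream tokens(name);
    std::string token;
    while (std::getline(tokens, token, '-'))
        if (token.compare(0, query.size(), query) == 0)
            return true;
    return false;
}
```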

dcm

tdaq-09-00-00

New features:

  • dfinterfaceDCM: added an operationTimeOut issue when the tryGetNextUntil function is not able to retrieve an event. This is required to fix the problem of Jira ADHI-4728.
  • Processor.cxx: increased the IDLE timeout. Required for testing for Jira ADHI-4770.

DVS (Diagnostics and Verification Framework)

tdaq-09-00-00

Replacement of CLIPS

Full re-implementation of the expert-system engine: CLIPS (an old forward-chaining C rule engine) was replaced by a custom forward-chaining engine where rules can be defined as C++ objects (relying on C++11 features like lambdas). The rules are compiled and linked into the executable. The present set of rules handles Test policies and dependencies.
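
The flavour of such rules can be conveyed by a minimal stand-alone sketch (the Fact/Rule types and names here are invented for illustration; they are not the actual DVS engine API):

```cpp
#include <functional>
#include <string>
#include <vector>

// Rules are plain C++ objects whose condition and action are lambdas.
struct Fact { std::string name; };

struct Rule {
    std::function<bool(const std::vector<Fact>&)> condition;
    std::function<void(std::vector<Fact>&)> action;
};

// A single forward-chaining pass: fire every rule whose condition holds
// on the current fact base.
void run_once(std::vector<Fact>& facts, const std::vector<Rule>& rules)
{
    for (const auto& rule : rules)
        if (rule.condition(facts))
            rule.action(facts);
}
```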

TestFailure follow-up actions

A change in TestRepository schema related to description of TestFailureActions is described in TM release notes and in main DVS/TM twiki. An Action is prepared by DVS from this configuration and added to the test result passed to client (RC, CHIP).

limitations and future developments

Make possible to load rules from user-supplied .so files at runtime, just passing file names as parameter to DVS engine.

DVS Tests

tdaq-09-00-00

Added the generate_core_file.sh script, which can be used as a TestFailure follow-up action to automatically generate a core file for a misbehaving application process (to allow debugging rare conditions with applications).

Usage: generate_core_file.sh [-v] [-h] [-d <directory_name>] -p <partition_name> -a <app_name> [-m mail_address]

Parameters for a particular application are deduced from the configuration at runtime; see the DVS/TestManager documentation for more details on configuring this in the test repository: https://twiki.cern.ch/twiki/bin/view/Atlas/DaqHltTestManager#How_to_Parametrize_Tests

dynlibs - Load shared libraries

This package is deprecated from tdaq-09-00-00 onwards.

Please use plain dlopen() or boost::dll instead. Note that, unlike in this package, the boost::dll::shared_library object has to stay in scope as long as the shared library is used!

Example of possible replacement

#include <boost/dll.hpp>

double example()
{
   boost::dll::shared_library lib("libmyplugin.so",
                                  boost::dll::load_mode::type::search_system_folders |
                                  boost::dll::load_mode::type::rtld_now);
   // Get pointer to function with given signature
   auto f = lib.get<double(double, double)>("my_function");
   return f(10.2, 3.141);
}

HLTSV - HLT Supervisor

tdaq-09-00-00

Changing HLT Prescales in a Pre-loaded Partition

The HLTSV now supports pre-scale changes in pre-loaded data mode. This should make it easier to test this functionality without the CTP. This is for HLT experts only.

The functionality is enabled when the TriggerConfiguration.TriggerCoolConnection attribute (a string) is not empty. This should be either a COOL alias (for the typical use at Point 1 with the CTP), or a test database, e.g. sqlite. Never use this with a production database for a test!

  1. Initialize the sqlite database:

     % TestHltPsk2CoolWriting hltPrescaleCool.db
     3
     q

  2. Set the TriggerCoolConnection to a string like the following (adapting the path):

     <attr name="TriggerCoolConnection" type="string" val="sqlite://;schema=/scratch/work/rhauser/cmtest/cool/hltPrescaleCool.db;dbname=CONDBR2"/>

  3. Start your partition.

  4. Send the prescale change:

     % trg_command_trigger -p -n HLTSV -c SETHLTPRESCALES --arguments 200

  5. Check the log output of the HLTSV; there should be some informational lines, otherwise check the error logfile. Then inspect the database again:

     % TestHltPsk2CoolWriting hltPrescaleCool.db
     1
     0
     q

You should now see the new entries that you set via the trg_command_trigger utility.

MTS

tdaq-09-00-00

Since the tdaq-07-01-00 release there have been only a few bug fixes and two small additions:

  • allow specifying the ERS context line number in the subscription syntax, e.g.

    (app=HLTMPPU* and context(line)!=666)

  • a Java utility (available in erssender.jar) to send arbitrary ERS messages, e.g.

    TDAQ_PARTITION=mypart TDAQ_APPLICATION_NAME=myapp java -cp erssender.jar:ers.jar utils.ERSSender ERROR mymsg-id "My message text"

OKS

tdaq-09-00-00

OKS data file format

Jira: ADTCC-185

New OKS format since tdaq-08-03-01

Store values inside tags

Store attribute and relationship data inside values of tags

Format for single value attributes

<attr name="xxx" type="yyy" val="zzz"/>

Format for multi value attributes

<attr name="xxx" type="yyy">
 <data val="zzz1"/>
 <data val="zzz2"/>
</attr>

Format for 0..1 and 1..1 relationships

<rel name="xxx" class="yyy" id="zzz"/>

Format for 0..N and 1..N relationships

<rel name="xxx">
  <ref class="yyy1" id="zzz1"/>
  <ref class="yyy2" id="zzz2"/>
</rel>

Skip empty data

Attributes whose values are equal to their initial empty values, and empty relationships, are not stored.

Compatibility and data conversion

The changes are backward compatible: the new OKS library is able to read the old "extended" and "compact" data formats. For conversion, open a data file stored in the old format using the OKS Data Editor or DBE and save it. Do not mix old and new formats in the same file when updating a file in a text editor.

PartitionMaker no longer uses FarmTools

Package: Partition Maker

When the partition maker script (e.g. pm_part_hlt.py) generates a partition with hosts that are not yet in the database, it will try to execute various commands on the host. In the past it used the FarmTools package, which has been removed from this release.

From this release on it will instead use the paramiko Python package to execute the commands. It will first try a connection with ssh keys, then with GSSAPI (i.e. Kerberos). There is no longer any parallelism, which may slow down the generation of the hosts file for large farms.

swrod

  • By default, ROB and E-Link IDs in a SW ROD OKS configuration are shown in hexadecimal format.

  • Common SW ROD configuration objects which don't normally require customization have been placed into the daq/sw/swrod-common.data.xml OKS configuration file, which is installed in the installed/share/data area of the TDAQ release. Any SW ROD OKS configuration is advised to use these objects instead of creating custom ones, unless modification of any parameters is required.

  • The GBTModeBuilder and FullModeBuilder algorithms now automatically detect whether the respective SW ROD application uses the TTC data handler for getting L1 Accept packets from FELIX. If that is the case the algorithms will run in TTC-aware mode, otherwise they will be data-driven. Contrary to the previous release, there is no need to change the value of the Type parameter of the ROBFragmentBuilder class in the OKS configuration to change the fragment building mode. In the new implementation one should either link an instance of SwRodL1AInputHandler with the SwRodConfiguration object via the L1AHandler relationship to use the TTC-aware variant of the chosen algorithm, or leave this relationship empty to use the fragment building algorithm in data-driven mode.

  • Each detector custom plugin may now provide a function for data integrity validation, following the example given below. If a plugin provides such a function, the function name shall be set in the DataIntegrityChecker attribute of the SwRodCustomProcessingLib OKS configuration object that describes this plugin.

    extern "C" std::optional<bool> dataIntegrityChecker(const uint8_t * data) {
        if (contains_checksum(data)) {
            uint8_t checksum = get_checksum(data);
            return checksum == calculate_checksum(data);
        }
        return std::nullopt;
    }

  • The package provides a so-called Felix Emulator that can be used to send input data to an arbitrary SW ROD application via the Netio protocol. The OKS file data/FelixEmulatorSegment.data.xml shows how to configure the Felix Emulator; this example provides a Felix Emulator for the SW ROD application defined in the data/SwRodSegment.data.xml file. The idea is that the Emulator uses internal data generators which create L1A and data packets, and sends these packets to the SW ROD application. The Emulator is implemented by the SW ROD framework using a special plugin that can publish generated data via the Netio protocol. The generated data packets can be customized either by modifying an existing implementation of the DataInput interface provided by the test/core/InternalDataGenerator.h(cpp) files, or by providing another custom plugin that declares and implements a new class inheriting from swrod::test::InternalDataGenerator and overriding the generatePacket() virtual function.
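
The tri-state contract of the data integrity validation function described above can be illustrated with a self-contained sketch (the packet layout and the checksum helpers are invented stand-ins: byte 0 flags whether a checksum is present, byte 1 stores it, bytes 2-3 carry a toy payload):

```cpp
#include <cstdint>
#include <optional>

static bool contains_checksum(const std::uint8_t* data) { return data[0] == 1; }
static std::uint8_t stored_checksum(const std::uint8_t* data) { return data[1]; }
static std::uint8_t calculate_checksum(const std::uint8_t* data)
{
    return static_cast<std::uint8_t>(data[2] ^ data[3]);   // toy XOR checksum
}

// true = data valid, false = corrupt, nullopt = cannot tell (no checksum).
std::optional<bool> checkIntegrity(const std::uint8_t* data)
{
    if (contains_checksum(data))
        return stored_checksum(data) == calculate_checksum(data);
    return std::nullopt;
}
```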

Cross Compilation

This release has all changes in the necessary CMake files to support some basic cross-compilation. This has only been tested with the aarch64 architecture (ARM) and using the same CentOS 7 version on the target as on the host.

Only a small subset of the LCG software is available for ARM. A reasonable subset of tdaq-common and tdaq can be cross-compiled so that a run-controlled application can be developed for the ARM target system.

For more details look at the basic cross-compilation setup and the TDAQ cross-compilation project.