tdaq-10-00-00

Doxygen Javadoc

The ATLAS TDAQ software version tdaq-10-00-00 has been released on 6th January September 2022.

Availability and Installation

Outside of Point 1 the software should be used via CVMFS. Its official location is

/cvmfs/atlas.cern.ch/repo/sw/tdaq/tdaq/tdaq-10-00-00/

At Point 1 the software is as usual available at

/sw/atlas/tdaq/tdaq-10-00-00/

The software can also be installed locally via ayum.

git clone https://gitlab.cern.ch/atlas-sit/ayum.git
source ayum/setup.sh

Modify the prefix entries in the yum repository files in ayum/etc/yum.repos.d/*.repo to point to the desired destination.

ayum install tdaq-10-00-00_x86_64-centos7-gcc11-opt

In case the LCG RPMs are not found, add this to etc/yum.repos.d/lcg.repo:

[lcg-repo-102b]
name=LCG 102b Repository
baseurl=http://lcgpackages.web.cern.ch/lcgpackages/lcg/repo/7/LCG_102b/
enabled=1
prefix=[...your prefix...]

Configurations

The release is available for the following configurations:

  • x86_64-centos7-gcc11-opt (default at Point 1)
  • x86_64-centos7-gcc11-dbg (debug version at Point 1)
  • x86_64-centos9-gcc11-opt
  • x86_64-centos9-gcc11-dbg
  • x86_64-centos7-gcc12-opt (experimental)
  • x86_64-centos7-gcc12-dbg (experimental)
  • aarch64-centos7-gcc11-opt (experimental)

The CentOS 9 variant treats CentOS Stream, Rocky Linux, Alma Linux, and RedHat Enterprise Linux as equivalent. It is provided because the plan is to upgrade the OS at Point 1 to Alma Linux 9 in the next winter shutdown (2023/24). It may become available at Point 1 later this year as the new OS is tested.

Microarchitecture

For the first time the TDAQ release is compiled with the -march=x86-64-v2 flag. This enables instructions up to SSE 4.2 (but no AVX) and may be incompatible with older Intel CPUs (approximately Nehalem or older). Binaries will fail with an illegal instruction exception on those machines.

Note that this option is the default on RedHat Enterprise 9.x for all software.
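
To check in advance whether a Linux host can run binaries built for this feature level, the CPU flags reported by the kernel can be compared against the x86-64-v2 requirements. The following is a minimal sketch, not part of the release: the function names are hypothetical, and it assumes the usual /proc/cpuinfo flag names (where "pni" stands for SSE3).

```python
# Linux /proc/cpuinfo flag names for the x86-64-v2 feature level
# ("pni" is the kernel's name for SSE3).
X86_64_V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}


def missing_v2_flags(cpuinfo_text):
    """Return the x86-64-v2 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return X86_64_V2_FLAGS - set(value.split())
    # No 'flags' line at all (e.g. not an x86 machine): report everything.
    return set(X86_64_V2_FLAGS)


def host_supports_v2(path="/proc/cpuinfo"):
    """Check the local machine; 'path' can be overridden for testing."""
    with open(path) as f:
        return not missing_v2_flags(f.read())
```

Calling host_supports_v2() on the target machine returns False if any required flag is missing; missing_v2_flags() then names the flags in question.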

External Software

LCG_102b

The version of the external LCG software is LCG_102b.

CORAL and COOL - 3.3.13

These two packages are no longer part of LCG. They are included in tdaq-common and their use should be transparent for users.

The official targets to use are:

  • CORAL::CoralBase
  • CORAL::CoralServerBase
  • CORAL::CoralMonitor
  • CORAL::CoralStubs
  • CORAL::CoralCommon
  • CORAL::CoralServerProxy
  • CORAL::CoralKernel
  • CORAL::CoralSockets

TDAQ Specific External Software

Package          Version   Requested by
czmq             4.2.1     FELIX
zyre             2.0.1     FELIX
libfabric        1.11.0    FELIX
colorama         0.4.4     DCS
opcua            0.98.13   DCS
parallel-ssh     2.10.0    TDAQ (PartitionMaker)
pugixml          1.9       L1Calo, L1CTP
ipbus-software   2.8.4     L1Calo, L1CTP
microhttpd       0.9.73    TDAQ (pbeast)
mailinglogger    5.1.0     TDAQ (SFO)
netifaces        0.11.0    TDAQ
Twisted          22.4.0    TDAQ (webis_server)
urwid            2.1.2     TDAQ
jwt-cpp          v0.6.0    TDAQ
Flask            2.1.3     TDAQ

Docker and Apptainer Images

The docker images used for building and testing the TDAQ software are available here:

docker pull gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7
docker pull gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos9
docker pull gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:aarch64-centos7

Run it like this:

docker run -it --rm -v /cvmfs:/cvmfs:ro,shared gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7

The corresponding apptainer/singularity images are here:

/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7
/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos9
/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:aarch64-centos7

Run it like this:

apptainer shell -B /cvmfs /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7

Setup Options

In the following we assume some alias like

alias cm_setup='source /cvmfs/atlas.cern.ch/repo/sw/tdaq/tools/cmake_tdaq/bin/cm_setup.sh'
  • The cm_setup --list option shows the available releases, including nightlies.
  • The cm_setup --clean ... option bypasses all testbed-specific setup. This is useful if you want to use testbed hardware but be completely independent of the existing infrastructure. You have to set your own TDAQ_IPC_INIT_REF path to start a private initial partition, if needed.
  • The cm_setup script takes a short version of the CMTCONFIG build configuration as argument, e.g.
    • cm_setup nightly dbg will set up x86_64-centos7-gcc11-dbg
    • cm_setup nightly gcc12 will set up x86_64-centos7-gcc12-opt
    • cm_setup nightly gcc12-dbg will set up x86_64-centos7-gcc12-dbg
    • There is no shortcut for setting the architecture or the OS.
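
The naming rule behind the short configurations can be illustrated with a few lines of Python. This is only a sketch of the mapping implied by the examples above, not the logic of the cm_setup script itself: the function name is hypothetical, and the defaults (x86_64, centos7, gcc11, opt) are inferred from those examples.

```python
# Defaults inferred from the examples in the text (an assumption).
DEFAULTS = {"arch": "x86_64", "os": "centos7", "compiler": "gcc11", "mode": "opt"}


def expand_short_config(short=""):
    """Expand a short configuration like 'gcc12-dbg' to a full build tag."""
    parts = dict(DEFAULTS)
    for token in filter(None, short.split("-")):
        if token in ("opt", "dbg"):
            parts["mode"] = token
        elif token.startswith("gcc"):
            parts["compiler"] = token
        else:
            # Architecture and OS cannot be set through the short form.
            raise ValueError("unsupported short configuration token: " + token)
    return "{arch}-{os}-{compiler}-{mode}".format(**parts)
```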

BeamSpotUtils

  • Updates to support new HLT histogram naming convention for Run3.
  • Add support for new track-based method.
  • Improvements and refactoring to support easier testing.

CES

Package: CES
Jira: ATDAQCCCES

Live documentation about recoveries and procedures can be found here.

Changes in recoveries and/or procedures:

  • When the TDAQ_CES_FORCE_REMOVAL_AUTOMATIC environment variable is set to true, no acknowledgment is requested from the operator before a stop-less removal action is executed (regardless of the machine or beam mode);
  • Stop-less removal involving the SwRod: the reporting application now always receives the initial list of components back (and not only the valid components, as it was before);
  • After each clock switch, a new command is sent to AFP and ZDC;
  • The BeamSpotArchiver_PerBunchLiveMon application is not started at warm stop anymore;
  • The DCM is now notified when a SwROD dies or is restarted (in the same way as it happens with a ROS or a SFO).

Internal changes:

  • Adapted to the changes in the MasterTrigger interface;
  • Fixed bug not allowing the auto-pilot to be disabled in some conditions;
  • Fixed bug causing the ERS HoldingTriggerAction message to not be sent.

Igui

The Igui twiki can be found here.

tdaq-10-00-00

A new revised version of the Main Command panel is introduced and selected by default. A description of the new panel can be found here.

The old legacy version is still available but will be removed in a later Igui version.

MonInfoGatherer

An alternative implementation for merging non-histogram data has been added (ADHI-4842). This should resolve most of the timing issues we have seen with DCM data in the past. It is enabled by default but can be disabled with the ISDynAnyNG configuration parameter (see ConfigParameters.md).

ProcessManager

The ProcessManager twiki can be found here.

RCUtils

Package: RCUtils

RunControl

Package: RunControl
Jira: ATDAQCCRC

This is the link to the main RunControl twiki.

SFOng

  • Added: periodic update of the free buffer counter (IS) even when no data are received. Previously, "0" free buffers was published when no data were received, giving the wrong impression that SFOng was about to assert backpressure.

TDAQExtJars

Package: TDAQExtJars

TriggerCommander

Package: TriggerCommander

Clients

A breaking change (for clients) has been introduced in the TriggerCommander/MasterTrigger.h interface. The hold() method has a new signature:

HoldTriggerInfo hold(const std::string& dm = "");

The extended_lvl1id boolean flag has been removed, and the method now returns a data structure holding the current ECR value and the last valid extended L1ID.

Implementations of the MasterTrigger interface

An implementation of the MasterTrigger interface has to modify its overriding method:

class X : public MasterTrigger {
  ...

  HoldTriggerInfo hold(const std::string& dm) override;
  ...
};

The HoldTriggerInfo data structure holds information about the current ECR value and the last valid extended L1ID.

beauty

nightly

Remove automated setting of pBeast server

The initial Beauty implementation provided code automatically setting the pBeast server based on the values of environmental variables. This logic proved to be weak and not easily maintainable:

  • especially in testbed, the pBeast server name has changed many times
  • overloading the pBeast environmental variable for the authentication method for setting the server name is strictly speaking incorrect.

The new implementation completely removes this detection code. It is instead the responsibility of the user to always provide a server name to the Beauty constructor.

Old code relying on the previous implicit mechanism will fail:

>>> import beauty
>>> beauty.Beauty()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'server'

Control pBeast authentication method

Different pBeast servers require different authentication methods. Currently two methods are supported:

  • no authentication required
  • authentication via an existing Kerberos token

The authentication method to be used can be controlled via a dedicated environmental variable (PBEAST_SERVER_SSO_SETUP_TYPE) or via the library API. The latter method is now exposed in the Beauty interface. The Beauty.__init__ method accepts a keyword argument cookie_setup. Valid values are:

  • None → the default behaviour of the pBeast library will be used. The environment variable will be respected, if set
  • Beauty.NOCOOKIE → no authentication required
  • Beauty.AUTOUPDATEKERBEROS → authentication via Kerberos token

>>> import beauty
>>> b = beauty.Beauty('http://someserver', cookie_setup=beauty.Beauty.NOCOOKIE)

coca

tdaq-10-00-00

  • Tag coca-03-15-05
  • Add support for schema version 4 (ADHI-4852)
  • Tag coca-03-15-03 (should be provided as a patch for tdaq-09-04-00)
  • Fix for Python wrapper RemoteFile.size method to return integer
  • Tag coca-03-15-02 (should be provided as a patch for tdaq-09-04-00)
  • Fix handling of NULL FILE_SIZE in DBStat.

coldpie

  • add a few missing methods to the Folder and FolderSet wrappers
  • Folder and FolderSet now have Python base class HvsNode which is a wrapper for cool::IHvsNode

config

tdaq-10-00-00

  • many performance improvements, see ADTCC-284 epic for details
  • methods with explicit prefetching parameters are now always effective, even when an object is already loaded into the implementation cache; the configuration implementation protects against over-prefetching
  • add a parameter to print nested aggregated object
  • add method time() returning timestamp of an object modification

dal

tdaq-10-00-00

  • implement ResourceBase.DependsOn algorithm for auto-disabling, see ATDSUPPORT-399
  • add "/bin" to PATH process environment variable generated by DAL algorithms
  • remove singleton protecting get_partition() algorithm from over-prefetching

DAQ Tokens

The various Python based token_meister servers have been replaced by a single C++ version.

The standard deployment for a local Unix domain socket:

token_meister /path/to/private.key [ /path/to/socket ]
token_meister --local /path/to/private.key [ /path/to/socket ]

The server using GSSAPI deployment requires a Kerberos keytab for the atdaqjwt service.

cern-get-keytab --service atdaqjwt -o token.keytab
export KRB5_KTNAME=FILE:$(pwd)/token.keytab
token_meister --gssapi /path/to/private.key [ port ]

For interactively creating a token (for testing and impersonating a user):

token_meister --make /path/to/private.key user

For internal timing test:

token_meister --time /path/to/private.key user

Additional options for all versions:

--hash=<HASHNAME>

where HASHNAME is a valid OpenSSL name for a hash function (e.g. 'SHA256'). To be backward compatible with tdaq-09-04-00 use --hash=md5.

The binary is statically linked against the stdc++ library and otherwise depends only on system libraries. It can also be compiled independently of the TDAQ software (see the standalone directory). This means the binary can simply be copied to a server and run, provided all library dependencies are met.

The old Python based servers are still available. They do not depend on any other TDAQ software which can be useful in a system deployment.

They now live in their own token_meister package and can be used like this:

python3 -m token_meister.local /path/to/key [ /path/to/socket ]
python3 -m token_meister.gssapi /path/to/key [ port ]
python3 -m token_meister.make /path/to/key user

Command line interface to CERN SSO methods

The script obtains a token for a (public) OAuth2 client via the CERN SSO. The result is a JSON structure that should be further processed with jq or similar tools.

python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --krb5 -o save
token=$(jq -r .access_token < save)

Now you can use the ${token} when accessing protected URLs:

curl -H "Authorization: Bearer ${token}" https://...

When the access token is expired, refresh it:

python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --refresh $(jq -r .refresh_token < save) -o save

Other authorization grants:

python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --browser -o save
python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --password -o save

A new browser window will be opened for the --browser option, so you need a graphical environment (or a text mode browser...).

The --password method is strongly discouraged. Don't type your central CERN password into random scripts that ask for it.
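
The file written with -o can also be consumed directly from Python instead of jq. The sketch below only assumes what the shell examples above already show, namely that the JSON structure contains an access_token field; the function names are hypothetical.

```python
import json


def bearer_header(token_json):
    """Build an HTTP Authorization header from the JSON text of the token file."""
    tokens = json.loads(token_json)
    return {"Authorization": "Bearer " + tokens["access_token"]}


def bearer_header_from_file(path):
    """Same as bearer_header(), reading the JSON from a file such as 'save'."""
    with open(path) as f:
        return bearer_header(f.read())
```

The returned dictionary can be passed as the headers argument of an HTTP client such as urllib.request.Request.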

dbe

Package: dbe
Jira: ATLASDBE

DVS GUI (graphical UI for DVS)

See also related DVS and TestManager packages.

dynlibs - Load shared libraries

This package is deprecated from tdaq-09-00-00 onwards.

Please use plain dlopen() or boost::dll instead. Note that, unlike in this package, the boost::dll::shared_library object has to stay in scope for as long as the shared library is used!

Example of possible replacement

#include <boost/dll.hpp>

double example()
{
   boost::dll::shared_library lib("libmyplugin.so",
                                  boost::dll::load_mode::type::search_system_folders |
                                  boost::dll::load_mode::type::rtld_now);
   // Get pointer to function with given signature
   auto f = lib.get<double(double, double)>("my_function");
   return f(10.2, 3.141);
}

emon

  • Connections between event samplers and monitors have been optimized. Existing configurations should be adjusted to benefit from that. Previously the NumberOfSamplers parameter defined the number of samplers to be connected by all monitors of a group that uses the same selection criteria. In the new implementation this number defines the number of samplers that each individual monitor has to connect to. That makes no difference for monitors that used to connect to a single sampler and don't form a group. For monitors that share the same selection criteria, for example the Global Monitoring tasks, this number should be changed to the old number divided by the number of monitors in the group. For Athena monitoring the corresponding parameter of a JO file is called KeyCount.
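
As a worked example of the adjustment: a group of 4 monitors that previously shared NumberOfSamplers = 12 would now use 12 / 4 = 3 per monitor. The helper below is a hypothetical sketch of that arithmetic; rounding up when the old value is not an exact multiple of the group size is an assumption, not something the release prescribes.

```python
import math


def samplers_per_monitor(old_group_samplers, monitors_in_group):
    """New per-monitor NumberOfSamplers derived from the old per-group value."""
    if monitors_in_group < 1:
        raise ValueError("a group needs at least one monitor")
    # Round up (assumption) so the group as a whole does not lose samplers.
    return max(1, math.ceil(old_group_samplers / monitors_in_group))
```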

genconfig

tdaq-10-00-00

c++: remove generation of __get_xxx_str() attribute/relationship methods; inline static string s_xxx can be used instead

HLTSV - HLT Supervisor

tdaq-10-00-00

If at run stop no DCMs are available (e.g. if the whole HLTSV farm crashed), the HLT supervisor will now time out in the RoIBStopped transition and continue the shutdown process. The remaining events in the HLTSV memory are lost.

The timeout in seconds is specified in HLTSVApplication.StopTimeout. It is 3 minutes by default and should be adjusted to the expected time it normally takes the HLT farm to process the remaining events after the L1 trigger was stopped.

ISPY

tdaq-09-05-00

New Methods on IPCPartition

These allow retrieving the available CORBA interfaces (types) and the objects of a given type, as well as checking whether a given object is valid.

p = IPCPartition("...")
p.getTypes()
['mts/worker', 'rc/commander', 'rdb/writer', 'is/repository', 'rmgr/ResMgr', 'rdb/cursor', 'pmgpriv/SERVER', 'ipc/servant']

p.getObjects('rdb/writer')
{'RDB_RW': {'valid': True, 'name': 'RDB_RW_INITIAL', 'owner': 'rhauser', 'host': 'rhauser-inspiron', 'pid': 1947, 'time': 1664623815, 'debug': 0, 'verbose': 0}}

p.isObjectValid('RDB_RW', 'rdb/writer')
True

IS Attribute format() Method

This returns a formatter that can be called to get the desired string representation for the attribute (typically used for hex representation).

fmt = attr.format()
print(fmt.format(value))

Minimal MTS Receiver Interface

To create an MTS subscription and register a callback:

def callback(issue):
   print(issue)

import mts

sub = mts.add_receiver(callback, "initial", "*")

The last argument is an MTS subscription expression. The sub return value should be kept alive as the subscription will be removed if it is garbage collected. It can also be used to unsubscribe explicitly.

sub.remove_receiver()

The issue passed to the callback is a Python dictionary with all the content from an ers::Issue.

jeformat

Package: jeformat

mda

Small fix in mda_register script, updates for Python3.

MTS

tdaq-10-00-00

MTS library: synchronous subscription

Added synchronous subscription functionality (https://its.cern.ch/jira/browse/ATDAQCCMRS-41): a call to add_receiver publishes the subscription information in the MTS IS server and then waits until all MTS workers (registered at that moment) explicitly notify the subscriber that the subscription is registered, or until the specified timeout (in milliseconds) expires.

The functionality is implemented in both the Java and C++ libraries. The changes are backward-compatible: the synchronization is only enabled when add_receiver is called with an additional third parameter, the timeout value. Most clients create their subscription well in advance of receiving the first message, so this functionality mostly addresses those who create a subscription and may expect messages to arrive very shortly after (the CES recovery case, which is based on the exchange of MTS messages).

A reasonable timeout would be 50 ms, though normally workers are notified within a few milliseconds.
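
The waiting pattern described above (block until every worker known at subscription time has acknowledged, or until the timeout expires) can be sketched generically in Python. This is an illustration of the idea only, not the MTS implementation or its API; the class and method names are hypothetical.

```python
import threading


class PendingSubscription:
    """Tracks which workers still have to acknowledge a new subscription."""

    def __init__(self, workers):
        # Snapshot of the workers registered at subscription time.
        self._pending = set(workers)
        self._cond = threading.Condition()

    def acknowledge(self, worker):
        """Called (possibly from another thread) when a worker confirms."""
        with self._cond:
            self._pending.discard(worker)
            if not self._pending:
                self._cond.notify_all()

    def wait(self, timeout_ms):
        """Return True if all workers acknowledged before the timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: not self._pending,
                                       timeout=timeout_ms / 1000.0)
```

If wait() returns False, the subscriber knows that some workers may not yet see the subscription and that early messages could be missed.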

mts2splunk utility

Added two new command line parameters, allowing to filter out messages with particular msgId or appId.

OKS

tdaq-10-00-00

  • set pull.rebase=true when cloning the oks repository to avoid misleading merge commits by users

owl

tdaq-10-00-00

  • Remove deprecated OWLMutex and OWLMutexRW classes. Use std::mutex and std::shared_mutex instead.
  • Remove deprecated OWLThread. Use std::thread instead.
  • Remove deprecated OWLCondition. Use std::condition_variable instead.

P-BEAST

tdaq-10-00-00

  • introduce v5 data format with more efficient data compaction (use delta compaction for integer data types)
  • add bstconfig (new config plugin) and use it for pbeast service operational monitoring (simplifies integration with k8s services), see ADAMATLAS-419
  • the TDAQ_PBEAST_SERVER_URL and TDAQ_PBEAST_RECEIVER_URL process environment variables can be used to specify the server and web receiver URLs; the PBEAST_SERVER_BASE_URL is deprecated
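
To illustrate why delta compaction helps for integer data: slowly changing integer series turn into sequences of small numbers, which need fewer bytes to store. The following is a generic sketch of delta encoding under that assumption; it does not reproduce the actual v5 on-disk format, and the function names are hypothetical.

```python
from itertools import accumulate


def delta_encode(values):
    """Keep the first value, then store only differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original series as the running sum of the deltas."""
    return list(accumulate(deltas))
```

For example, the series 1000, 1002, 1002, 1005 is stored as 1000, 2, 0, 3.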

rn (Run Number)

tdaq-09-01-00

Jira: ADTCC-242

Store start of run timestamp and run duration with nanoseconds resolution. Store TDAQ release name. If oks repository is used, store oks config version into run number database and tag oks repository by run number / partition name.

swrod

IS Information Update

Several new attributes have been added to the ROBStatistics IS information type:

  • enabled is set to false when the corresponding ROB is stoplessly removed from the ongoing run, true otherwise.
  • latePackets is a per-ROB counter of data packets that arrive at the SW ROD when the corresponding fragments have already been built due to the timeout. The same counter has been added to the LinkStatistics IS class.

A new gcDeletedFragments attribute has been added to the HLTRequestStatistics IS class. This is a counter of ROB fragments which were removed by the garbage collector rather than by a normal Clear request.

If the DF IS server contains IS objects published by the previous version of the SW ROD, they must be removed to allow publication of the new objects. This can be done using the is_rm command, e.g.:

is_rm -p <partition name> -n DF -r "swrod.*"

Configuration schema changes

  • EnableGarbageCollection attribute has been added to the SwRodHLTRequestHandler class. If its value is set to 1 (default) then SW ROD will remove the oldest ROB fragments from the HLT buffer when the buffer gets full.

  • UnsubscribeDisabledLinks attribute has been added to the SwRodFragmentBuilder class. If its value is set to 1 (default) then SW ROD will unsubscribe from the stoplessly removed input links, otherwise just mark them as disabled and discard all data received through them.

  • MaxReorder attribute has been added to the SwRodHLTRequestHandler class. It defines the maximum size of the latest L1 ID derandomising map, which is used to optimize HLT request handling.

  • DropCorruptedPackets attribute has been moved from the SwRodGBTModeBuilder class to the SwRodFragmentBuilder. This way it was made available to the SwRodFullModeBuilder as well.

  • Two attributes have been added to the SwRodDataChannel class.

    • TTCControllerName - defines the name of the closest TTC segment controller. It is used to detect when a TTC Restart operation is ongoing. When the SW ROD application executes the PrepareForRun Run Control transition, it checks the state of the corresponding controller. If the controller is in the RUNNING state, the SW ROD application assumes that it is just being restarted. Otherwise it assumes that the TTC Restart procedure is taking place. By default this parameter is set to the "RootController" string.
    • UpdateECRCounter - If this parameter is set to true, the SW ROD application sets the ECR counters of the FELIX cards it is subscribed to, to the last known ECR value during the TTC Restart, Stopless Recovery and Resynchronise procedures.
  • Three attributes have been added to the SwRodFragmentBuilder class.

    • PacketDumpPath - The name of a directory in which to store files with dumps of corrupted packets. A corrupted packet will be dumped either if the L1 ID cannot be reliably extracted from it or if the DropCorruptedPackets parameter is set to true.
    • PacketDumpLimit - The maximum number of packets to dump per run.
    • LinkLoggingLimit - The maximum number of data corruption incidents reported for a single E-Link.
  • A new DataReceivingTimeout attribute has been added to the SwRodGBTModeBuilder class. It defines a timeout in milliseconds for ROB fragment building. If the given number of milliseconds has passed after receiving the first data chunk for a particular ROB fragment, this fragment will be considered as built and will be passed to the fragment consumers even if it does not contain data chunks from all the E-links associated with the given ROB.

  • A new DeferProcessing attribute has been added to the SwRodCustomProcessor class. If it is set to true, the processing will be applied only when serialization of the ROB fragment is requested. This happens, for example, when the fragment is about to be written to a file or sent over the network. This may be used to reduce computing resources for fragments that require heavy processing but are rarely requested by the HLT. The default value of this attribute is false.

  • A new ProfileExecution attribute has been added to the SwRodCustomProcessor class. If it is set to true the processor will keep a record of total time of the custom processing execution and will print it to the standard output when SW ROD is terminated. Default value is false.

GBT Fragment Building Timeout

This version implements a fragment building timeout for the GBT fragment building algorithm. The timeout value can be specified via the new DataReceivingTimeout attribute of the SwRodGBTModeBuilder class. This attribute contains the number of milliseconds to wait after receiving the first data chunk for a particular ROB fragment before considering this fragment as built, even if not all data chunks have been received. By default this attribute is set to zero, which disables the timeout.
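
The decision rule can be summarised in a few lines of Python. This is a sketch of the behaviour described above, not the SW ROD code: the function and parameter names are hypothetical, and treating "elapsed time equal to the timeout" as expired is an assumption.

```python
def fragment_is_built(received_links, expected_links,
                      first_chunk_ms, now_ms, timeout_ms=0):
    """Decide whether a ROB fragment can be handed to the consumers."""
    # Complete fragment: data from every expected E-link has arrived.
    if set(expected_links) <= set(received_links):
        return True
    # A timeout of zero (the default) disables the timeout entirely.
    if timeout_ms == 0:
        return False
    # Incomplete fragment: built only once the timeout since the
    # first received data chunk has elapsed.
    return now_ms - first_chunk_ms >= timeout_ms
```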

Custom Plugin Test application

A new application that can be used for validation and profiling of a custom plugin has been added. The application is called swrod_custom_plugin_test and can be used in the following way:

  • The first time, it has to be given five parameters: the name of the data file to be used as data source, the name of the shared library that implements the plugin to be tested, and the names of the three custom functions which this plugin implements. Optionally one can use the -o option to save the new configuration to the given JSON file.
  • Once the JSON configuration file has been produced, the test application can be started with this file as its sole input parameter using the -i command line option.

transport

Clients using classes from the transport package should look into more modern network libraries like boost::asio until the C++ standard contains an official network library.