tdaq-09-00-00
Introduction
The ATLAS TDAQ software version tdaq-09-00-00
has been released
on 26th March 2020.
Please note that tdaq-09-00-01 is also available. It is not a newer version of tdaq-09-00-00 but its Python 3 variant, to be used by detectors and off-line software for the migration to Python 3.
Availability and Installation
Outside of Point 1 the software should be used via CVMFS. Its official location is
/cvmfs/atlas.cern.ch/repo/sw/tdaq/tdaq/tdaq-09-00-00/
At Point 1 the software is as usual available at
/sw/atlas/tdaq/tdaq-09-00-00/
The software can also be installed locally via ayum.
git clone https://gitlab.cern.ch/atlas-sit/ayum.git
source ayum/setup.sh
Modify the prefix
entries in the yum repository files in ayum/etc/yum.repos.d/*.repo
to point to the desired destination.
ayum install tdaq-09-00-00_x86_64-centos7-gcc8-opt
Configurations
The release is available for the following configurations:
- x86_64-centos7-gcc8-opt
- x86_64-centos7-gcc8-dbg
- x86_64-centos7-gcc9-opt
- x86_64-centos7-gcc9-dbg
External Software
LCG_97
The version of the external LCG software is LCG_97.
TDAQ Specific External Software
Package | Version |
---|---|
czmq | 3.0.2 |
zyre | 1.1.0 |
libfabric | 1.6.2 |
nlohmann/json | 2.1.1 |
pugixml | 1.9 |
ipbus-software | 2.7.2 |
microhttpd | 0.9.59 |
mailinglogger | 5.0.0 |
netifaces | 0.10.9 |
paramiko[gssapi] | 2.7.1 |
Removed Packages
The following packages have been removed since tdaq-08-03-01:
- FarmTools
- OnlinePolicy (the getcmtconfig script moved to DAQRelease)
- ROx - all packages related to the SW ROD prototype
CES
Package: CES
Jira: ATDAQCCCES
Live documentation about recoveries and procedures can be found here.
Changes in recoveries and/or procedures:
- Stopless recovery: handle failures on the detector side (ATDAQCCCES-149): In case no channels can be recovered, the rc::HardwareRecovered message can be sent with an empty list of recovered channels. In that case, CHIP will notify of the failure in recovery but the trigger will be released.
- Stop-less removal and recovery adapted to the new SWROD (ATDAQCCCES-137); HLT recoveries adapted to the new HLT framework (ATDAQCCCES-138).
Internal changes:
- The ESPER CEP engine has been updated to version 8 (ATDAQCCCES-133);
- Fixed scalability issue observed after the introduction of ESPER 8 (ATDAQCCCES-150);
- Publishing information about the status of the Java Virtual Machine (ATDAQCCCES-144);
- Proper handling of ERS fatal messages (ATDAQCCCES-147);
- Fixing an issue with the BAD_HOST problem not being cleared (ATDAQCCCES-143);
- Executing tests in case of transition timeout (ATDAQCCCES-140);
- Adapting to changes in DAL (new implementation of template applications and segments);
- Adapting to changes in DVS/TestManager (new implementations).
CMake Changes
The main TDAQ.cmake file is compatible with CMake 3.16.3 and still works with CMake 3.14.3.
CMake 3.16 complains loudly if there is no top-level project() command in the main CMakeLists.txt file. This was always required but was silently ignored so far. To make CMake happy the following changes should be applied:
Work Area
A user work area needs a small addition to the top-level CMakeLists.txt file. It should look like this:
cmake_minimum_required(VERSION 3.14.0)
project(work) # the new line
find_package(TDAQ)
include(CTest)
tdaq_work_area()
Otherwise you will get a large warning with CMake 3.16, though things will still work.
Downstream Projects
Downstream projects should have a top-level CMakeLists.txt
file that looks as follows:
cmake_minimum_required(VERSION 3.14.0)
project(myproject VERSION 1.0.0)
find_package(TDAQ)
set(TDAQ_DB_PROJECT myproject)
tdaq_project(myproject 1.0.0 USES tdaq 9.0.0)
Jers - Java ERS
tdaq-09-00-01
Use of TDAQ_ERS_TIMESTAMP_FORMAT with default value "yyyy-MMM-dd HH:mm:ss,SSS z" to allow defining the timestamp format like in ERS, including milliseconds.
The format used before was "DDD MMM dd HH:mm:ss z yyyy", e.g. Mon Mar 30 15:06:28 CEST 2020.
tdaq-09-00-00
ProcessManager
Package: ProcessManager
The ProcessManager
twiki can be found here.
Changes in public APIs:
- Added information about the availability of process out/err files: The PMGProcessStatusInfo structure has been extended with two additional attributes (out_is_available and err_is_available) indicating whether the concerned application has produced any out or err log.
- Using smart pointers in client library: All the references to Process instances are encapsulated in shared pointers.
Internal changes:
- Not deleting or changing permissions of err/out files if they are not regular files (this covers the case in which the err and/or out streams are sent to /dev/null).
RCUtils
Package: RCUtils
- rc_error_generator: added the generation of ERS issues related to the stop-less removal/recovery for SWROD;
- Added utility class to monitor several metrics of the Java Virtual Machine: the new class (JVMMonitor) is defined in the RCUtils.tools package. The RCUtils.jar jar should be added to the classpath;
- Several improvements to the DAQ efficiency tools.
RunControl
Package: RunControl
Jira: ATDAQCCRC
This is the link to the main RunControl twiki.
Enhancements:
-
The Java library has been extended with the possibility to create Java Run Control applications (ATDAQCCRC-40). It is now possible to have Run Control applications in the three main languages used in TDAQ: C++, Java and Python.
An example application and the related script to start it are provided.
Additional details can be found here.
Changes in utility and/or tools:
- rc_sender (see the online help for details about the syntax to be used): The STARTAPP/STOPAPP/RESTARTAPP/TESTAPP/ENABLE/DISABLE commands support regular expressions; the same command can be sent to multiple applications using a regular expression for the application name.
Public changes in APIs:
- Regular expressions enabled for the STARTAPP/STOPAPP/RESTARTAPP/TESTAPP/ENABLE/DISABLE commands: In the concerned command objects, application or component names can be replaced by regular expressions. In order to be treated as regular expressions, the names must end with the //r characters.
- The Resynch command has been extended to include the extended L1ID (and not only the ECR counter);
- Added initAction to user routines (ATDAQCCRC-191).
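The //r naming convention described above can be illustrated with a small sketch. It is not the RunControl implementation: the helpers is_regex_name() and selects() are hypothetical, and the choice of a full match with the default std::regex grammar is an assumption, since the matching rules used by the Run Control are not stated here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper, not part of the RunControl API: returns true if the
// name carries the "//r" suffix that marks it as a regular expression.
bool is_regex_name(const std::string& name) {
    static const std::string suffix = "//r";
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: decides whether `application` is selected by `name`.
// Names ending in "//r" are treated as regular expressions (with the suffix
// stripped, full match assumed); all other names must be equal.
bool selects(const std::string& name, const std::string& application) {
    if (is_regex_name(name)) {
        const std::regex pattern(name.substr(0, name.size() - 3));
        return std::regex_match(application, pattern);
    }
    return name == application;
}
```

With these helpers, a name such as HLT-.*//r would select every application whose name starts with HLT-, while the same string without the suffix is compared literally.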
Internal changes:
- Adding the SWROD to the computation of the detector mask (ATDAQCCRC-46);
- Adapting to changes in the PMG library (smart pointers for daq::pmg::Process instances);
- After a reload, tests for apps out of membership and not up are reset;
- Adapting to changes in DAL (new implementation of template applications and segments);
- Adapting to changes in DVS/TestManager (new implementations);
- Boost shared mutex replaced with std version.
SFOng
- replaced deprecated tbb::tbb_hasher with std::hash
- replaced deprecated tbb::atomic with std::atomic
- fixed: starting a new run without going to shutdown (stop/start) could produce Late events at the beginning of the new run for a transition period
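For other packages facing the same TBB deprecations, the replacement pattern might look like the sketch below. The names hash_key() and EventCounter are invented for illustration and do not come from SFOng.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>

// Before (deprecated TBB): tbb::tbb_hasher(key)
// After: std::hash from <functional>
std::size_t hash_key(std::uint64_t key) {
    return std::hash<std::uint64_t>{}(key);
}

// Before (deprecated TBB): tbb::atomic<std::uint64_t> m_count;
// After: std::atomic, explicitly initialised to zero.
class EventCounter {
public:
    // Atomically increments the counter and returns the new value.
    std::uint64_t increment() { return ++m_count; }
    // Returns the current value.
    std::uint64_t value() const { return m_count.load(); }
private:
    std::atomic<std::uint64_t> m_count{0};
};
```

One point to watch during such a migration: before C++20 a default-constructed std::atomic is not zero-initialised, so an explicit initial value as above is the safe choice.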
Test Manager
tdaq-09-00-00
A complete re-implementation of the former Test Manager package used in Run 2, addressing scalability (threading), performance and code maintenance issues. The new design was presented here: https://indico.cern.ch/event/780731/contributions/3250021/attachments/1769325/2880324/DVS_TM_LS2.pdf
The schema for describing Tests (Test4Class, Test4Object) did not change. In the schema for TestPolicies there are a few changes in the TestFailureAction configuration, as described here: https://indico.cern.ch/event/800655/contributions/3327341/attachments/1801247/2938116/TM_CC_25Feb2019.pdf
Each Failure may have a (generic) TestFailureAction associated, and the action and action parameters are attributes of the TestFailureAction class. There are no more specialized classes like RebootAction: all types of actions (e.g. 'reboot', 'execute', 'test') may be specified as a string (enumeration) in the TestFailureAction::action attribute, i.e. it is possible to add new actions when needed. The parameters of an action are configured as a Json { param1: value1} string in the TestFailureAction::parameters attribute (e.g. for the 'reboot' action), or taken from the associated Executable object (e.g. for the 'exec' action). After substitution of parameters (e.g. #this.RunsOn.UID), the action and a json string with parameters are passed as part of the TestResult to the client (DVS, RC or CHIP), which is responsible for executing the action. An example of an action ('test'):
<obj class="TestFailureAction" id="TestExtraComputer">
<attr name="action" type="enum">"test"</attr>
<attr name="parameters" type="string">"{ Component: #this.RunsOn.UID; Scope: diagnosis }"</attr>
...
</obj>
Performance of the new implementation was studied in this presentation: https://indico.cern.ch/event/817692/contributions/3413542/attachments/1838805/3013693/TMperfCC060519.pdf
TriggerCommander
Package: TriggerCommander
Clients
A backward compatible change (for clients) has been introduced in the TriggerCommander/MasterTrigger.h interface. The hold() method has been extended with a boolean flag extended_lvl1id which defaults to false.
With the default value the method returns the last ECR as it did in the past. When the new flag is set to true it instead returns the full Level 1 ID (from which the ECR can be extracted as the uppermost 8 bits).
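A client that requests the extended value can recover the ECR with a shift. The sketch below assumes the 32-bit return type of hold() shown in the interface; the function name ecr_from_extended_l1id() is hypothetical and not part of TriggerCommander.

```cpp
#include <cstdint>

// Hypothetical helper: extracts the ECR counter, i.e. the uppermost 8 bits,
// from a 32-bit extended Level 1 ID as returned by hold(dm, true).
constexpr std::uint8_t ecr_from_extended_l1id(std::uint32_t extended_l1id) {
    return static_cast<std::uint8_t>(extended_l1id >> 24);
}
```

For example, an extended Level 1 ID of 0x2A000123 carries an ECR of 0x2A.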
Implementations of the MasterTrigger interface
An implementation of the MasterTrigger interface has to modify its overriding method:
class X : public MasterTrigger {
...
uint32_t hold(const std::string& dm, bool extended_lvl1id = false) override;
...
};
config
tdaq-09-00-00
Available since tdaq-08-03-01
Remove obsolete get_class_info() method using MetaDataType enum
Jira: ADTCC-184
Use well structured method instead:
const daq::config::class_t& get_class_info(const std::string& class_name, bool direct_only = false);
dal
Changes since last release:
SW repository generation utility was removed
The dal_create_sw_repository utility has been removed. The create_repo.py script has to be used instead. See CMake TWiki for more information.
tdaq-09-00-00
Available since tdaq-08-03-00
New template-based segments and applications config
Jira: ADTCC-177
The generated DAL classes BaseApplication and Segment replaced old AppConfig and SegConfig ones.
Updated partition algorithms:
std::vector<const BaseApplication *> Partition::get_all_applications(std::set<std::string> * app_types = nullptr,
std::set<std::string> * use_segments = nullptr, std::set<const Computer *> * use_hosts = nullptr) const;
const Segment * Partition::get_segment(const std::string& name) const;
The segment algorithms:
std::vector<const BaseApplication *> Segment::get_all_applications(std::set<std::string> * app_types = nullptr,
std::set<std::string> * use_segments = nullptr, std::set<const Computer *> * use_hosts = nullptr) const;
const BaseApplication * Segment::get_controller() const;
const std::vector<const BaseApplication *>& Segment::get_infrastructure() const;
const std::vector<const BaseApplication *>& Segment::get_applications() const;
const std::vector<const Segment*>& Segment::get_nested_segments() const;
const std::vector<const Computer*>& Segment::get_hosts() const;
const Segment * Segment::get_base_segment() const;
bool Segment::is_disabled() const;
bool Segment::is_templated() const;
void Segment::get_timeouts(int & actionTimeout, int & shortActionTimeout) const;
The application algorithms:
const Computer * BaseApplication::get_host() const;
const daq::core::Segment * BaseApplication::get_segment() const;
std::vector<const daq::core::Computer *> BaseApplication::get_backup_hosts() const;
const daq::core::BaseApplication * BaseApplication::get_base_app() const;
std::vector<const daq::core::BaseApplication *>
BaseApplication::get_initialization_depends_from(const std::vector<const daq::core::BaseApplication *>& all_apps) const;
std::vector<const daq::core::BaseApplication *>
BaseApplication::get_shutdown_depends_from(const std::vector<const daq::core::BaseApplication *>& all_apps) const;
The algorithms on Segment and BaseApplication objects may only be called if these objects were themselves obtained from the get_all_applications() or the partition's get_segment() DAL algorithms.
Algorithms changes
Jira: ADTCC-209
Several algorithms that were never used have been deleted or reorganized.
Modified algorithms
- BaseApplication::get_output_error_directory() is renamed to Partition::get_log_directory() because the log directory does not depend on the application
- BaseApplication::get_info(): removed the partition, segment and host parameters
- Segment::get_timeouts(): removed the partition and database parameters
- SubstituteVariables: removed the database parameter in the constructor
Removed algorithms
Several algorithms were obsolete and have been deleted, or were not used by user code and have been removed from the public API:
- ComputerProgram::get_parameters()
- BaseApplication::get_application()
- BaseApplication::get_some_info()
- ResourceBase::get_applications()
- Variable::get_value(tag)
dbe
- Improved filtering and searching ATLASDBE-144: The filter (or the auto-completion) in drop-down menus now works on tokens (i.e., if Computers have names like pc-tdq-onl-* or pc-tdq-tpu-*, in order to get only tpu-like nodes, typing tpu is enough).
- Fixing problem building table after the move to Qt5 ATLASDBE-241;
- Improved sorting for table ATLASDBE-246;
- Several improvements to the schema editor ATLASDBE-239.
dcm
tdaq-09-00-00
New features include:
- dfinterfaceDCM: added an operationTimeOut issue when the tryGetNextUntil function is not able to retrieve an event. This is required to fix the problem of Jira ADHI-4728.
- Processor.cxx: increased the IDLE timeout. Required for testing for the Jira ADHI-4770.
DVS (Diagnostics and Verification Framework)
tdaq-09-00-00
Replacement of CLIPS
Full re-implementation of the expert-system engine: CLIPS (an old forward-chaining C rule engine) was replaced by a custom forward-chaining engine where rules can be defined as C++ objects (relying on C++11 features like lambdas). The rules are compiled and linked to the executable. The present set of rules handles Test policies and dependencies: rules.
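To give a feeling for what "rules as C++ objects with lambdas" can mean, here is a toy forward-chaining loop. It is not the DVS engine and does not use its API: Facts, Rule, run(), example_rules() and the fact names are all invented for this illustration.

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

// Illustrative only: none of these types are part of DVS.
using Facts = std::set<std::string>;

struct Rule {
    std::string name;
    std::function<bool(const Facts&)> condition;  // when the rule may fire
    std::function<void(Facts&)> action;           // what the rule asserts
};

// Forward chaining: keep firing rules whose condition holds until a full
// pass over the rules fires nothing. Each rule fires at most once.
// Returns the names of the fired rules in firing order.
std::vector<std::string> run(const std::vector<Rule>& rules, Facts& facts) {
    std::vector<std::string> fired;
    std::set<std::string> done;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& rule : rules) {
            if (done.count(rule.name) || !rule.condition(facts)) continue;
            rule.action(facts);
            done.insert(rule.name);
            fired.push_back(rule.name);
            changed = true;
        }
    }
    return fired;
}

// Two invented rules: a failed ping marks the host as unreachable, and an
// unreachable host causes the tests of its applications to be skipped.
std::vector<Rule> example_rules() {
    return {
        {"skip-dependents",
         [](const Facts& f) { return f.count("host-unreachable") > 0; },
         [](Facts& f) { f.insert("skip-application-tests"); }},
        {"host-failed",
         [](const Facts& f) { return f.count("ping-failed") > 0; },
         [](Facts& f) { f.insert("host-unreachable"); }},
    };
}
```

Starting from the single fact ping-failed, the loop first fires host-failed and then, on the next pass, skip-dependents, which is the data-driven chaining that distinguishes this style from a fixed sequence of checks.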
TestFailure follow-up actions
A change in TestRepository schema related to description of TestFailureActions
is described in TM release notes and in main DVS/TM twiki. An Action is prepared by DVS from this configuration and added to the test result passed to client (RC, CHIP).
Limitations and future developments
Make it possible to load rules from user-supplied .so files at runtime, just by passing the file names as a parameter to the DVS engine.
DVS Tests
tdaq-09-00-00
Added generate_core_file.sh
script which can be used as a TestFailure follow-up action to automatically generate a core file for a misbehaving application process (to allow debugging rare conditions with applications).
Usage: generate_core_file.sh [-v] [-h] [-d <directory_name>] -p <partition_name> -a <app_name> [-m mail_address]
Parameters for a particular application are deduced from the configuration at runtime, see the DVS/TestManager documentation for more details on configuring this in the test repository: https://twiki.cern.ch/twiki/bin/view/Atlas/DaqHltTestManager#How_to_Parametrize_Tests
dynlibs - Load shared libraries
This package is deprecated from tdaq-09-00-00 onwards.
Please use plain dlopen() or boost::dll instead. Note that, unlike in this package, the boost::dll::shared_library object has to stay in scope as long as the shared library is used!
Example of possible replacement
#include <boost/dll.hpp>
double example()
{
boost::dll::shared_library lib("libmyplugin.so",
boost::dll::load_mode::type::search_system_folders |
boost::dll::load_mode::type::rtld_now);
// Get pointer to function with given signature
auto f = lib.get<double(double, double)>("my_function");
return f(10.2, 3.141);
}
HLTSV - HLT Supervisor
tdaq-09-00-00
Changing HLT Prescales in a Pre-loaded Partition
The HLTSV now supports pre-scale changes in pre-loaded data mode. This should make it easier to test this functionality without the CTP. This is for HLT experts only.
The functionality is enabled when the TriggerConfiguration.TriggerCoolConnection
attribute (a string) is not empty. This should be either a COOL alias
(for the typical use at Point 1 with the CTP), or a test database, e.g. sqlite.
Never use this with a production database for a test !
- Initialize the sqlite database
% TestHltPsk2CoolWriting hltPrescaleCool.db
3
q
- Set the TriggerCoolConnection to a string like the following (adapting the path):
<attr name="TriggerCoolConnection" type="string" val="sqlite://;schema=/scratch/work/rhauser/cmtest/cool/hltPrescaleCool.db;dbname=CONDBR2"/>
- Start your partition
% trg_command_trigger -p
- Check the log output of the HLTSV, there should be some informational lines, otherwise check the error logfile.
% TestHltPsk2CoolWriting hltPrescaleCool.db
1
0
q
You should now see the new entries that you set via the trg_command_trigger
utility.
MTS
tdaq-09-00-00
Since the tdaq-07-01-00 release, only a few bug fixes and 2 small additions:
- allow specifying the ERS context line number in the subscription syntax, e.g.
(app=HLTMPPU* and context(line)!=666)
- a Java utility (available in erssender.jar) to send arbitrary ERS messages like
TDAQ_PARTITION=mypart TDAQ_APPLICATION_NAME=myapp java -cp erssender.jar:ers.jar utils.ERSSender ERROR mymsg-id "My message text"
OKS
tdaq-09-00-00
OKS data file format
Jira: ADTCC-185
New OKS format since tdaq-08-03-01
Store values inside tags
Store attribute and relationship data inside values of tags
Format for single value attributes
<attr name="xxx" type="yyy" val="zzz"/>
Format for multi value attributes
<attr name="xxx" type="yyy">
<data val="zzz1"/>
<data val="zzz2"/>
</attr>
Format for 0..1 and 1..1 relationships
<rel name="xxx" class="yyy" id="zzz"/>
Format for 0..N and 1..N relationships
<rel name="xxx">
<ref class="yyy1" id="zzz1"/>
<ref class="yyy2" id="zzz2"/>
</rel>
Skip empty data
Do not store attributes whose values are equal to an empty initial value, and do not store empty relationships
Compatibility and data conversion
The changes are backward compatible. The new OKS library is able to read the old "extended" and "compact" data formats. To convert a data file stored in an old format, open it with the OKS Data Editor or DBE and save it. Do not mix old and new formats in the same file when updating a file in a text editor.
PartitionMaker no longer uses FarmTools
Package: Partition Maker
When the partition maker script (e.g. pm_part_hlt.py) generates a partition with hosts that are not yet in the database, it will try to execute various commands on the host. In the past it used the FarmTools package that has been removed from this release.
From this release on it will instead use the paramiko Python package to execute the commands. It will first try a connection with ssh keys, then with GSSAPI (i.e. Kerberos). There is no longer any parallelism. This may slow down the generation of hosts file for large farms.
swrod
-
By default ROBs and E-Links IDs in a SW ROD OKS configuration are shown in hexadecimal format.
-
Common SW ROD configuration objects, which don't normally require customization, have been placed in the daq/sw/swrod-common.data.xml OKS configuration file that is installed in the installed/share/data area of the TDAQ release. Any SW ROD OKS configuration is advised to use these objects instead of creating custom ones, unless parameters need to be modified.
-
GBTModeBuilder and FullModeBuilder algorithms now automatically detect if the respective SW ROD application uses TTC data handler for getting L1 Accept packets from FELIX. If that is the case the algorithms will run in TTC-aware mode, otherwise they will be data-driven. Contrary to the previous release there is no need to change the value of the Type parameter of the ROBFragmentBuilder class in the OKS configuration to change the fragment building mode. In the new implementation one should either link an instance of the SwRodL1AInputHandler with the SwRodConfiguration object via the L1AHandler relationship to use TTC-aware variant of the chosen algorithm or otherwise to leave this relationship empty to use the fragment building algorithm in data-driven mode.
-
Each detector custom plugin may now provide a function for data integrity validation following the example given below. If a plugin provides such a function the function name shall be set to the DataIntegrityChecker attribute of the SwRodCustomProcessingLib OKS configuration object that describes this plugin.
extern "C" std::optional<bool> dataIntegrityChecker(const uint8_t * data) {
    if (contains_checksum(data)) {
        uint8_t checksum = get_checksum(data);
        return checksum == calculate_checksum(data);
    }
    return std::nullopt;
}
-
The package provides a so-called Felix Emulator that can be used to send input data to an arbitrary SW ROD Application via the Netio protocol. The OKS file data/FelixEmulatorSegment.data.xml shows how to configure the Felix Emulator. This example provides a Felix Emulator for the SW ROD Application defined in the data/SwRodSegment.data.xml file. The idea is that the Emulator uses internal data generators which create L1A and data packets, and the Emulator sends these packets to the SW ROD Application. The Emulator is implemented by the SW ROD framework using a special plugin that can publish generated data via the Netio protocol. The generated data packets can be customized either by modifying an existing implementation of the DataInput interface provided by the test/core/InternalDataGenerator.h(cpp) files or by providing another custom plugin that declares and implements a new class inheriting from swrod::test::InternalDataGenerator and overriding the generatePacket() virtual function.
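Coming back to the data integrity function from the list above: its helper functions are detector specific and are not defined in these notes. The following self-contained sketch fills them in for a purely hypothetical packet layout (a flags byte, a length byte, the payload and an XOR checksum), only to show how the three possible results (true, false, no answer) come about. The layout and the helper bodies are assumptions and do not describe any real detector format.

```cpp
#include <cstdint>
#include <optional>

// HYPOTHETICAL packet layout, invented for this sketch:
//   data[0]       flags, bit 0 set when a checksum is present
//   data[1]       payload length N in bytes
//   data[2..N+1]  payload
//   data[N+2]     XOR of all payload bytes (only if bit 0 of flags is set)
static bool contains_checksum(const std::uint8_t* data) {
    return (data[0] & 0x01u) != 0;
}

static std::uint8_t get_checksum(const std::uint8_t* data) {
    return data[2 + data[1]];
}

static std::uint8_t calculate_checksum(const std::uint8_t* data) {
    std::uint8_t sum = 0;
    for (unsigned i = 0; i < data[1]; ++i) {
        sum ^= data[2 + i];
    }
    return sum;
}

// Same signature as in the example above: true or false when the packet
// carries a checksum, std::nullopt when there is nothing to validate.
extern "C" std::optional<bool> dataIntegrityChecker(const std::uint8_t* data) {
    if (!contains_checksum(data)) {
        return std::nullopt;
    }
    return get_checksum(data) == calculate_checksum(data);
}
```

Returning std::nullopt rather than true for packets without a checksum keeps "not checked" distinguishable from "checked and valid".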
Cross Compilation
This release has all changes in the necessary CMake files to support
some basic cross-compilation. This has only been tested with the
aarch64
architecture (ARM) and using the same CentOS 7 version
on the target as on the host.
Only a small subset of the LCG software is available for ARM.
A reasonable subset of tdaq-common and tdaq can be cross-compiled so that a run-controlled application can be developed for the ARM target system.
For more details look at the basic cross-compilation setup and the TDAQ cross-compilation project.