tdaq-10-00-00
The ATLAS TDAQ software version tdaq-10-00-00 was released on 6th January September 2022.
Availability and Installation
Outside of Point 1 the software should be used via CVMFS. Its official location is:
/cvmfs/atlas.cern.ch/repo/sw/tdaq/tdaq/tdaq-10-00-00/
At Point 1 the software is, as usual, available at:
/sw/atlas/tdaq/tdaq-10-00-00/
The software can also be installed locally via ayum.
git clone https://gitlab.cern.ch/atlas-sit/ayum.git
source ayum/setup.sh
Modify the prefix entries in the yum repository files in ayum/etc/yum.repos.d/*.repo to point to the desired destination, then install:
ayum install tdaq-10-00-00_x86_64-centos7-gcc11-opt
In case the LCG RPMs are not found, add this to etc/yum.repos.d/lcg.repo:
[lcg-repo-102b]
name=LCG 102b Repository
baseurl=http://lcgpackages.web.cern.ch/lcgpackages/lcg/repo/7/LCG_102b/
enabled=1
prefix=[...your prefix...]
Configurations
The release is available for the following configurations:
- x86_64-centos7-gcc11-opt (default at Point 1)
- x86_64-centos7-gcc11-dbg (debug version at Point 1)
- x86_64-centos9-gcc11-opt
- x86_64-centos9-gcc11-dbg
- x86_64-centos7-gcc12-opt (experimental)
- x86_64-centos7-gcc12-dbg (experimental)
- aarch64-centos7-gcc11-opt (experimental)
The CentOS 9 variant treats CentOS Stream, Rocky Linux, Alma Linux and Red Hat Enterprise Linux as equivalent. It is provided because the OS at Point 1 is planned to be upgraded to Alma Linux 9 during the next winter shutdown (2023/24). It may become available at Point 1 later this year, while the new OS is being tested.
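As an illustration only, the sketch below shows one way such an equivalence could be expressed: it reads the ID and VERSION_ID fields of /etc/os-release and maps any of the listed distributions to the OS component of a configuration name. The function names and the exact set of ID values are assumptions made for this example; it is not the logic used by the TDAQ setup scripts.

```python
# Hypothetical sketch, not the TDAQ setup logic: map /etc/os-release content
# to the OS component ("centos7" or "centos9") of a configuration name.
from typing import Dict, Optional

# Assumed os-release ID values of the distributions treated as equivalent.
EL_IDS = {"centos", "rocky", "almalinux", "rhel"}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines of an os-release file, stripping quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def os_component(text: str) -> Optional[str]:
    """Return 'centos7' or 'centos9' for an EL-compatible system, else None."""
    fields = parse_os_release(text)
    if fields.get("ID") not in EL_IDS:
        return None
    # Only the major version matters, e.g. "9.2" -> "9".
    major = fields.get("VERSION_ID", "").split(".")[0]
    return f"centos{major}" if major in ("7", "9") else None
```

On a host, os_component(open("/etc/os-release").read()) would return "centos9" on Rocky Linux 9.x or Alma Linux 9.x.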
Microarchitecture
For the first time, the TDAQ release is compiled with the -march=x86-64-v2 flag. This enables instructions up to SSE4.2 (but no AVX) and may be incompatible with Intel CPUs older than approximately Nehalem. On such machines programs fail with an illegal instruction exception.
Note that Red Hat Enterprise Linux 9.x builds all of its software for this x86-64-v2 baseline.
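Whether a Linux host can run the release can be checked before installing it. The sketch below, assuming the host exposes /proc/cpuinfo, compares the CPU flags with the feature set of the x86-64-v2 level (CMPXCHG16B, LAHF/SAHF, POPCNT, SSE3, SSSE3, SSE4.1, SSE4.2) under their Linux flag names. The function name is hypothetical.

```python
# Check whether a CPU advertises the features required by the x86-64-v2
# micro-architecture level (as targeted by -march=x86-64-v2).

# Linux /proc/cpuinfo flag names for the x86-64-v2 feature set.
# Note: the kernel reports SSE3 as "pni" (Prescott New Instructions).
X86_64_V2_FLAGS = {"cx16", "lahf_lm", "popcnt", "pni", "ssse3", "sse4_1", "sse4_2"}


def missing_v2_flags(cpuinfo_text: str) -> set:
    """Return the x86-64-v2 flags absent from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return X86_64_V2_FLAGS - present
    # No x86 'flags' line found (e.g. an aarch64 host): nothing is provided.
    return set(X86_64_V2_FLAGS)
```

On the host, missing_v2_flags(open("/proc/cpuinfo").read()) returning an empty set means the CPU provides all x86-64-v2 features.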
External Software
LCG_102b
The version of the external LCG software is LCG_102b.
CORAL and COOL - 3.3.13
These two packages are no longer part of LCG. They are included in tdaq-common and their use should be transparent for users.
The official targets to use are:
- CORAL::CoralBase
- CORAL::CoralServerBase
- CORAL::CoralMonitor
- CORAL::CoralStubs
- CORAL::CoralCommon
- CORAL::CoralServerProxy
- CORAL::CoralKernel
- CORAL::CoralSockets
TDAQ Specific External Software
Package | Version | Requested by |
---|---|---|
czmq | 4.2.1 | FELIX |
zyre | 2.0.1 | FELIX |
libfabric | 1.11.0 | FELIX |
colorama | 0.4.4 | DCS |
opcua | 0.98.13 | DCS |
parallel-ssh | 2.10.0 | TDAQ (PartitionMaker) |
pugixml | 1.9 | L1Calo, L1CTP |
ipbus-software | 2.8.4 | L1Calo, L1CTP |
microhttpd | 0.9.73 | TDAQ (pbeast) |
mailinglogger | 5.1.0 | TDAQ (SFO) |
netifaces | 0.11.0 | TDAQ |
Twisted | 22.4.0 | TDAQ (webis_server) |
urwid | 2.1.2 | TDAQ |
jwt-cpp | v0.6.0 | TDAQ |
Flask | 2.1.3 | TDAQ |
Docker and Apptainer Images
The docker images used for building and testing the TDAQ software are available here:
docker pull gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7
docker pull gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos9
docker pull gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:aarch64-centos7
Run it like this:
docker run -it --rm -v /cvmfs:/cvmfs:ro,shared gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7
The corresponding apptainer/singularity images are here:
/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7
/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos9
/cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:aarch64-centos7
Run it like this:
apptainer shell -B /cvmfs /cvmfs/unpacked.cern.ch/gitlab-registry.cern.ch/atlas-tdaq-software/tdaq_ci:x86_64-centos7
Setup Options
In the following we assume an alias like:
alias cm_setup='source /cvmfs/atlas.cern.ch/repo/sw/tdaq/tools/cmake_tdaq/bin/cm_setup.sh'
- The cm_setup --list option shows the available releases, including nightlies.
- The cm_setup --clean ... option bypasses all testbed-specific setup. This is useful if you want to use testbed hardware but be completely independent of the existing infrastructure. If needed, you have to set your own TDAQ_IPC_INIT_REF path to start a private initial partition.
- The cm_setup script takes a short version of the CMTCONFIG build configuration as argument, e.g.:
  cm_setup nightly dbg → x86_64-centos7-gcc11-dbg
  cm_setup nightly gcc12 → x86_64-centos7-gcc12-opt
  cm_setup nightly gcc12-dbg → x86_64-centos7-gcc12-dbg
- There is no shortcut for setting the architecture or the OS.
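The naming convention in the list above can be illustrated with a small, hypothetical helper that expands a short form into a full configuration name, filling in the Point 1 defaults (x86_64, centos7, gcc11, opt). This is not how cm_setup is implemented; it only makes the convention explicit.

```python
# Hypothetical illustration of the short-form convention; not cm_setup itself.
DEFAULT_ARCH = "x86_64"
DEFAULT_OS = "centos7"
DEFAULT_COMPILER = "gcc11"
DEFAULT_BUILD = "opt"


def expand_short_config(short: str) -> str:
    """Expand 'dbg', 'gcc12', 'gcc12-dbg', ... into a full CMTCONFIG string."""
    compiler, build = DEFAULT_COMPILER, DEFAULT_BUILD
    for part in short.split("-"):
        if part in ("opt", "dbg"):
            build = part
        elif part.startswith("gcc"):
            compiler = part
        else:
            # Architecture and OS have no short form.
            raise ValueError(f"unsupported short config component: {part!r}")
    return f"{DEFAULT_ARCH}-{DEFAULT_OS}-{compiler}-{build}"
```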
BeamSpotUtils
- Updates to support the new HLT histogram naming convention for Run 3.
- Add support for new track-based method.
- Improvements and refactoring to support easier testing.
CES
Package: CES
Jira: ATDAQCCCES
Live documentation about recoveries and procedures can be found here.
Changes in recoveries and/or procedures:
- When the TDAQ_CES_FORCE_REMOVAL_AUTOMATIC environment variable is set to true, the operator is not asked for acknowledgment before a stop-less removal action is executed (regardless of the machine or beam mode);
- Stop-less removal involving the SwROD: the reporting application now always receives the initial list of components back (and not only the valid components, as before);
- After each clock switch, a new command is sent to AFP and ZDC;
- The BeamSpotArchiver_PerBunchLiveMon application is no longer started at warm stop;
- The DCM is now notified when a SwROD dies or is restarted (in the same way as happens with a ROS or an SFO).
Internal changes:
- Adapted to the changes in the MasterTrigger interface;
- Fixed a bug preventing the auto-pilot from being disabled in some conditions;
- Fixed a bug causing the ERS HoldingTriggerAction message not to be sent.
Igui
The Igui twiki can be found here.
tdaq-10-00-00
A new, revised version of the Main Command panel is introduced and selected by default. A description of the new panel can be found here.
The old legacy version is still available and will be removed in later Igui versions.
MonInfoGatherer
Alternative implementation for merging non-histogram data (ADHI-4842). This should resolve most of the timing issues seen with DCM data in the past. It is enabled by default but can be disabled with the ISDynAnyNG configuration parameter (see ConfigParameters.md).
ProcessManager
The ProcessManager twiki can be found here.
RCUtils
Package: RCUtils
RunControl
Package: RunControl
Jira: ATDAQCCRC
This is the link to the main RunControl twiki.
SFOng
- Added: the free buffer counter (IS) is now updated periodically, even when no data are received. Previously "0" free buffers was published when no data were received, giving the wrong impression that SFOng was about to assert backpressure.
TDAQExtJars
Package: TDAQExtJars
TriggerCommander
Package: TriggerCommander
Clients
A breaking change (for clients) has been introduced in the TriggerCommander/MasterTrigger.h interface. The hold() method has a new signature:
HoldTriggerInfo hold(const std::string& dm = "");
The extended_lvl1id boolean flag has been removed, and the method now returns a data structure holding the current ECR value and the last valid extended L1ID.
Implementations of the MasterTrigger interface
An implementation of the MasterTrigger interface has to update its overriding hold() method:
class X : public MasterTrigger {
...
HoldTriggerInfo hold(const std::string& dm) override;
...
};
The HoldTriggerInfo data structure holds information about the current ECR value and the last valid extended L1ID.
beauty
nightly
Remove automated setting of pBeast server
The initial Beauty implementation provided code that automatically set the pBeast server based on the values of environment variables. This logic proved to be fragile and not easily maintainable:
* especially in the testbed, the pBeast server name has changed many times
* overloading the pBeast environment variable for the authentication method to set the server name is, strictly speaking, incorrect.
The new implementation completely removes this detection code. Instead, it is the user's responsibility to always provide a server name to the Beauty constructor.
Old code relying on the previous implicit mechanism will fail:
>>> import beauty
>>> beauty.Beauty()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __init__() missing 1 required positional argument: 'server'
Control pBeast authentication method
Different pBeast servers require different authentication methods. Currently two methods are supported:
* no authentication required
* authentication via an existing Kerberos token
The authentication method to be used can be controlled via a dedicated environment variable (PBEAST_SERVER_SSO_SETUP_TYPE) or via the library API.
The latter method is now exposed in the Beauty interface: the Beauty.__init__ method accepts a keyword argument cookie_setup. Valid values are:
* None → the default behaviour of the pBeast library is used; the environment variable is respected, if set
* Beauty.NOCOOKIE → no authentication required
* Beauty.AUTOUPDATEKERBEROS → authentication via Kerberos token
>>> import beauty
>>> b = beauty.Beauty('http://someserver', cookie_setup=beauty.Beauty.NOCOOKIE)
coca
tdaq-10-00-00
- Tag coca-03-15-05
  - Add support for schema version 4 (ADHI-4852)
- Tag coca-03-15-03 (should be included as a patch in tdaq-09-04-00)
  - Fix for the Python wrapper RemoteFile.size method to return an integer
- Tag coca-03-15-02 (should be included as a patch in tdaq-09-04-00)
  - Fix handling of NULL FILE_SIZE in DBStat.
coldpie
- add a few missing methods to the Folder and FolderSet wrappers
- Folder and FolderSet now have the Python base class HvsNode, which is a wrapper for cool::IHvsNode
config
tdaq-10-00-00
- many performance improvements, see ADTCC-284 epic for details
- methods using explicit prefetching parameters can always be effective even when an object is preloaded into implementation cache; over-prefetching is protected by the configuration implementation
- add a parameter to print nested aggregated objects
- add method time() returning the modification timestamp of an object
dal
tdaq-10-00-00
- implement ResourceBase.DependsOn algorithm for auto-disabling, see ATDSUPPORT-399
- add "/bin" to PATH process environment variable generated by DAL algorithms
- remove the singleton protecting the get_partition() algorithm from over-prefetching
DAQ Tokens
The various Python-based token_meister servers have been replaced by a single C++ version.
The standard deployment for a local Unix domain socket:
token_meister /path/to/private.key [ /path/to/socket ]
token_meister --local /path/to/private.key [ /path/to/socket ]
The server for the GSSAPI deployment requires a Kerberos keytab for the atdaqjwt service:
cern-get-keytab --service atdaqjwt -o token.keytab
export KRB5_KTNAME=FILE:$(pwd)/token.keytab
token_meister --gssapi /path/to/private.key [ port ]
For interactively creating a token (for testing and impersonating a user):
token_meister --make /path/to/private.key user
For internal timing tests:
token_meister --time /path/to/private.key user
Additional options for all versions:
--hash=<HASHNAME>
where HASHNAME is a valid OpenSSL name for a hash function (e.g. 'SHA256'). To be backward compatible with tdaq-09-04-00, use --hash=md5.
The binary is statically linked against the C++ standard library (libstdc++) and otherwise depends only on system libraries. It can also be compiled independently of the TDAQ software (see the standalone directory). This means the binary can simply be copied to a server and run, provided all library dependencies are met.
The old Python-based servers are still available. They do not depend on any other TDAQ software, which can be useful in a system deployment. They now live in their own token_meister package and can be used like this:
python3 -m token_meister.local /path/to/key [ /path/to/socket ]
python3 -m token_meister.gssapi /path/to/key [ port ]
python3 -m token_meister.make /path/to/key user
Command line interface to CERN SSO methods
The script obtains a token for a (public) OAuth2 client via the CERN SSO. The result is a JSON structure that should be further processed with jq or similar tools.
python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --krb5 -o save
token=$(jq -r .access_token < save)
Now you can use ${token} when accessing protected URLs:
curl -H "Authorization: Bearer ${token}" https://...
When the access token has expired, refresh it:
python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --refresh $(jq -r .refresh_token < save) -o save
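The same token can also be used from Python. The minimal sketch below, using only the standard library, reads the file written with -o save and attaches its access_token field as a Bearer header; like jq -r, json.load yields the bare token string without surrounding quotes. The function name bearer_request is hypothetical, and only the access_token field used in the commands above is assumed to be present in the file.

```python
# Minimal sketch: build an authenticated request from the JSON file written by
# "python3 -m daq_tokens.cern_sso ... -o save". Assumes an "access_token" field.
import json
import urllib.request


def bearer_request(token_file: str, url: str) -> urllib.request.Request:
    """Return a Request carrying the saved access token as a Bearer header."""
    with open(token_file) as f:
        token = json.load(f)["access_token"]
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

The request can then be sent with urllib.request.urlopen(bearer_request("save", "https://...")) against the protected URL; after a refresh, simply build a new request from the updated file.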
Other authorization grants:
python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --browser -o save
python3 -m daq_tokens.cern_sso --client atlas-tdaq-token --password -o save
A new browser window will be opened for the --browser option, so you need a graphical environment (or a text-mode browser...).
The --password method is strongly discouraged. Don't type your central CERN password into random scripts that ask for it.
dbe
DVS GUI (graphical UI for DVS)
See also related DVS and TestManager packages.
dynlibs - Load shared libraries
This package is deprecated from tdaq-09-00-00 onwards.
Please use plain dlopen() or boost::dll instead. Note that, unlike in this package, the boost::dll::shared_library object has to stay in scope as long as the shared library is used!
Example of a possible replacement:
#include <boost/dll.hpp>

double example()
{
    // Load the plugin. The shared_library object must stay alive for as long
    // as any function obtained from it is called.
    boost::dll::shared_library lib("libmyplugin.so",
                                   boost::dll::load_mode::type::search_system_folders |
                                   boost::dll::load_mode::type::rtld_now);
    // Get a pointer to the function with the given signature
    auto f = lib.get<double(double, double)>("my_function");
    return f(10.2, 3.141);
}
emon
- Connections between event samplers and monitors have been optimized. Existing configurations should be adjusted to benefit from that. Previously the NumberOfSamplers parameter was used to define the number of samplers to be connected by all monitors of a group that use the same selection criteria. In the new implementation this number defines the number of samplers that each individual monitor has to connect to. This makes no difference for monitors that used to connect to a single sampler and don't form a group. For monitors that share the same selection criteria, like for example the Global Monitoring tasks, this number should be changed to the old number divided by the number of monitors in the group. For Athena monitoring the corresponding parameter in a JO file is called KeyCount.
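For example, if four Global Monitoring tasks sharing the same selection criteria previously used NumberOfSamplers = 8, each of them should now use 8 / 4 = 2. The hypothetical helper below performs this conversion; rounding up when the division is not exact, and never going below 1, are assumptions of this sketch rather than rules stated by emon.

```python
import math


def samplers_per_monitor(old_number_of_samplers: int, monitors_in_group: int) -> int:
    """Convert a group-wide NumberOfSamplers value to the new per-monitor value.

    Assumption: round up so the group as a whole still connects to at least as
    many samplers as before, and never return less than 1.
    """
    if monitors_in_group < 1:
        raise ValueError("a group has at least one monitor")
    return max(1, math.ceil(old_number_of_samplers / monitors_in_group))
```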
genconfig
tdaq-10-00-00
c++: remove generation of __get_xxx_str() attribute/relationship methods; inline static string s_xxx can be used instead
HLTSV - HLT Supervisor
tdaq-10-00-00
If at run stop no DCMs are available (e.g. if the whole HLTSV farm crashed), the HLT supervisor will now time out in the RoIBStopped transition and continue the shutdown process. The remaining events in the HLTSV memory are lost.
The timeout in seconds is specified in HLTSVApplication.StopTimeout. It is 3 minutes by default and should be adjusted to the expected time it normally takes the HLT farm to process the remaining events after the L1 trigger was stopped.
ISPY
tdaq-09-05-00
New Methods on IPCPartition
These allow retrieving the available CORBA interfaces (types) and the objects of a given type, as well as checking whether a given object is valid.
>>> p = IPCPartition("...")
>>> p.getTypes()
['mts/worker', 'rc/commander', 'rdb/writer', 'is/repository', 'rmgr/ResMgr', 'rdb/cursor', 'pmgpriv/SERVER', 'ipc/servant']
>>> p.getObjects('rdb/writer')
{'RDB_RW': {'valid': True, 'name': 'RDB_RW_INITIAL', 'owner': 'rhauser', 'host': 'rhauser-inspiron', 'pid': 1947, 'time': 1664623815, 'debug': 0, 'verbose': 0}}
>>> p.isObjectValid('RDB_RW', 'rdb/writer')
True
IS Attribute format() Method
This returns a formatter whose format() method produces the desired string representation of an attribute value (typically used for hex representation).
fmt = attr.format()
print(fmt.format(value))
Minimal MTS Receiver Interface
To create an MTS subscription and register a callback:
def callback(issue):
print(issue)
import mts
sub = mts.add_receiver(callback, "initial", "*")
The last argument is an MTS subscription expression. The sub return value should be kept alive, as the subscription is removed when it is garbage collected. It can also be used to unsubscribe explicitly:
sub.remove_receiver()
The issue passed to the callback is a Python dictionary with all the content of an ers::Issue.
jeformat
Package: jeformat
mda
Small fix in mda_register script, updates for Python3.
MTS
tdaq-10-00-00
MTS library: synchronous subscription
Added synchronous subscription functionality (https://its.cern.ch/jira/browse/ATDAQCCMRS-41): a call to add_receiver publishes the subscription information in the MTS IS server and then waits until all MTS workers (registered at that moment) explicitly notify the subscriber that the subscription is registered, or until the specified timeout (in milliseconds) expires.
The functionality is implemented in both the Java and C++ libraries. The changes are backward compatible: the synchronization is only enabled when add_receiver is called with an additional third parameter, the timeout value. Most clients create their subscription well before receiving the first message, so this functionality mainly addresses clients that create a subscription and may expect messages to arrive very shortly afterwards (e.g. the CES recovery case, which is based on an exchange of MTS messages).
A reasonable timeout would be 50 ms, though workers are normally notified within a few milliseconds.
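The waiting logic described above can be sketched with a threading.Condition, assuming the set of workers registered at subscription time is known and that each worker's notification ends up calling acknowledge(). The class name SubscriptionAck and its methods are hypothetical; this is not the MTS library code or API.

```python
import threading


class SubscriptionAck:
    """Wait until every worker registered at subscription time has acknowledged."""

    def __init__(self, workers):
        self._pending = set(workers)  # snapshot of workers at subscription time
        self._cond = threading.Condition()

    def acknowledge(self, worker):
        """Called (e.g. from a notification thread) when a worker confirms."""
        with self._cond:
            self._pending.discard(worker)
            if not self._pending:
                self._cond.notify_all()

    def wait(self, timeout_ms):
        """Return True if all workers acknowledged within timeout_ms, else False."""
        with self._cond:
            return self._cond.wait_for(lambda: not self._pending,
                                       timeout=timeout_ms / 1000.0)
```

With the 50 ms suggested above, a subscriber would call wait(50) and treat False as "some workers may not yet deliver messages for this subscription".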
mts2splunk utility
Added two new command line parameters that allow filtering out messages with a particular msgId or appId.
OKS
tdaq-10-00-00
- set pull.rebase=true when cloning the oks repository, to avoid misleading merge commits by users
owl
tdaq-10-00-00
- Remove deprecated OWLMutex and OWLMutexRW classes. Use std::mutex and std::shared_mutex instead.
- Remove deprecated OWLThread. Use std::thread instead.
- Remove deprecated OWLCondition. Use std::condition_variable instead.
P-BEAST
tdaq-10-00-00
- introduce v5 data format with more efficient data compaction (use delta compaction for integer data types)
- add bstconfig (new config plugin) and use it for pbeast service operational monitoring (simplifies integration with k8s services), see ADAMATLAS-419
- the TDAQ_PBEAST_SERVER_URL and TDAQ_PBEAST_RECEIVER_URL process environment variables can be used to specify the server and web receiver URLs; the PBEAST_SERVER_BASE_URL is deprecated
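To see why delta compaction helps for integer data, consider a slowly increasing counter: the differences between successive samples are much smaller than the values themselves. The sketch below encodes a series as its first value followed by differences, and estimates the size with a zigzag plus variable-length integer (varint) encoding. Both the zigzag/varint scheme and the function names are assumptions for illustration; this is not the P-BEAST v5 file format.

```python
from itertools import accumulate


def delta_encode(values):
    """First value followed by successive differences."""
    values = list(values)
    return [b - a for a, b in zip([0] + values[:-1], values)]


def delta_decode(deltas):
    """Invert delta_encode by running summation."""
    return list(accumulate(deltas))


def zigzag(n):
    """Map signed to unsigned so small negative deltas stay small."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint_len(u):
    """Number of bytes of an unsigned varint (7 payload bits per byte)."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


def encoded_size(values):
    """Estimated size in bytes of a series stored as zigzag varints."""
    return sum(varint_len(zigzag(v)) for v in values)
```

For ten consecutive values starting at 1000000, encoded_size gives 30 bytes for the raw values and 12 bytes for the deltas.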
rn (Run Number)
tdaq-09-01-00
Jira: ADTCC-242
Store the start-of-run timestamp and the run duration with nanosecond resolution. Store the TDAQ release name. If an oks repository is used, store the oks config version in the run number database and tag the oks repository with the run number / partition name.
swrod
IS Information Update
Several new attributes have been added to the ROBStatistics IS information type:
* enabled is set to false when the corresponding ROB is stoplessly removed from the ongoing run, true otherwise.
* latePackets is a per-ROB counter of data packets that arrive at the SW ROD after the corresponding fragments have already been built due to the timeout. The same counter has been added to the LinkStatistics IS class.
A new gcDeletedFragments attribute has been added to the HLTRequestStatistics IS class. This is a counter of ROB fragments which were removed by the garbage collector rather than by a normal Clear request.
If the DF IS server contains IS objects published by the previous version of the SW ROD, they must be removed so that the new objects can be published. This can be done using the is_rm command, e.g.:
is_rm -p <partition name> -n DF -r "swrod.*"
Configuration schema changes
- EnableGarbageCollection attribute has been added to the SwRodHLTRequestHandler class. If its value is set to 1 (default), the SW ROD removes the oldest ROB fragments from the HLT buffer when the buffer gets full.
- UnsubscribeDisabledLinks attribute has been added to the SwRodFragmentBuilder class. If its value is set to 1 (default), the SW ROD unsubscribes from the stoplessly removed input links; otherwise it just marks them as disabled and discards all data received through them.
- MaxReorder attribute has been added to the SwRodHLTRequestHandler class. It defines the maximum size of the latest L1 ID derandomising map, which is used to optimize HLT request handling.
- DropCorruptedPackets attribute has been moved from the SwRodGBTModeBuilder class to the SwRodFragmentBuilder class. This makes it available to the SwRodFullModeBuilder as well.
- Two attributes have been added to the SwRodDataChannel class.
  - TTCControllerName - defines the name of the closest TTC segment controller. It is used to detect when a TTC Restart operation is ongoing. When the SW ROD application executes the PrepareForRun Run Control transition, it checks the state of the corresponding controller. If the controller is in the RUNNING state, the SW ROD application assumes that it is just being restarted; otherwise it assumes that the TTC Restart procedure is taking place. By default this parameter is set to the "RootController" string.
  - UpdateECRCounter - if this parameter is set to true, the SW ROD application sets the ECR counters of the FELIX cards it is subscribed to to the last known ECR value during the TTC Restart, Stopless Recovery and Resynchronise procedures.
- Three attributes have been added to the SwRodFragmentBuilder class.
  - PacketDumpPath - the name of a directory in which to store files with dumps of corrupted packets. A corrupted packet is dumped either if the L1 ID cannot be reliably extracted from it or if the DropCorruptedPackets parameter is set to true.
  - PacketDumpLimit - the maximum number of packets to dump per run.
  - LinkLoggingLimit - the maximum number of data corruption incidents reported for a single E-Link.
- A new DataReceivingTimeout attribute has been added to the SwRodGBTModeBuilder class. It defines a timeout in milliseconds for ROB fragment building. If the given number of milliseconds has passed since the first data chunk for a particular ROB fragment was received, the fragment is considered built and is passed to the fragment consumers, even if it does not contain data chunks from all the E-links associated with the given ROB.
- A new DeferProcessing attribute has been added to the SwRodCustomProcessor class. If it is set to true, the processing is applied only when serialization of the ROB fragment is requested. This happens, for example, when the fragment is about to be written to a file or sent over the network. This may be used to reduce computing resources for fragments that require heavy processing but are rarely requested by the HLT. The default value of this attribute is false.
- A new ProfileExecution attribute has been added to the SwRodCustomProcessor class. If it is set to true, the processor keeps a record of the total execution time of the custom processing and prints it to the standard output when the SW ROD is terminated. The default value is false.
GBT Fragment Building Timeout
This version implements fragment building timeout for GBT fragment building algorithm. One can specify the timeout value via the new attribute of the SwRodGBTModeBuilder class called DataReceivingTimeout. This attribute contains a number of milliseconds to wait after receiving the first data chunk for a particular ROB fragment to consider this fragment as built even if not all data chunks have been received. By default this attribute is set to zero, which disables the timeout.
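The timeout rule can be sketched as follows, assuming a fragment is described by its expected set of E-links and that the caller supplies the current time in milliseconds. The class name and its methods are hypothetical; the sketch only mirrors the stated semantics: the clock starts at the first data chunk, the fragment is complete when all E-links have contributed or the timeout has elapsed, and a timeout of zero disables the check.

```python
class FragmentAssembler:
    """One ROB fragment built from several E-links, with an optional timeout."""

    def __init__(self, elinks, timeout_ms=0):
        self.expected = set(elinks)
        self.timeout_ms = timeout_ms  # 0 disables the timeout (DataReceivingTimeout default)
        self.received = {}
        self.first_chunk_ms = None

    def add_chunk(self, elink, data, now_ms):
        """Record a data chunk; the timer starts with the first one."""
        if self.first_chunk_ms is None:
            self.first_chunk_ms = now_ms
        self.received[elink] = data

    def is_built(self, now_ms):
        """True if all E-links contributed, or the timeout has elapsed."""
        if self.expected.issubset(self.received):
            return True
        if self.timeout_ms > 0 and self.first_chunk_ms is not None:
            return now_ms - self.first_chunk_ms >= self.timeout_ms
        return False
```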
Custom Plugin Test application
A new application that can be used for validation and profiling of a custom plugin has been added. The application is called swrod_custom_plugin_test and can be used in the following way:
* For the first time it has to be given five parameters: a name of the data file to be used as data source, a name of the shared library that implements the plugin to be tested and the names of the three custom functions which this plugin implements. Optionally one can use -o
transport
Clients using classes from the transport package should look into more modern network libraries like boost::asio until the C++ standard contains an official network library.