Configuration and processing¤

The context file¤

The backend runs a context file like this one, which contains functions that it will execute:

context.py
import numpy as np
from damnit_ctx import Variable

from extra.components import XGM, XrayPulses

@Variable(title="XGM intensity [uJ]", summary="mean")
def xgm_intensity(run):
    """
    Mean XGM intensity per-train.
    """
    return XGM(run).pulse_energy().mean("pulseIndex")

@Variable(title="Pulses", summary="mean")
def pulses(run):
    """
    Number of pulses in the run.
    """
    return XrayPulses(run).pulse_counts().to_xarray()

By convention it's stored under the usr/Shared/amore directory of a proposal, along with other files that DAMNIT creates like the SQLite database and the HDF5 files that are created for each run.

@Variable's¤

Functions in the context file can be decorated with @Variable to denote that these are variables to be executed for each run. The @Variable decorator takes these arguments:

  • title (string): title displayed for the variable's column.
  • tags (string or list of strings): tags to categorize and filter variables. Tags can be used to group related variables and filter the table view:
    @Variable(title="AGIPD data", tags=["detector", "agipd", "raw"])
    def agipd_data(run):
        ...
    
  • summary (string): if the function returns an array, then summary will be used to reduce it to a single number. Internally it gets mapped to np.<summary>(), so you can use e.g. sum or nanmean to compute the summary with np.sum() or np.nanmean() respectively.
  • data (string): this sets the trigger for the variable. By default Variable's have data="raw", which means they will be triggered by a migration of raw data to the offline cluster. But if you want to process detector data which requires calibration, then you'll want to set data="proc" to tell DAMNIT to run that Variable when the calibration pipeline finishes processing the run:
    @Variable(title="Detector preview", data="proc")
    def detector_preview(run):
        ...
    
  • cluster (bool): whether or not to execute this variable in a Slurm job. This should always be used if the variable does any heavy processing.
  • transient (bool): do not save the variable's result to the database. This is useful for e.g. intermediate results to be reused by other Variables. Since their data isn't saved, transient variables can return any object. By default Variables do save their results (transient=False).
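The `summary` mapping described above can be sketched as a simple lookup in the numpy module: the summary string names a numpy function, which is applied to the variable's array result (a simplified illustration, not DAMNIT's actual implementation):

```python
import numpy as np

def apply_summary(result, summary):
    # summary="nanmean" -> np.nanmean(result), summary="sum" -> np.sum(result), etc.
    return getattr(np, summary)(result)

per_train = np.array([1.0, 2.0, np.nan, 4.0])
print(apply_summary(per_train, "nanmean"))  # ~2.33, NaNs are ignored
print(apply_summary(per_train, "nansum"))   # 7.0
```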

Variable functions can return any of:

  • Scalars
  • Lists of scalars
  • Multi-dimensional numpy.ndarray's or xarray.DataArray's (2D arrays will be treated as images)
  • xarray.Dataset's
  • Matplotlib Figures or Axes (will be saved as 2D images).
  • Plotly figures (will be saved as JSON so that the GUI can display them in an interactive plot).
  • Strings
  • None
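As a quick sketch of some of those return types (the `@Variable` decorators and the `run` argument are omitted here so that the snippet runs standalone; in a real context file each function would be decorated and take `run`):

```python
import numpy as np

def scalar_var():
    return 42.0                  # a single number

def list_var():
    return [1, 2, 3]             # a list of scalars

def image_var():
    return np.zeros((128, 128))  # a 2D array, displayed as an image

def text_var():
    return "calibration OK"      # a plain string

print(image_var().shape)  # (128, 128)
```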

The functions must always take one argument, run, which is passed a DataCollection of the data in the run. In addition, a function can take certain special arguments if they have the right annotations. Currently:
meta accesses internal arguments:

  • meta#run_number: The number of the current run being processed.
  • meta#proposal: The number of the current proposal.
  • meta#proposal_path: The root Path to the current proposal.

mymdc requests information from the EuXFEL data management portal MyMDC:

  • mymdc#run_type: The run type from myMdC.
  • mymdc#sample_name: The sample name from myMdC.
  • mymdc#techniques: the list of techniques associated with the run. Each technique is a dict containing the following keys: description, flg_available, id, identifier, name, runs_techniques_id, url.
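These special arguments are plain string annotations in the function signature, so they can be read with the standard inspect module. A sketch of how an executor might discover them (not DAMNIT's actual code; sample_name_var is a hypothetical variable):

```python
import inspect

def sample_name_var(run, sample: "mymdc#sample_name"):
    # Hypothetical variable body: the executor injects the value for `sample`.
    return f"Sample: {sample}"

# Read the annotation string back out of the signature:
sig = inspect.signature(sample_name_var)
annotation = sig.parameters["sample"].annotation
kind, field = annotation.split("#", 1)
print(kind, field)  # mymdc sample_name
```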

You can also use annotations to express a dependency between Variable's using the var#<name> annotation:

@Variable(title="foo")
def foo(run, run_no: "meta#run_number"):
    # Just return the run number
    return run_no

@Variable(title="bar")
def bar(run, value: "var#foo"):
    # Now bar() will be executed after foo(), and we can use its return value
    return value * 2

Pattern matching with glob patterns is allowed in var# annotations. In that case, a dict is passed in which the keys are the names of the matching dependencies and the values are their results:

@Variable()
def base_1(run):
    return 1

@Variable()
def base_2(run):
    return 2

@Variable()
def sum_values(run, data: 'var#base_*'):
    # data is a dict
    return sum(data.values())
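The glob matching behaves roughly like fnmatch applied to the variable names (a sketch of the idea, not DAMNIT's implementation):

```python
from fnmatch import fnmatch

# Hypothetical results of already-computed variables:
results = {"base_1": 1, "base_2": 2, "unrelated": 10}

# Collect only the variables whose names match the pattern:
pattern = "base_*"
data = {name: value for name, value in results.items() if fnmatch(name, pattern)}
print(data)                # {'base_1': 1, 'base_2': 2}
print(sum(data.values()))  # 3
```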

Dependents are not executed if a variable raises an error or returns None. You can raise Skip to provide a reason, which will be visible as a tooltip on the table cell in the GUI:

from damnit_ctx import Variable, Skip

@Variable()
def binned_by_scan_step(run):
    scan = Scantool(run)
    if not scan.active:
        raise Skip("Run is not a scan")
    ...

Dependencies with default values are also allowed; the default value will be passed to the function if the dependency did not complete execution for some reason:

@Variable(title="baz")
def baz(run, value: "var#foo"=42):
    # This will return the result of foo() if foo() succeeded, otherwise 42
    return value

Variable functions can use up to 4 CPU cores and 25 GB of RAM by default. If more resources are needed, use cluster=True (see the Using Slurm section) to access all of the cores & memory of an assigned cluster node. If required, you can also change the limits for non-cluster variables:

# Allow 8 CPU cores
$ damnit db-config noncluster_cpus 8

# Allow 50 GB memory
$ damnit db-config noncluster_mem 50G

@Group¤

For more complex or reusable sets of analyses, you can group related variables together using a class decorated with @Group. This allows you to create self-contained, configurable components that can be instantiated multiple times.

A Group is a standard Python class containing methods decorated with @Variable. The class itself is decorated with @Group, which transforms it into a configurable dataclass.

context.py
from extra.components import XGM
from damnit_ctx import Variable, Group

@Group(title="XGM Diag", tags=["XGM"])
class XGMDiagnostics:
    # parameters are defined as dataclass fields
    device_name: str
    offset: float = 0.0

    @Variable(title="Pulse Energy", summary="mean")
    def pulse_energy(self, run):
        # Use instance attributes for configuration
        return XGM(run, self.device_name).pulse_energy()

    @Variable(title="Corrected Energy", summary="mean")
    def corrected_energy(self, run, energy: "self#pulse_energy"):
        # This has an intra-group dependency on the 'pulse_energy' variable
        return energy + self.offset

# Instantiate the group in your context file, providing parameter values
xgm_sa2 = XGMDiagnostics(
    name="xgm_sa2",
    title="XGM SA2",
    device_name="SA2_XTD6_XGM/XGM/DOOCS",
    offset=1.1,
)
xgm_hed = XGMDiagnostics(
    name="xgm_hed",
    title="XGM HED",
    device_name="HED_XTD9_XGM/XGM/DOOCS",
    offset=0.9,
)

Naming and Titles¤

Each Group instance has a name attribute that becomes the prefix for every Variable inside it. Pass a unique name= when you instantiate the group (or let it default to the class name). An error is raised if two groups end up with the same name.

  • The Variable name is formed by joining the Group's name and the method's name with a dot: xgm_sa2.pulse_energy.
  • The variable title (for display in the GUI) is formed by joining the Group's title and the Variable's title with a separator (default is /): XGM SA2/Pulse Energy.
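The joining rules above can be sketched as simple string operations (an illustration, not DAMNIT's code):

```python
def variable_name(group_name, method_name):
    # Machine-readable name: group name + "." + method name
    return f"{group_name}.{method_name}"

def variable_title(group_title, var_title, sep="/"):
    # Display title: group title + separator + variable title
    return f"{group_title}{sep}{var_title}"

print(variable_name("xgm_sa2", "pulse_energy"))   # xgm_sa2.pulse_energy
print(variable_title("XGM SA2", "Pulse Energy"))  # XGM SA2/Pulse Energy
```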

Group attributes¤

@Group injects a few dataclass fields that automatically apply to the group's Variables:

  • name: Machine-readable identifier and prefix. Must be unique; defaults to the class name if not provided.
  • title: Prefixes every Variable title within the Group.
  • sep (default /): Separator between the group title and the variable title.
  • tags: Added to every Variable in the group and merged with per-variable tags.
  • cluster, data and transient: These properties remain per-Variable and cannot be configured on the group itself.

Decorator defaults can be overridden per instance:

another_xgm = XGMDiagnostics(name="another_xgm", title="Another XGM", tags=["XGM!"])

Dependencies¤

  • Intra-group dependencies: To depend on another variable within the same Group instance, you must use the self# prefix instead of var# in the argument annotation. This explicitly tells DAMNIT to look for the Variable within the current Group's scope.
    @Variable()
    def corrected_energy(self, run, energy: "self#pulse_energy"):
        ...
    
  • Global and Cross-Group Dependencies: To depend on any variable outside the current group's scope, you use the standard var# prefix with the variable's final, fully-qualified name.
    @Variable(title="Global Offset")
    def global_offset(run):
        return 42
    
    @Group
    class MyGroup:
    
        @Variable()
        def local_var(self, run, offset: "var#global_offset"):
            # Correctly depends on the top-level global_offset
            return 10 + offset
    
        @Variable()
        def another_var(self, run, xgm_energy: "var#xgm_hed.corrected_energy"):
            # Correctly depends on a variable from another group instance
            return xgm_energy * 2
    
    instance = MyGroup(name="analysis", title="Group")
    

Linking Groups¤

You can create more complex analysis pipelines by linking independent Groups. This is useful for:

  • Avoiding Duplication: Link to a shared component (like an XGM diagnostic) from multiple other groups. The XGM analysis will run only once, and all dependent groups will use its result.
  • Logical Separation: Keep different analysis domains separate. For example, detector diagnostics and beamline diagnostics can be defined in independent groups and then linked together by a higher-level analysis, without mixing their internal logic.
  • No Hardcoding: Avoid hardcoding variable names like var#xgm_sa2.intensity. By linking, you make your group configurable, allowing it to be connected to xgm_sa2 in one context and xgm_hed in another by passing different group instances.

Declare dataclass fields typed as other Groups and assign concrete instances when you instantiate the outer group. A self# dependency such as self#xgm.corrected_energy follows the attribute path, resolves the nested instance, and uses that instance's name (e.g. xgm_sa2) to build the final dependency (xgm_sa2.corrected_energy).

@Group(title="MID Diagnostics", tags=["MID", "Diag"])
class MIDDiagnostics:
    xgm: XGMDiagnostics
    detector: Detector

    @Variable(title="Photons per µJ")
    def photons_per_microjoule(self, run,
                               photons: "self#detector.n_photons",
                               energy: "self#xgm.corrected_energy"):
        # The system resolves `xgm` to the linked instance named "xgm_sa2"
        # and looks up the final variable `xgm_sa2.corrected_energy`.
        return photons / energy

# 1. Define the shared, top-level instances. Their `name` values
#    ("xgm_sa2", "xgm_hed") are their public identifiers.
xgm_sa2 = XGMDiagnostics(name="xgm_sa2", device_name="SA2_XTD6_XGM/XGM/DOOCS")
xgm_hed = XGMDiagnostics(name="xgm_hed", device_name="HED_XTD9_XGM/XGM/DOOCS")

agipd = Detector(name="agipd", title="AGIPD")

# 2. Instantiate the linking group and provide the dependency objects.
diag1 = MIDDiagnostics(name="mid_diag_sa2", xgm=xgm_sa2, detector=agipd)
diag2 = MIDDiagnostics(name="mid_diag_hed", xgm=xgm_hed, detector=agipd)

# Result: `diag1` depends on `xgm_sa2.corrected_energy`, and
# `diag2` depends on `xgm_hed.corrected_energy`. No work is duplicated.

Optional components and defaults¤

Sometimes a group depends on sub-components that are not always present. Declare those fields with a default of None and annotate dependencies with self#. If the referenced attribute is None and the variable argument has no default, DAMNIT removes that variable during instantiation so it never runs with missing inputs (any later attribute access raises an AttributeError).

Provide a default argument to keep the variable around and fall back to that value when the dependency is absent:

@Group
class A:
    @Variable
    def var(self, run):
        return 41

@Group
class B:
    upstream: A | None = None

    @Variable
    def needs_upstream(self, run, value: "self#upstream.var"):
        return value + 1

    @Variable
    def optional_upstream(self, run, value: "self#upstream.var" = 42):
        return value + 1

a = A(name="a")
b_full = B(name="b_full", upstream=a)
b_partial = B(name="b_partial")  # upstream defaults to None
# b_full exposes both variables. b_partial drops needs_upstream but keeps
# optional_upstream, which receives the default value (42) at runtime.

Inheritance¤

Group supports standard Python class inheritance. A decorated class can inherit from another decorated class and will automatically include all @Variable methods from its parent(s), allowing you to create common, reusable sets of analyses.

@Group(title='Base')
class BaseAnalysis:
    @Variable(title="Train Count")
    def n_trains(self, run):
        return len(run.train_ids)

# inherits base class' Group properties (e.g. title='Base')
class DetectorAnalysis(BaseAnalysis):  # Inherits n_trains
    @Variable(title="Photon Count", data="proc")
    def photon_count(self, run, n_trains: "self#n_trains"):
        # Depends on an inherited variable
        return 1e6 / n_trains

# A subclass decorated with @Group resets the Group's properties
@Group(tags=["Alt"])
class DetectorAnalysisAlt(BaseAnalysis):
    ...

# This instance will have two variables: detector.n_trains and detector.photon_count
detector = DetectorAnalysis(name="detector", title="Detector")

Cell¤

The Cell object is a versatile container that allows customizing how data is stored and displayed in the table. When writing Variables, you can return a Cell object to control both the full data storage and its summary representation. A Cell takes these arguments:

  • data: The main data to store
  • summary: Function name (as string) from the numpy module to compute summary from data (e.g., 'mean', 'std')
  • summary_value: Direct value to use as summary (number or string)
  • bold: A boolean indicating whether the text should be rendered in a bold font in the table's cell
  • background: Cell background color as hex string (e.g. '#ffcc00') or RGB sequence (0-255 values)
  • preview: What to show in a pop-up when the cell is double clicked. This can be a 1D or 2D array, or a Matplotlib or Plotly figure. If data is one of these types, it doesn't need to be specified again.

Example Usage:

@Variable(title="Peaks")
def peaks(run):
    success, counts, data = computation(run)
    return Cell(
        data=data,
        summary_value=f"{counts} peaks detected" if success else "No peak",
        bold=True,
        background="#7cfc00" if success else "#ff0000"
    )
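How the summary is derived from these arguments can be sketched roughly as follows (assuming an explicit summary_value takes precedence when both are given; this is an illustration, not DAMNIT's actual logic):

```python
import numpy as np

def resolve_summary(data=None, summary=None, summary_value=None):
    # An explicit summary_value is used as-is; otherwise the named
    # numpy reduction is applied to the data.
    if summary_value is not None:
        return summary_value
    if summary is not None and data is not None:
        return getattr(np, summary)(data)
    return None

print(resolve_summary(data=np.array([1.0, 2.0, 3.0]), summary="mean"))  # 2.0
print(resolve_summary(summary_value="No peak"))                         # No peak
```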

Using Slurm¤

As mentioned in the previous section, variables can be marked for execution in a Slurm job with the cluster=True argument to the decorator:

@Variable(title="Foo", cluster=True)
def foo(run):
    # some heavy computation ...
    return 42

This should work out-of-the-box with no other configuration needed. By default DAMNIT will figure out an appropriate partition that the user has access to, but that can be overridden by explicitly setting a partition or reservation:

# Set a reservation
$ damnit db-config slurm_reservation upex_001234

# Set a partition
$ damnit db-config slurm_partition allgpu

If both slurm_reservation and slurm_partition are set, the reservation will be chosen. The jobs will be named something like r42-p1234-damnit and both stdout and stderr will be written to the run's log file in the process_logs/ directory.
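The job name pattern mentioned above, r<run>-p<proposal>-damnit, can be sketched as (an illustration of the naming scheme, not DAMNIT's code):

```python
def slurm_job_name(run_number, proposal):
    # e.g. run 42 of proposal 1234 -> "r42-p1234-damnit"
    return f"r{run_number}-p{proposal}-damnit"

print(slurm_job_name(42, 1234))  # r42-p1234-damnit
```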

Warning

Make sure to delete the reservation setting after the reservation has expired, otherwise Slurm jobs will fail to launch.

$ damnit db-config slurm_reservation --delete

Using custom environments¤

DAMNIT supports running the context file in a user-defined Python environment, which is handy if a package you need is only installed in a particular environment. At some point this setting will be exposed in the GUI, but for now you have to change it on the command line by passing the path to the Python executable of the required environment:

$ damnit db-config context_python /path/to/your/python

The environment must have these dependencies installed for DAMNIT to work:

  • extra_data
  • extra_proposal

If your variables return plotly plots, the environment must also have the kaleido package.

Starting from scratch¤

Sometimes it's useful to delete all of the data so far and start from scratch. As long as you have the context file this is safe, with the caveat that comments and user-editable variables cannot be restored.

The steps to delete all existing data are:

  1. rm runs.sqlite to delete the database used by the GUI.
  2. rm -rf extracted_data/ to delete the HDF5 files created by the backend.
  3. damnit proposal 1234 to create a blank database for the given proposal.

And then you can reprocess runs with damnit reprocess to restore their data.