Recording Metadata

Usage Example

Metadata is bundled with the data generated by a scan. It can be used by callbacks (e.g., to tell plots how to label their axes) or simply used to identify and analyze the data later. Metadata can be stored in RE.md, where it applies to subsequent scans.

In [1]: RE.md['project'] = 'my xray project'

In [2]: RE.md['sample'] = {'color': 'red', 'dimensions': [10, 20, 5]}

Metadata can also be specified as keyword arguments when the scan is run.

In [3]: plan = Count([det])

In [4]: RE(plan, experimenter='Emily', mood='excited')
+-----------+------------+------------+------------+------------+
|   seq_num |       time |       det1 |       det2 |       det3 |
+-----------+------------+------------+------------+------------+
|         1 | 21:16:10.4 |            |            |            |
+-----------+------------+------------+------------+------------+
Count count ['4cf8aa'] (scan num: 4)
Out[4]: ['4cf8aacc-4fb1-4f13-980e-8330440b4281']

Special Fields

Custom metadata keywords can be mapped to strings (task='calibration'), numbers (attempt=5), lists (dimensions=[1, 3]), or dictionaries (dimensions={'width': 1, 'height': 3}). But certain keywords are given special significance by bluesky’s document model.

String Fields

To facilitate searchability, the keywords ‘owner’, ‘group’, and ‘project’ are given special significance. They are all optional, but if provided they must be strings, like owner='Dan'. A non-string value, such as owner=5, will produce an error that interrupts scan execution immediately after it starts.

Again, these fields are optional.
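
As a rough illustration of the rule above, the following plain-Python sketch checks that the three special keywords, when present, hold strings. The helper name check_string_fields and the choice of TypeError are assumptions made for this example; bluesky performs its own check internally, and the exception it raises may differ.

```python
# Hypothetical helper, not part of bluesky: it mimics the string-field rule.
STRING_FIELDS = ('owner', 'group', 'project')


def check_string_fields(md):
    """Raise TypeError if any special string field holds a non-string."""
    for key in STRING_FIELDS:
        # The fields are optional, so a missing key is acceptable.
        if key in md and not isinstance(md[key], str):
            raise TypeError(
                f"{key!r} must be a string, got {type(md[key]).__name__}"
            )


check_string_fields({'owner': 'Dan'})      # passes silently
check_string_fields({'mood': 'excited'})   # passes: no special fields present
```

Calling check_string_fields({'owner': 5}) raises, which parallels the error described above.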

Sample

Similarly, the keyword “sample” has special significance. It must be either a string:

'red 10 20 5'

or a dictionary:

{'color': 'red', 'dimensions': [10, 20, 5]}

A dictionary is preferred because it is self-describing and more richly searchable, but either is allowed.
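
To see why a dictionary is easier to search, consider this small sketch. The list runs is a hypothetical stand-in for the metadata of three runs, and runs_with_sample_color is an illustrative helper, not a bluesky function.

```python
# Hypothetical stand-ins for the metadata of three runs.
runs = [
    {'scan_id': 1, 'sample': {'color': 'red', 'dimensions': [10, 20, 5]}},
    {'scan_id': 2, 'sample': {'color': 'blue', 'dimensions': [10, 20, 5]}},
    {'scan_id': 3, 'sample': 'red 10 20 5'},
]


def runs_with_sample_color(runs, color):
    """Return the scan_ids whose sample is a dictionary with the given color."""
    return [
        md['scan_id']
        for md in runs
        if isinstance(md['sample'], dict) and md['sample'].get('color') == color
    ]


red_runs = runs_with_sample_color(runs, 'red')
```

Only the first run is matched by key. The string-valued sample in the third run could be found only by parsing the string or searching for a substring.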

Scan ID

The scan_id field is expected to be an integer, and it is automatically incremented between runs. If a scan_id is not provided by the user or stashed in the persistent metadata from the previous run, it defaults to 1.
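
The incrementing behavior can be emulated with an ordinary dictionary. This is a sketch of the rule as described here, not bluesky's actual implementation; next_scan_id is a hypothetical name, and the sketch assumes the stored value is the scan_id of the previous run.

```python
def next_scan_id(md):
    """Return the scan_id for the next run and store it back in md."""
    # With no stored scan_id, the first run gets 1.
    scan_id = md.get('scan_id', 0) + 1
    md['scan_id'] = scan_id
    return scan_id


md = {}
first = next_scan_id(md)    # no previous scan_id, so this is 1
second = next_scan_id(md)   # incremented from the stored value
```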

Required Fields

In current versions of bluesky, no fields are required.

In v0.4.3 and earlier, the keys owner, group, and beamline_id were required.

Persistence Between Runs

To set a field of metadata to persist for future runs, add it to RE.md.

In [5]: RE.md['color'] = 'blue'

Now it will be included in the metadata of every scan until it is deleted:

In [6]: del RE.md['color']

To review the metadata before running a scan, check RE.md, which behaves like a Python dictionary.

In [7]: RE.md['sample']
Out[7]: {'color': 'red', 'dimensions': [10, 20, 5]}

To start fresh:

In [8]: RE.md.clear()
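
The relationship between persistent metadata and metadata passed for a single run can be pictured with plain dictionaries. The function metadata_for_run below is hypothetical, and the precedence it uses (per-run values override persistent ones) is an assumption of this sketch rather than something stated on this page.

```python
persistent_md = {'project': 'my xray project', 'color': 'blue'}


def metadata_for_run(persistent_md, **per_run_md):
    """Combine persistent metadata with keyword arguments for one run."""
    # Copy first so the persistent dictionary is left untouched.
    combined = dict(persistent_md)
    # Assumption: per-run values take precedence over persistent ones.
    combined.update(per_run_md)
    return combined


run_md = metadata_for_run(persistent_md, experimenter='Emily', color='green')
```

Here run_md carries the per-run color, while persistent_md still holds 'blue' for later runs.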

Persistence Between Sessions

The RE.md attribute shown above may be a Python dictionary or anything that supports the dictionary interface. To persist metadata between sessions, we suggest historydict, a dictionary-like object backed by a SQLite database.

Example:

In [9]: from historydict import HistoryDict

In [10]: hist = HistoryDict('metadata-cache.sqlite')

In [11]: RE = RunEngine(hist)

In [12]: type(RE.md)
Out[12]: historydict.HistoryDict

Any metadata added to RE.md, including the scan_id, will be saved and can be re-loaded.
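
To illustrate what "anything that supports the dictionary interface" can mean, here is a minimal sketch of a SQLite-backed mapping built only on the standard library. It is not how historydict is implemented; the class name SqliteBackedDict, the table layout, and the use of JSON for values are all assumptions of this example, and it handles only string keys and JSON-serializable values.

```python
import json
import os
import sqlite3
import tempfile
from collections.abc import MutableMapping


class SqliteBackedDict(MutableMapping):
    """Hypothetical dict-like store; values must be JSON-serializable."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS md (key TEXT PRIMARY KEY, value TEXT)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM md WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __setitem__(self, key, value):
        # Write through to disk immediately so nothing is lost on exit.
        self._conn.execute(
            "INSERT OR REPLACE INTO md (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self._conn.execute("DELETE FROM md WHERE key = ?", (key,))
        self._conn.commit()

    def __iter__(self):
        rows = self._conn.execute("SELECT key FROM md").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM md").fetchone()[0]


# Simulate two sessions by opening the same file twice.
path = os.path.join(tempfile.mkdtemp(), 'metadata-cache.sqlite')
first_session = SqliteBackedDict(path)
first_session['scan_id'] = 4
first_session['sample'] = {'color': 'red', 'dimensions': [10, 20, 5]}

second_session = SqliteBackedDict(path)
reloaded = dict(second_session)
```

Because every assignment is committed to the file, the second object sees the scan_id and sample stored by the first.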

Metadata Validator

Additional, customized metadata validation can be added to the RunEngine. For example, to ensure that a run will not be executed unless the parameter ‘sample_number’ is specified, define a function that accepts a dictionary argument and raises an exception if ‘sample_number’ is not found.

def ensure_sample_number(md):
    if 'sample_number' not in md:
        raise ValueError("You forgot the sample number.")

Apply this function by setting RE.md_validator = ensure_sample_number. The function will be executed immediately before each new run is opened.
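
Because a validator is an ordinary function of a dictionary, it can be tried on plain dictionaries before it is attached to the RunEngine. The sketch below repeats the validator so that it runs on its own; the example metadata values are made up for illustration.

```python
def ensure_sample_number(md):
    """Same validator as above, repeated so this block is self-contained."""
    if 'sample_number' not in md:
        raise ValueError("You forgot the sample number.")


# A complete metadata dictionary passes silently.
ensure_sample_number({'sample_number': 12, 'experimenter': 'Emily'})

# An incomplete one raises; the error is caught here to inspect the message.
try:
    ensure_sample_number({'experimenter': 'Emily'})
except ValueError as err:
    message = str(err)
```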