Recording Metadata
Usage Example
Metadata is bundled with the data generated by a scan. It can be used by callbacks (e.g., to tell plots how to label their axes) or simply used to identify and analyze the data later.
Metadata can be stored in RE.md, where it applies to every subsequent scan:
In [1]: RE.md['project'] = 'my xray project'
In [2]: RE.md['sample'] = {'color': 'red', 'dimensions': [10, 20, 5]}
Metadata can also be specified when the scan is run:
In [3]: plan = Count([det])
In [4]: RE(plan, experimenter='Emily', mood='excited')
+-----------+------------+------------+------------+------------+
| seq_num | time | det1 | det2 | det3 |
+-----------+------------+------------+------------+------------+
| 1 | 21:16:10.4 | | | |
+-----------+------------+------------+------------+------------+
Count count ['4cf8aa'] (scan num: 4)
Out[4]: ['4cf8aacc-4fb1-4f13-980e-8330440b4281']
Special Fields
Custom metadata keywords can be mapped to strings (task='calibration'),
numbers (attempt=5), lists (dimensions=[1, 3]), or
dictionaries (dimensions={'width': 1, 'height': 3}). But certain keywords
are given special significance by bluesky’s document model.
String Fields
To facilitate searchability, the keywords ‘owner’, ‘group’, and ‘project’ are
given special significance. They are all optional, but if provided they must be
strings like owner='Dan'. A non-string, like owner=5, will produce an
error that will interrupt scan execution immediately after it starts.
Again, these fields are optional.
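The rule above can be pictured with a small standalone check. This is only a sketch of the described behavior, not bluesky's own validation code: the helper name check_string_fields and the choice of TypeError are assumptions made for illustration.

```python
# Hypothetical helper, not part of bluesky: mimics the string rule described above.
STRING_FIELDS = ("owner", "group", "project")


def check_string_fields(md):
    """Raise TypeError if an optional string field is present but not a str."""
    for key in STRING_FIELDS:
        if key in md and not isinstance(md[key], str):
            raise TypeError(
                f"{key!r} must be a string, got {type(md[key]).__name__}"
            )


check_string_fields({"owner": "Dan", "attempt": 5})  # passes: owner is a string
check_string_fields({"attempt": 5})                  # passes: the fields are optional

try:
    check_string_fields({"owner": 5})                # fails: owner is an int
except TypeError as err:
    print(err)
```

Only the three special keywords are checked; other keys, such as attempt, may hold any of the types listed under Special Fields.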
Sample
Similarly, the keyword “sample” has special significance. It must be either a string:
'red 10 20 5'
or a dictionary:
{'color': 'red', 'dimensions': [10, 20, 5]}
A dictionary is preferred because it is self-describing and more richly searchable, but either is allowed.
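To illustrate why a dictionary is easier to search, the sketch below filters a list of plain Python records by sample color. The records and the helper runs_with_sample_color are hypothetical stand-ins for stored run metadata, not a bluesky search API.

```python
# Hypothetical run records; in practice these would come from stored run metadata.
runs = [
    {"scan_id": 1, "sample": "red 10 20 5"},
    {"scan_id": 2, "sample": {"color": "red", "dimensions": [10, 20, 5]}},
    {"scan_id": 3, "sample": {"color": "blue", "dimensions": [10, 20, 5]}},
]


def runs_with_sample_color(records, color):
    """Return the scan_ids whose dictionary-style sample has the given color."""
    return [
        r["scan_id"]
        for r in records
        if isinstance(r["sample"], dict) and r["sample"].get("color") == color
    ]


red_ids = runs_with_sample_color(runs, "red")
print(red_ids)
```

The string-valued sample in the first record is not matched by this field lookup; finding it would require parsing the free-form text.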
Scan ID
The scan_id field is expected to be an integer, and it is automatically
incremented between runs. If a scan_id is not provided by the user or
stashed in the persistent metadata from the previous run, it defaults to 1.
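The numbering rule can be sketched with ordinary dictionaries. This is an illustration under stated assumptions, not the RunEngine's implementation: the helper next_scan_id is hypothetical, and it assumes that a user-provided scan_id is used as given and that counting then continues from that value.

```python
def next_scan_id(persistent_md, user_md=None):
    """Hypothetical helper: choose the scan_id for the next run."""
    user_md = user_md or {}
    if "scan_id" in user_md:
        # Assumption: a value supplied by the user is used as-is.
        scan_id = user_md["scan_id"]
    else:
        # No previous value stashed means we start from 0 + 1 = 1.
        scan_id = persistent_md.get("scan_id", 0) + 1
    persistent_md["scan_id"] = scan_id  # stash for the following run
    return scan_id


md = {}
first = next_scan_id(md)                        # nothing stashed yet
second = next_scan_id(md)                       # incremented from the stash
restarted = next_scan_id(md, {"scan_id": 100})  # user-provided value
after = next_scan_id(md)                        # continues from the new value
print(first, second, restarted, after)
```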
Required Fields
In current versions of bluesky, no fields are required.
In versions v0.4.3 and below, the keys owner, group, and
beamline_id were required.
Persistence Between Runs
To set a field of metadata to persist for future runs, add it to RE.md.
In [5]: RE.md['color'] = 'blue'
Now it will be included in the metadata of every scan until it is deleted:
In [6]: del RE.md['color']
To review the metadata before running a scan, check RE.md, which
behaves like a Python dictionary.
In [7]: RE.md['sample']
Out[7]: {'color': 'red', 'dimensions': [10, 20, 5]}
To start fresh:
In [8]: RE.md.clear()
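The relationship between persistent metadata and metadata given for a single run can be modeled with plain dictionaries. The sketch below is not bluesky code: metadata_for_run is a hypothetical helper, and the rule that per-run values win when a key appears in both places is an assumption of this example.

```python
persistent_md = {"project": "my xray project", "color": "blue"}


def metadata_for_run(persistent, **per_run):
    """Hypothetical helper: combine persistent and per-run metadata."""
    combined = dict(persistent)  # copy, so the persistent dict is not modified
    combined.update(per_run)     # assumption: per-run values win on conflict
    return combined


# Per-run keys appear only in this run's metadata.
run_md = metadata_for_run(persistent_md, experimenter="Emily", mood="excited")

# Deleting a persistent key removes it from all later runs.
del persistent_md["color"]
later_md = metadata_for_run(persistent_md)

print(run_md)
print(later_md)
```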
Persistence Between Sessions
The RE.md attribute shown above may be a Python dictionary or anything
that supports the dictionary interface. To persist metadata between
sessions, we suggest historydict, a Python dictionary backed by a
sqlite database.
Example:
In [9]: from historydict import HistoryDict
In [10]: hist = HistoryDict('metadata-cache.sqlite')
In [11]: RE = RunEngine(hist)
In [12]: type(RE.md)
Out[12]: historydict.HistoryDict
Any metadata added to RE.md, including the scan_id, will be saved
and can be re-loaded.
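To show what "anything that supports the dictionary interface" can mean, here is a minimal sqlite-backed mapping built only on the standard library. It is an illustrative stand-in, not historydict and not its implementation: the class name SqliteDict, the table layout, and the JSON encoding of values are all assumptions of this sketch.

```python
import json
import os
import sqlite3
import tempfile
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Illustrative stand-in (not historydict): a dict stored in SQLite as JSON."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS md (key TEXT PRIMARY KEY, value TEXT)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM md WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __setitem__(self, key, value):
        # Values must be JSON-serializable in this sketch.
        self._conn.execute(
            "INSERT OR REPLACE INTO md (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self._conn.commit()

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self._conn.execute("DELETE FROM md WHERE key = ?", (key,))
        self._conn.commit()

    def __iter__(self):
        rows = self._conn.execute("SELECT key FROM md").fetchall()
        return iter([row[0] for row in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM md").fetchone()[0]


# A temporary directory keeps the example from touching real files.
path = os.path.join(tempfile.mkdtemp(), "metadata-cache.sqlite")
md = SqliteDict(path)
md["scan_id"] = 4
md["sample"] = {"color": "red", "dimensions": [10, 20, 5]}

reloaded = SqliteDict(path)  # simulates a new session opening the same file
print(dict(reloaded))
```

Because the class derives from MutableMapping, methods such as get, update, and clear come for free once the five methods above are defined.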
Metadata Validator
Additional, customized metadata validation can be added to the RunEngine. For example, to ensure that a run will not be executed unless the parameter ‘sample_number’ is specified, define a function that accepts a dictionary argument and raises if ‘sample_number’ is not found.
def ensure_sample_number(md):
    if 'sample_number' not in md:
        raise ValueError("You forgot the sample number.")
Apply this function by setting RE.md_validator = ensure_sample_number.
The function will be executed immediately before each new run is opened.
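The validator can be tried out on plain dictionaries before it is attached to a RunEngine. In the sketch below, run_if_valid is a hypothetical stand-in for the "validate, then open the run" sequence; it is not how the RunEngine is implemented.

```python
def ensure_sample_number(md):
    if 'sample_number' not in md:
        raise ValueError("You forgot the sample number.")


def run_if_valid(md, validator):
    """Hypothetical stand-in for the RunEngine: validate, then 'open' the run."""
    validator(md)    # raises before anything else happens
    return dict(md)  # pretend this is the metadata recorded for the run


recorded = run_if_valid({'sample_number': 7, 'experimenter': 'Emily'},
                        ensure_sample_number)
print(recorded)

try:
    run_if_valid({'experimenter': 'Emily'}, ensure_sample_number)
except ValueError as err:
    print("Run refused:", err)
```

A validator only needs to accept the metadata dictionary and raise on a problem; its return value is not used in this sketch.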