How Data and Metadata are Organized
Documents
Data and metadata are bundled into what we dub documents, Python dictionaries organized in a formally specified way. For every “run” — loosely speaking, a dataset — there are four types of document.
A Run Start document, containing all of the metadata known at the start. Highlights:
- time — the start time
- plan_name — e.g., 'scan' or 'count'
- uid — randomly-generated ID that uniquely identifies this run
- scan_id — human-friendly integer scan ID (not necessarily unique)
- any other metadata captured at execution time from the plan or the user
Event documents, containing the actual measurements. Highlights:
- time — a timestamp for this group of readings
- data — a dictionary of readings like {'temperature': 5.0, 'position': 3.0}
- timestamps — a dictionary of individual timestamps for each reading, from the hardware
Event Descriptor documents, with metadata about the measurements in the events (units, precision, etc.) and about the configuration of the hardware that generated them.
A Run Stop document, containing metadata known only at the end. Highlights:
- time — the time when the run was completed
- exit_status — “success”, “abort”, or “fail”
We refer you to the bluesky documentation for more details and context.
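To make the four document types concrete, here is a minimal sketch of one run written as plain Python dictionaries. It is illustrative only and not the full formal specification: all values are made up, the "operator" entry is a hypothetical piece of user metadata, and the linking keys (run_start, descriptor) and the data_keys layout are assumptions about how the documents refer to one another.

```python
import time
import uuid

now = time.time()  # illustrative timestamp, in seconds since the Unix epoch

# Run Start: metadata known at the start of the run.
run_start = {
    "uid": str(uuid.uuid4()),         # randomly generated, identifies this run
    "time": now,
    "plan_name": "count",
    "scan_id": 1,                     # human-friendly, not necessarily unique
    "operator": "hypothetical user",  # example of extra user-supplied metadata
}

# Event Descriptor: metadata about the readings that the events will carry.
descriptor = {
    "uid": str(uuid.uuid4()),
    "run_start": run_start["uid"],    # assumed link back to the run
    "time": now,
    "data_keys": {                    # assumed layout of per-reading metadata
        "temperature": {"units": "K", "precision": 2},
        "position": {"units": "mm", "precision": 3},
    },
}

# Event: one group of readings, with a hardware timestamp per reading.
event = {
    "uid": str(uuid.uuid4()),
    "descriptor": descriptor["uid"],  # assumed link to the descriptor
    "time": now + 1.0,
    "data": {"temperature": 5.0, "position": 3.0},
    "timestamps": {"temperature": now + 0.98, "position": now + 0.99},
}

# Run Stop: metadata known only at the end of the run.
run_stop = {
    "uid": str(uuid.uuid4()),
    "run_start": run_start["uid"],    # assumed link back to the run
    "time": now + 2.0,
    "exit_status": "success",
}

for name, doc in [("start", run_start), ("descriptor", descriptor),
                  ("event", event), ("stop", run_stop)]:
    print(name, sorted(doc))
```

Note that the data and timestamps dictionaries in an event share the same keys, and each of those keys is described once in the descriptor rather than repeated in every event.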
Headers
The result of a search is a header, which bundles together the metadata-related documents:
- header.start — the “Run Start” document
- header.descriptors — the “Event Descriptor” documents
- header.stop – the “Run Stop” document
The only documents omitted from header are the events, which contain (most of) the actual measured data. Events may take more time to load, so we load them in a separate step. See Fetching Data.
Some useful examples:
# When did this run start and end?
header.start.time
header.stop.time
# What kind of experimental procedure ("plan") was this?
header.start.plan_name # e.g., 'scan', 'relative_scan', etc.
# Did it finish successfully?
header.stop.exit_status # 'success', 'fail', or 'abort'
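As a rough sketch of what can be done with these fields, the snippet below computes a run's duration and a readable start time. It uses plain dictionaries as stand-ins for header.start and header.stop, with made-up values, and it assumes the time fields are floats in seconds since the Unix epoch.

```python
from datetime import datetime, timezone

# Stand-ins for header.start and header.stop; the values are made up.
start = {"time": 1500000000.0, "plan_name": "scan"}
stop = {"time": 1500000042.5, "exit_status": "success"}

# Convert the epoch timestamp into a timezone-aware datetime.
started_at = datetime.fromtimestamp(start["time"], tz=timezone.utc)

# The run duration is the difference between the two timestamps, in seconds.
duration = stop["time"] - start["time"]

print(started_at.isoformat())
print(f"{start['plan_name']} ran for {duration:.1f} s "
      f"(exit status: {stop['exit_status']})")
```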
Later examples show more specific and useful metadata.
Note
Fields in a header can be accessed in two ways. These are equivalent:
header['start']['time']
header.start.time
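For readers curious how one object can support both styles, here is a minimal sketch of a dictionary that also exposes its keys as attributes. The AttrDict class is hypothetical and written only for illustration; it is not the class the library uses for headers.

```python
class AttrDict(dict):
    """A dict whose keys can also be read as attributes (illustrative only)."""

    def __getattr__(self, name):
        # __getattr__ is only called when normal attribute lookup fails,
        # so built-in dict methods such as .keys() keep working.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name) from None
        # Wrap nested dicts so chained access like header.start.time works.
        return AttrDict(value) if isinstance(value, dict) else value


# A stand-in for a header, with made-up values.
header = AttrDict({
    "start": {"time": 1500000000.0, "plan_name": "scan"},
    "stop": {"time": 1500000042.5, "exit_status": "success"},
})

# Both access styles return the same value.
print(header["start"]["time"] == header.start.time)
```

Attribute access is convenient for interactive work, while item access also handles keys that are not valid Python identifiers.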