Retrieve metadata, tabular data, and image data
Problem
Retrieve metadata, tabular data, or image data for analysis, processing, or export.
Approach
Use the databroker, the all-in-one interface to saved data.
Retrieve the metadata for the run(s) of interest. Retrieve the data itself in three different modes:
- a general-purpose method, which provides maximum flexibility and performance
- a convenient method for retrieving tabular data
- a convenient method for retrieving image data
Example Solution
The first step is always retrieving the metadata; from there, we can retrieve the data itself.
We’ll preface this example by running a scan to generate some example data.
In [1]: uid, = RE(scan([det], motor, -10, 10, 15))
The unique id of the data set has been stashed in the variable uid. We can use that to retrieve the data from the databroker.
In [2]: h = db[uid]
What we get back is a header, which contains all of the metadata from the run. For example, we can review the names of the detector(s) involved:
In [3]: h['start']['detectors']
There is a lot of information in h. See How metadata is organized: understand the contents of the header.
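As a rough picture of how a header is laid out, the sketch below uses plain nested dictionaries in place of a real header. It assumes only the general shape (a start document, a list of descriptors, and a stop document); every value is made up for illustration, and a real header carries many more keys.

```python
# Stand-in for a header: plain nested dicts, NOT a real databroker object.
# All values below are hypothetical.
header = {
    "start": {
        "uid": "hypothetical-uid-0001",
        "scan_id": 1,
        "plan_name": "scan",
        "detectors": ["det"],
    },
    "descriptors": [
        {"data_keys": {"det": {"dtype": "number"},
                       "motor": {"dtype": "number"}}},
    ],
    "stop": {"exit_status": "success"},
}

# Same access pattern as h['start']['detectors'] in the session above.
detectors = header["start"]["detectors"]

# The descriptors list the fields that were recorded during the run.
recorded_fields = sorted(header["descriptors"][0]["data_keys"])

# The stop document says how the run ended.
exit_status = header["stop"]["exit_status"]
```

The same bracket-style lookups are what the IPython session uses on h.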
If we don’t know the uid, we can search for the metadata in other ways. One of the most common is recency: db[-1] retrieves the header of the most recent scan; db[-5] means “five scans ago”; db[-5:] retrieves all of the last five scans together. See this section of the databroker documentation for more.
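These lookups read like Python's own negative indexing and slicing. The sketch below uses a plain list of hypothetical run names (oldest first) as a stand-in for the databroker; it only illustrates the indexing convention and says nothing about the order in which the databroker returns several headers.

```python
# A plain list stands in for the run history, oldest first.
# The run names are hypothetical.
runs = [f"run-{n}" for n in range(1, 11)]  # run-1 ... run-10

most_recent = runs[-1]    # analogous to db[-1]
fifth_latest = runs[-5]   # analogous to db[-5]
last_five = runs[-5:]     # analogous to db[-5:]
```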
Now, what about the data itself?
General-Purpose Method
In [4]: events = db.get_events(h)
In the variable events, we now have a collection of documents (dictionary-like mappings of names to values). Each event corresponds to a single data point, a row in a table.
For performance reasons, the data has not actually been loaded yet. The data is loaded one point at a time if we loop through events. (This is very useful for applications where we don’t need to load the entire data set.) To load the entire data set at once, convert events to a list.
In [5]: events = list(events)  # for large data sets, this takes a while
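The difference between looping and calling list() can be seen with an ordinary Python generator. This is a minimal sketch of the lazy-loading idea only: fake_events is a hypothetical stand-in, not part of the databroker, and the values it yields are made up.

```python
def fake_events(n_points, log):
    """Yield hypothetical event documents one at a time, recording each load."""
    for seq_num in range(1, n_points + 1):
        log.append(seq_num)  # note that this point has now been "loaded"
        yield {"seq_num": seq_num,
               "data": {"motor": float(seq_num), "det": 0.5 * seq_num}}

loaded = []
events = fake_events(15, loaded)
loaded_before_iteration = len(loaded)  # creating the generator loads nothing

first = next(events)                   # pulls exactly one point
loaded_after_one = len(loaded)

remaining = list(events)               # pulls everything that is left
loaded_after_all = len(loaded)
```

Stopping the loop early means the remaining points are never loaded, which is where the performance benefit comes from.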
Let’s look at all the data in the events.
In [6]: [event['data'] for event in events]
You might be thinking, “Just give me data!” As promised, the general-purpose method is flexible, but it lacks terseness. For more direct methods, read on!
To learn more about the structure of an event, refer to the overview of the document model.
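For orientation, here is a sketch of working with event documents as plain dictionaries. The helper make_event and all of its values are hypothetical; the sketch assumes only that each event keeps its readings under 'data', as in the list comprehension above, alongside bookkeeping such as a sequence number and timestamps.

```python
def make_event(seq_num, motor, det, t):
    """Build a hypothetical event document as a plain dict."""
    return {
        "seq_num": seq_num,
        "time": t,
        "data": {"motor": motor, "det": det},
        "timestamps": {"motor": t, "det": t},
    }

events = [
    make_event(1, -10.0, 0.0, 100.0),
    make_event(2, 0.0, 1.0, 101.0),
    make_event(3, 10.0, 0.0, 102.0),
]

# Pull a single field out of every event.
det_values = [event["data"]["det"] for event in events]

# Find the motor position where the detector reading was largest.
peak_event = max(events, key=lambda event: event["data"]["det"])
motor_at_peak = peak_event["data"]["motor"]
```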
Retrieving a Table
In [7]: db.get_table(h)
The result is a DataFrame. One can access individual columns like so:
In [8]: table = db.get_table(h)
In [9]: table['det']
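To make the link between events and the table concrete, the sketch below pivots a few hypothetical events into columns using only the standard library. It is not how get_table is implemented and it produces a plain dict of lists, not a DataFrame; it only illustrates that each event becomes one row and each data field becomes one column.

```python
# Hypothetical events standing in for a three-point scan.
events = [
    {"seq_num": 1, "data": {"motor": -10.0, "det": 0.0}},
    {"seq_num": 2, "data": {"motor": 0.0, "det": 1.0}},
    {"seq_num": 3, "data": {"motor": 10.0, "det": 0.0}},
]

# Pivot rows (events) into columns keyed by field name.
table = {"seq_num": [event["seq_num"] for event in events]}
for field in sorted(events[0]["data"]):
    table[field] = [event["data"][field] for event in events]

# Column access mirrors table['det'] in the session above.
det_column = table["det"]
```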