Retrieve metadata, tabular data, and image data
Problem
Retrieve metadata, tabular data, or image data for analysis, processing, or export.
Approach
Use the databroker, the all-in-one interface to saved data.
First retrieve the metadata for the run(s) of interest. Then retrieve the data itself in one of three ways:
- a general-purpose method, which provides maximum flexibility and performance
- a convenient method for retrieving tabular data
- a convenient method for retrieving image data
Example Solution
The first step is always retrieving the metadata; from there, we can retrieve the data itself.
We’ll preface this example by running a scan to generate some example data.
In [1]: uid, = RE(scan([det], motor, -10, 10, 15))
The unique id of the data set has been stashed in the variable uid. We can
use that to retrieve the data from the databroker.
In [2]: h = db[uid]
What we get back is a header, which contains all of the metadata from the run. For example, we can review the names of the detector(s) involved:
In [3]: h['start']['detectors']
There is a lot of information in h. See How metadata is organized: understand the contents of the header.
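To make the shape of a header concrete, here is a sketch that uses a hand-built nested dictionary in place of a real header. The field values are invented for illustration, and a real header carries many more keys; only the h['start']['detectors'] access pattern is taken from the text above.

```python
# Hypothetical stand-in for a header: a mapping of document names to
# dictionaries. All values here are made up for illustration.
h = {
    "start": {
        "uid": "example-uid",      # placeholder, not a real unique id
        "detectors": ["det"],
        "motors": ["motor"],
    },
    "stop": {
        "exit_status": "success",
    },
}

# Same access pattern as in the session above.
detectors = h["start"]["detectors"]
print(detectors)

# Optional keys are safer to read with .get(), which avoids a KeyError
# when a run did not record that piece of metadata.
sample = h["start"].get("sample", "not recorded")
print(sample)
```

Reading optional metadata with .get() is a design choice for this sketch: different runs may record different keys, so a default keeps analysis code from failing on older data.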
If we don’t know the uid, we can search for the metadata in other ways. One
of the most common is recency: db[-1] retrieves the header of the most
recent scan; db[-5] means “five scans ago”; db[-5:] retrieves all
of the last five scans together. See
this section of the databroker documentation
for more.
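These recency lookups read like Python's own negative indexing and slicing. The sketch below uses a plain list of made-up run labels as a stand-in for the databroker (it does not call databroker itself) purely to show which runs each expression selects. The order in which a real databroker returns multiple headers is not specified here.

```python
# Stand-in for a history of ten runs, oldest first. The labels are
# hypothetical; a real databroker returns headers, not strings.
runs = ["run_{}".format(i) for i in range(1, 11)]

most_recent = runs[-1]   # analogous to db[-1]
five_ago = runs[-5]      # analogous to db[-5]
last_five = runs[-5:]    # analogous to db[-5:]

print(most_recent)
print(five_ago)
print(last_five)
```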
Now, what about the data itself?
General-Purpose Method
In [4]: events = db.get_events(h)
In the variable events, we now have a collection of documents
(dictionary-like mappings of names to values). Each event corresponds to
a single data point, a row in a table.
For performance reasons, the data has not actually been loaded yet. The data
is loaded one point at a time if we loop through events. (This is very
useful for applications where we don’t need to load the entire data set.)
To load the entire data set at once, convert events to a list.
In [5]: events = list(events) # for large data sets, this takes a while
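The difference between looping lazily and loading everything can be sketched with an ordinary Python generator. The function fake_events below is a hypothetical stand-in for db.get_events, with invented values; it assumes the real collection behaves like a generator, which is what the description above suggests.

```python
def fake_events(n):
    """Hypothetical stand-in for db.get_events: yields one event at a time."""
    for i in range(n):
        # Each document is built only when the consumer asks for it.
        yield {"seq_num": i + 1, "data": {"motor": -10 + i * 5, "det": i * i}}


events = fake_events(5)

# Lazy use: stop early without ever constructing the remaining events.
first_two = []
for event in events:
    first_two.append(event["data"]["det"])
    if len(first_two) == 2:
        break

# Eager use: list() drains whatever is left in the generator into memory.
remaining = list(events)
print(first_two, len(remaining))
```

Note that a generator can be consumed only once, so in this sketch the list holds only the events that the loop had not already taken.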
Let’s look at all the data in the events.
In [6]: [event['data'] for event in events]
You might be thinking, “Just give me the data!” As promised, the general-purpose method is flexible, but it is not terse. For more direct methods, read on!
To learn more about the structure of an event, refer to the
overview of the document model.
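Since each event is one row, a list of events can be pivoted into columns by hand. The sketch below does this for hand-built events with invented values; events_to_columns is a hypothetical helper written for this illustration, not part of databroker.

```python
from collections import defaultdict


def events_to_columns(events):
    """Pivot a sequence of event-like dicts into a dict of column lists."""
    columns = defaultdict(list)
    for event in events:
        # event['data'] maps each field name to its reading for this point.
        for name, value in event["data"].items():
            columns[name].append(value)
    return dict(columns)


# Hand-built events with made-up readings, for illustration only.
events = [
    {"seq_num": 1, "data": {"motor": -10.0, "det": 0.5}},
    {"seq_num": 2, "data": {"motor": 0.0, "det": 1.0}},
    {"seq_num": 3, "data": {"motor": 10.0, "det": 0.5}},
]

columns = events_to_columns(events)
print(columns["det"])
```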
Retrieving a Table
In [7]: db.get_table(h)
The result is a pandas DataFrame. Individual columns can be accessed by name:
In [8]: table = db.get_table(h)
In [9]: table['det']