API Documentation

We use the Broker to pose queries for saved data sets (“runs”). A search query returns a lazy-loaded iterable of Header objects. Each Header encapsulates the metadata for one run. It provides convenient methods for exploring that metadata and loading the full data.

The Broker object

Broker

This class supports the original Broker API but is implemented on top of intake.Catalog.

Making a Broker

You can instantiate a Broker by passing it a dictionary of configuration or by providing the name of a configuration file on disk.

Broker.from_config

Broker.named

Create a new Broker instance using a configuration file with this name.

Click the links in the table above for details and examples.

Searching

Broker.__call__

Call self as a function.

Broker.__getitem__

For some Broker instance named db, db() invokes Broker.__call__() and db[] invokes Broker.__getitem__(). Again, click the links in the table above for details and examples.
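As a rough sketch of how the two search styles fit together, the helpers below collect run identifiers from a Broker. They assume only what is described on this page: calling the Broker returns an iterable of Header objects, indexing it returns a Header, and each Header exposes the attribute uid. The helper names uids_matching and latest_uid are illustrative and not part of databroker.

```python
def uids_matching(db, **query):
    """Return the uid of every run matching a metadata query.

    `db` is assumed to be a Broker instance; `db(**query)` invokes
    Broker.__call__ and yields Header objects lazily.
    """
    return [header.uid for header in db(**query)]


def latest_uid(db):
    """Return the uid of the most recent run via Broker.__getitem__."""
    return db[-1].uid
```

With a configured Broker named db, uids_matching(db, plan_name='scan') would list the matching runs, provided a key called plan_name was recorded in their metadata (an assumption about the stored data, not a guarantee).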

Loading Data

These methods are an older way to access data, like this:

header = db[-1]
db.get_table(header)

The newer Header methods, documented later on this page, are more convenient.

header = db[-1]
header.table()

(Notice that we only had to type db once.) However, the older Broker methods are still useful for loading data from multiple headers at once, which the new methods cannot do:

headers = db[-10:]  # the ten most recent runs
db.get_table(headers)

Broker.get_documents

Get all documents from one or more runs.

Broker.get_events

Get Event documents from one or more runs.

Broker.get_table

Load the data from one or more runs as a table (pandas.DataFrame).

Broker.get_images

This method is deprecated.

Broker.restream

Get all Documents from given run(s).

Broker.process

Pass all the documents from one or more runs into a callback.
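A callback given to Broker.process receives each document together with its name. The sketch below tallies documents by name. It assumes the callback is invoked as func(name, doc), matching the (name, doc) convention used for prepare_hook on this page; the class name DocumentTally is hypothetical, and the documents fed to it here are hand-written stand-ins rather than real run data.

```python
from collections import Counter


class DocumentTally:
    """Callable that counts documents by name ('start', 'event', ...)."""

    def __init__(self):
        self.counts = Counter()

    def __call__(self, name, doc):
        self.counts[name] += 1


tally = DocumentTally()

# With a real Broker `db`, the call would look like: db.process(db[-1], tally)
# Here the callback is driven by a few stand-in (name, doc) pairs instead.
for name, doc in [("start", {}), ("event", {}), ("event", {}), ("stop", {})]:
    tally(name, doc)

print(tally.counts["event"])  # → 2
```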

The broker also has a number of methods to introspect headers:

Broker.get_fields

Return the set of all field names (a.k.a. “data keys”) in a header.

Broker.stream_names_given_header

Saving Data

Broker.insert

Configuring Filters and Aliases

Broker.add_filter

Add query to the list of ‘filter’ queries.

Broker.clear_filters

Clear all ‘filter’ queries.

Broker.alias

Create an alias for a query.

Broker.dynamic_alias

Create an alias for a “dynamic” query, a function that returns a query.

The current list of filters and aliases is accessible via the attributes Broker.filters and Broker.aliases, respectively. Again, click the links in the table for examples.
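A dynamic alias takes a function rather than a fixed query, so the query is rebuilt each time the alias is used. The following is a minimal sketch under several assumptions: that add_filter and alias accept the query as keyword arguments, that dynamic_alias takes a name and a zero-argument function, and that searches understand a since key holding a timestamp. The helper names recent_query and configure, and the metadata keys user and purpose, are made up for illustration.

```python
import time


def recent_query(minutes=15):
    """Build a query dict selecting runs from the last `minutes` minutes."""
    return {"since": time.time() - 60 * minutes}


def configure(db):
    """Register an example filter and two aliases on a Broker `db`."""
    db.add_filter(user="Dan")                       # applied to every search
    db.alias("cal", purpose="calibration")          # fixed query
    db.dynamic_alias("quarter_hour", recent_query)  # re-evaluated on each use
```

Because recent_query is called at lookup time, the quarter_hour alias always refers to the most recent fifteen minutes rather than to the moment the alias was defined.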

Export Data to Another Broker

Broker.export

Serialize a list of runs.

Broker.export_size

Get the size of files associated with a list of headers.

Advanced: Controlling the Return Type

The attribute Broker.prepare_hook is a function with the signature f(name, doc) that is applied to every document on its way out.

By default Broker.prepare_hook is set to wrap_in_deprecated_doct(). The resultant objects issue warnings if users attempt to access items with dot access like event.data instead of dict-style lookup like event['data']. To restore the previous behavior (i.e. suppress the warnings on dot access), set Broker.prepare_hook to wrap_in_doct().

To obtain plain dictionaries, set Broker.prepare_hook to lambda name, doc: copy.deepcopy(doc). (Copying is recommended because the underlying objects are cached and mutable.)
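The reason for copying can be seen without a Broker at all. The sketch below writes the hook as a named function with the f(name, doc) signature and applies it to a stand-in dictionary; the name to_plain_dict and the sample document contents are illustrative only.

```python
import copy


def to_plain_dict(name, doc):
    """prepare_hook-style function: return an independent copy of `doc`."""
    return copy.deepcopy(doc)


# Stand-in for a cached document held internally (illustrative data only).
cached = {"data": {"det": 1.0}, "seq_num": 1}

out = to_plain_dict("event", cached)
out["data"]["det"] = -1.0      # mutate the copy...
print(cached["data"]["det"])   # ...the cached original still holds 1.0
```

Assigning db.prepare_hook = to_plain_dict on a Broker named db would then apply the same copy to every document on its way out.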

In a future release of databroker, the default return type may be changed to plain dictionaries for simplicity and improved performance.

wrap_in_deprecated_doct

Put document contents into a DeprecatedDoct object.

wrap_in_doct

Put document contents into a doct.Document object.

The Header object

Header

This class supports the original Header API but is implemented on top of intake’s Entry.

Metadata

The Header bundles together the metadata of a run, accessible via the attributes corresponding to the underlying documents:

Header.uid

Header.start

Header.stop

Header.descriptors

Measurements are organized into “streams” of asynchronously collected data. The names of all the streams are listed in the attribute Header.stream_names.

Note

It helps to understand how data and metadata are organized in our document model. This is covered well in this section of the bluesky documentation.

The information in these documents is a lot to navigate. Convenience methods make it easier to extract some key information:

Header.fields

Return the names of the fields (‘data keys’) in this run.

Header.devices

Return the names of the devices in this run.

Header.config_data

Extract device configuration data from Event Descriptors.

Header.stream_names

Data

The Header provides various methods for loading the ‘Event’ documents, which may be large. They all access the same data, presented in various ways for convenience.

Header.table

Load the data from one event stream as a table (pandas.DataFrame).

Header.data

Extract data for one field.

Header.documents

Load all documents from the run.

Header.events

Load all Event documents from one event stream.

Header.xarray

Header.xarray_dask

All of the above accept an argument called stream_name, which distinguishes concurrently collected streams of data. (Typical names include ‘primary’ and ‘baseline’.) All of a header’s stream names are available as a list via the attribute Header.stream_names.

To request data from all event streams at once, use the special constant databroker.ALL.
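The stream_name argument can be combined with Header.stream_names to load each stream separately. A minimal sketch, assuming header is a Header whose table method accepts stream_name as a keyword argument; tables_by_stream is a hypothetical helper, not a databroker function.

```python
def tables_by_stream(header):
    """Load one table per stream, keyed by stream name.

    `header` is assumed to be a Header; each call to header.table()
    is expected to return a pandas.DataFrame for the named stream.
    """
    return {name: header.table(stream_name=name)
            for name in header.stream_names}
```

Keeping the streams in separate tables avoids mixing rows that were collected asynchronously and may not share timestamps or columns.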

Configuration Utilities

list_configs

List the names of the available configuration files.

lookup_config

Search for a databroker configuration file with a given name.

temp

Broker.name

Broker.get_config

Return the v0 config dict this was created from, or None if N/A.

Broker.stats

Access MongoDB storage statistics for this database.

Back- and Forward-Compat Accessors

Broker.v1

A self-reference.

Broker.v2

Accessor to the version 2 API.

Internals

Broker.reg

Registry of externally-stored data

Broker.fetch_external

Deprecated

Broker.stream

Get all Documents from given run(s).

Header.stream

Broker.fs

Header.get

Header.items

Header.keys

Header.values

Removed

These functions and methods now raise NotImplementedError if called.

Broker.fill_event

Broker.fill_events

temp_config