Fetching Data

Note

It helps to understand how data and metadata are organized in our document model. This is covered in the bluesky documentation. This background is not essential, but we recommend it for more context.

Broker.get_table(headers, stream_name='primary', fields=None, fill=False, handler_registry=None, convert_times=True, timezone=None, localize_times=True)

Load the data from one or more runs as a table (pandas.DataFrame).

Parameters:

headers : Header or iterable of Headers

The headers to fetch the events for

stream_name : str, optional

Get events from only the event stream with this name.

Default is ‘primary’

fields : List[str], optional

whitelist of field names of interest; if None, all are returned

Default is None

fill : bool or Iterable[str], optional

Which fields to fill. If True, fill all possible fields.

Each event will have data filled for the intersection of its external keys and the fields requested.

Default is False

handler_registry : dict, optional

mapping asset specs (strings) to handlers (callable classes)

convert_times : bool, optional

Whether to convert times from float (seconds since 1970) to numpy datetime64, using pandas. True by default.

timezone : str, optional

e.g., ‘US/Eastern’; if None, use the metadatastore configuration in self.mds.config['timezone']

localize_times : bool, optional

Whether to localize the times to the ‘local’ time zone. If True (the default), the time stamps are converted to the local time zone (as configured in mds).

This is problematic for several reasons:

  • apparent gaps or duplicate times around DST transitions
  • incompatibility with every other time stamp (which is in UTC)

However, this makes the dataframe repr look nicer.

This implies convert_times.

Defaults to True to preserve back-compatibility.

Returns:

table : pandas.DataFrame
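
Examples

A minimal sketch, assuming a configured Broker instance named db with at least one saved run:

>>> h = db[-1]  # most recent run
>>> table = db.get_table(h, stream_name='primary')
>>> table.head()

Passing fill=True also loads externally-stored data (e.g. images); passing localize_times=False keeps the time stamps in UTC, comparable with other UTC time stamps:

>>> table = db.get_table(h, fill=True, localize_times=False)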

Broker.get_images(headers, name, stream_name='primary', handler_registry=None)

This method is deprecated. Use Broker.get_documents instead.

Load image data from one or more runs into a lazy array-like object.

Parameters:

headers : Header or list of Headers

name : string

field name (data key) of a detector

handler_registry : dict, optional

mapping spec names (strings) to handlers (callable classes)

Examples

>>> header = DataBroker[-1]
>>> images = DataBroker.get_images(header, 'my_detector_lightfield')
>>> for image in images:
...     # do something with each image
...     pass

Broker.get_events(headers, stream_name='primary', fields=None, fill=False, handler_registry=None)

Get Event documents from one or more runs.

Parameters:

headers : Header or iterable of Headers

The headers to fetch the events for

stream_name : str, optional

Get events from only the event stream with this name.

Default is ‘primary’

fields : List[str], optional

whitelist of field names of interest; if None, all are returned

Default is None

fill : bool or Iterable[str], optional

Which fields to fill. If True, fill all possible fields.

Each event will have data filled for the intersection of its external keys and the fields requested.

Default is False

handler_registry : dict, optional

mapping asset specs (strings) to handlers (callable classes)

Yields:

event : Event

The event, optionally with non-scalar data filled in

Raises:

ValueError if any key in `fields` is not in at least one descriptor per header.
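
Examples

A minimal sketch, assuming a configured Broker instance named db; the field name 'temperature' is purely illustrative:

>>> h = db[-1]
>>> for event in db.get_events(h, fields=['temperature']):
...     print(event['data']['temperature'])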

Broker.restream(headers, fields=None, fill=False)

Get all Documents from given run(s).

Parameters:

headers : Header or iterable of Headers

header or headers to fetch the documents for

fields : list, optional

whitelist of field names of interest; if None, all are returned

fill : bool, optional

Whether externally-stored data should be filled in. Defaults to False.

Yields:

name, doc : tuple

string name of the Document type and the Document itself. Example: (‘start’, {‘time’: …, …})

See also

Broker.process()

Examples

>>> def f(name, doc):
...     # do something
...
>>> h = DataBroker[-1]  # most recent header
>>> for name, doc in DataBroker.restream(h):
...     f(name, doc)
Broker.process(headers, func, fields=None, fill=False)

Pass all the documents from one or more runs into a callback.

Parameters:

headers : Header or iterable of Headers

header or headers to process documents from

func : callable

function with the signature f(name, doc) where name is a string and doc is a dict

fields : list, optional

whitelist of field names of interest; if None, all are returned

fill : bool, optional

Whether externally-stored data should be filled in. Defaults to False.

Examples

>>> def f(name, doc):
...     # do something
...
>>> h = DataBroker[-1]  # most recent header
>>> DataBroker.process(h, f)
databroker.broker.get_fields(header, name=None)

Return the set of all field names (a.k.a. “data keys”) in a header.

Parameters:

header : Header

name : string, optional

Get fields from only the event stream with this name. If None (default), get fields from all event streams.

Returns:

fields : set
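
Examples

A minimal sketch, assuming a configured Broker instance named db:

>>> from databroker.broker import get_fields
>>> h = db[-1]
>>> sorted(get_fields(h, name='primary'))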