Fetching Data
Note
It helps to understand how data and metadata are organized in our document model; this is covered in the Documents section of the bluesky documentation. This background is not essential, but we recommend it for more context.
Broker.get_table(headers, stream_name='primary', fields=None, fill=False, handler_registry=None, convert_times=True, timezone=None, localize_times=True)

Load the data from one or more runs as a table (pandas.DataFrame).

Parameters: headers : Header or iterable of Headers
The headers to fetch the events for
stream_name : str, optional
Get events only from the event stream with this name.
Default is ‘primary’
fields : List[str], optional
whitelist of field names of interest; if None, all are returned
Default is None
fill : bool or Iterable[str], optional
Which fields to fill. If True, fill all possible fields.
Each event will have data filled for the intersection of its external keys and the requested fields.
Default is False
handler_registry : dict, optional
mapping filestore specs (strings) to handlers (callable classes)
convert_times : bool, optional
Whether to convert times from float (seconds since 1970) to numpy datetime64, using pandas. True by default.
timezone : str, optional
e.g., ‘US/Eastern’; if None, use metadatastore configuration in self.mds.config[‘timezone’]
localize_times : bool, optional
Whether to localize the timestamps to the ‘local’ time zone. If True (the default), the timestamps are converted to the local time zone (as configured in mds).
This is problematic for several reasons:
- apparent gaps or duplicate times around DST transitions
- incompatibility with every other timestamp (which is in UTC)
However, it makes the DataFrame repr look nicer.
This implies convert_times.
Defaults to True to preserve backward compatibility.
Returns: table : pandas.DataFrame
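As a sketch of the kind of table get_table returns, the following builds a comparable pandas.DataFrame by hand. The event data here is made up (a real table would come from something like db.get_table(db[-1])), and the ‘time’ conversion mirrors the convert_times=True behavior described above.

```python
import pandas as pd

# Hypothetical scalar readings standing in for events fetched by get_table.
raw_events = [
    {"seq_num": 1, "time": 1490000000.0, "motor": 0.0, "det": 1.2},
    {"seq_num": 2, "time": 1490000001.0, "motor": 0.5, "det": 0.9},
]
table = pd.DataFrame(raw_events).set_index("seq_num")
# Mirror convert_times=True: float seconds since 1970 -> numpy datetime64.
table["time"] = pd.to_datetime(table["time"], unit="s")
```

With localize_times=True, the resulting timestamps would additionally be converted to the configured local time zone.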
Broker.get_images(headers, name, stream_name='primary', handler_registry=None)

This method is deprecated. Use Broker.get_documents instead.

Load image data from one or more runs into a lazy array-like object.

Parameters: headers : Header or list of Headers
name : string
field name (data key) of a detector
handler_registry : dict, optional
mapping spec names (strings) to handlers (callable classes)
Examples
>>> header = DataBroker[-1]
>>> images = Images(header, 'my_detector_lightfield')
>>> for image in images:
...     # do something
Broker.get_events(headers, stream_name='primary', fields=None, fill=False, handler_registry=None)

Get Event documents from one or more runs.

Parameters: headers : Header or iterable of Headers
The headers to fetch the events for
stream_name : str, optional
Get events only from the event stream with this name.
Default is ‘primary’
fields : List[str], optional
whitelist of field names of interest; if None, all are returned
Default is None
fill : bool or Iterable[str], optional
Which fields to fill. If True, fill all possible fields.
Each event will have data filled for the intersection of its external keys and the requested fields.
Default is False
handler_registry : dict, optional
mapping asset specs (strings) to handlers (callable classes)
Yields: event : Event
The event, optionally with non-scalar data filled in
Raises: ValueError
If any key in `fields` is not in at least one descriptor per header.
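A minimal sketch of consuming events with a field whitelist. The event dictionaries below are stand-ins shaped like the Event documents get_events yields; in real use they would come from a Broker instance, e.g. db.get_events(header, fields=["det"]).

```python
# Stand-in Event documents (hypothetical values, not from a real run).
events = [
    {"seq_num": 1, "time": 1490000000.0, "data": {"motor": 0.0, "det": 1.2}},
    {"seq_num": 2, "time": 1490000001.0, "data": {"motor": 0.5, "det": 0.9}},
]
fields = ["det"]  # whitelist, analogous to the `fields` parameter
readings = [
    {k: v for k, v in ev["data"].items() if k in fields} for ev in events
]
```

Each yielded event carries its readings under the "data" key; the whitelist simply restricts which data keys you look at.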
Broker.restream(headers, fields=None, fill=False)

Get all Documents from given run(s).

Parameters: headers : Header or iterable of Headers
header or headers to fetch the documents for
fields : list, optional
whitelist of field names of interest; if None, all are returned
fill : bool, optional
Whether externally-stored data should be filled in. Defaults to False.
Yields: name, doc : tuple
string name of the Document type and the Document itself. Example: (‘start’, {‘time’: …, …})
Examples
>>> def f(name, doc):
...     # do something
...
>>> h = DataBroker[-1]  # most recent header
>>> for name, doc in restream(h):
...     f(name, doc)
Broker.process(headers, func, fields=None, fill=False)

Pass all the documents from one or more runs into a callback.

Parameters: headers : Header or iterable of Headers
header or headers to process documents from
func : callable
function with the signature f(name, doc) where name is a string and doc is a dict
fields : list, optional
whitelist of field names of interest; if None, all are returned
fill : bool, optional
Whether externally-stored data should be filled in. Defaults to False.
Examples
>>> def f(name, doc):
...     # do something
...
>>> h = DataBroker[-1]  # most recent header
>>> process(h, f)
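The callback contract for process is just f(name, doc), where name is a document type string and doc is a dict. This sketch exercises that contract with hand-made (name, doc) pairs instead of documents fetched from a real Broker:

```python
# Collect the document names seen by the callback; the documents below
# are minimal stand-ins for what a Broker would emit for one run.
seen = []

def f(name, doc):
    # name is a string like 'start' or 'event'; doc is a dict
    seen.append(name)

docs = [
    ("start", {"time": 0.0}),
    ("event", {"seq_num": 1}),
    ("stop", {"exit_status": "success"}),
]
for name, doc in docs:
    f(name, doc)
```

Any callable with this two-argument signature works, which is why the same callbacks can also be subscribed to a live RunEngine.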
-
databroker.broker.
get_fields
(header, name=None)[source]¶ Return the set of all field names (a.k.a “data keys”) in a header.
Parameters: header : Header
name : string, optional
Get fields from only one event stream with this name. If None (default), get fields from all event streams.
Returns: fields : set
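Conceptually, get_fields amounts to a union of the data keys declared in a header’s event descriptors. This sketch shows the idea with hypothetical descriptor documents (a real call would just be get_fields(db[-1])):

```python
# Hypothetical descriptor documents; each declares the data keys its
# event stream produces.
descriptors = [
    {"name": "primary", "data_keys": {"motor": {}, "det": {}}},
    {"name": "baseline", "data_keys": {"temperature": {}}},
]
fields = set()
for d in descriptors:
    fields |= set(d["data_keys"])
```

Passing name="primary" would restrict the union to descriptors from that one stream.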