Release History

A catalog of new features, improvements, and bug-fixes in each release. Follow links to the relevant GitHub issue or pull request for specific code changes and any related discussion.

v1.1.0 (2020-09-03)

Added

  • Experimental databroker.projector module

  • A stats method on the BlueskyMongoCatalog to access MongoDB storage info.

Fixed

  • Recover more robustly from inaccurate shape metadata.

  • Tolerate old Resource documents that rely on MongoDB _id and are missing uid.

v1.0.6 (2020-06-10)

Fixed

  • Xarray shape is now correct when multiple streams have matching keys.

  • Msgpack and jsonl backed catalogs now find new entries correctly.

  • The order of descriptors in v1.Header.descriptors now matches v0.Header.descriptors.

v1.0.5 (2020-06-04)

Fixed

  • The latest release of intake, v0.6.0, introduced a regression which databroker now works around.

v1.0.4 (2020-06-03)

Internals

  • Adjust our usage of intake’s Entry abstraction in preparation for changes in intake’s upcoming release

Fixed

  • The canonical method now only yields a stop document if it is not None.

v1.0.3 (2020-05-12)

Added

  • Added SingleRunCache, which collects the documents from a single run and, when the run is complete, provides a BlueskyRun.
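
    The idea can be sketched in a few lines of plain Python (the class and document shapes below are illustrative, not databroker's actual implementation, which returns a BlueskyRun):

```python
class RunCache:
    """Minimal sketch of the SingleRunCache idea: collect the documents
    of one run and report when the run is complete."""

    def __init__(self):
        self.documents = []
        self.complete = False

    def callback(self, name, doc):
        # Accumulate every (name, document) pair belonging to this run.
        self.documents.append((name, doc))
        if name == 'stop':
            # The 'stop' document marks the end of the run.
            self.complete = True

cache = RunCache()
cache.callback('start', {'uid': 'abc'})
cache.callback('event', {'seq_num': 1})
cache.callback('stop', {'uid': 'def', 'run_start': 'abc'})
print(cache.complete)  # True
```

    In the real class, the callback is subscribed to a RunEngine (or fed documents some other way), and the completed run is retrieved as a BlueskyRun.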

v1.0.2 (2020-04-07)

Fixed

  • databroker now supports mongo backends with authentication.

v1.0.1 (2020-04-03)

Added

  • When a Broker is constructed from a YAML configuration file, the root_map values may be given as relative paths interpreted relative to the location of that configuration file.
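
    The resolution rule can be sketched as follows (the helper name and the paths are hypothetical; databroker's actual implementation may differ):

```python
import os

def resolve_root_map(root_map, config_path):
    """Interpret relative root_map values relative to the directory
    containing the configuration file (hypothetical helper)."""
    config_dir = os.path.dirname(os.path.abspath(config_path))
    return {
        root: (value if os.path.isabs(value)
               else os.path.join(config_dir, value))
        for root, value in root_map.items()
    }

resolved = resolve_root_map({'/old/root': 'data'},
                            '/etc/databroker/example.yml')
print(resolved)  # {'/old/root': '/etc/databroker/data'}
```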

Changed

  • The minimum version of the dependency intake has been increased to v0.5.5 and various internal changes have been made to adjust for changes in intake.

Fixed

  • The LazyMap object now supports pickling.

  • The query TimeRange now properly propagates its timezone parameter through replace().

  • If installation with python2 is attempted, a helpful error message is shown.

v1.0.0 (2020-03-13)

This release amounts to a full rewrite of databroker. See Transition Plan for details.

See the v1.0.0 GitHub milestone for a full enumeration of the changes in this release.

v0.13.3 (2019-08-21)

Enhancements

  • Replaced deprecated unordered bulk write (requires pymongo >= 3.0).

Documentation

  • Update the links in the sidebar to point to the Bluesky Project.

Packaging

  • Added missing files in the source distribution for PyPI.

See the v0.13.3 GitHub milestone for a full enumeration of the changes in this release.

v0.13.2 (2019-07-30)

Bug Fixes

  • Support round trip of databroker configs, reporting the config, module and class.

Packaging

  • Removed vestigial dependency on dask, which is no longer used.

See the v0.13.2 GitHub milestone for a full enumeration of the changes in this release.

v0.13.1 (2019-07-30)

Bug Fixes

  • Make sqlite-backed assets registry threadsafe, for compatibility with bluesky 1.6.0.

v0.13.0 (2019-06-06)

API Changes

  • Drop support for Python 2

v0.12.2 (2019-03-11)

Bug Fixes

  • Support round-tripping of Resource documents

Documentation

  • Fix typos in the tutorial

  • Update installing sentinel code example

v0.12.1 (2019-01-25)

Bug Fixes

  • Fixed a bug in EventSourceShim.docs_given_header when filtering the fields.

v0.12.0 (2019-01-03)

Enhancements/API changes

  • documents() now yields any Resource and Datum documents referenced by Event documents.

  • documents() now yields documents in strict time order which may interlace Events from different streams. Previously documents were yielded in time order by descriptor.

  • Added event_sources_by_name property to BrokerES class

  • Added event_sources kwarg to Broker class

  • Replaced url, timezone and pvs kwargs in ArchiverEventSource class with a config dictionary kwarg and updated other methods to use this.

  • Added name and pvs attributes to ArchiverEventSource class and updated other methods to use these.

  • Added tables_given_times() method to ArchiverEventSource class.

  • Added name property to EventSourceShim class

Bug Fixes

  • Fixed an issue in the tutorial where importing databroker was forgotten.

  • The docstring for the RegistryTemplate class has been fixed.

v0.11.3 (2018-09-05)

Bug Fixes

  • Removes an assumption that Descriptors have a ‘name’ field.

v0.11.2 (2018-06-19)

Bug Fixes

  • Fixed a number of typos in the documentation

  • Fixed rendering issue of the README.md file on PyPI.

v0.11.1 (2018-05-19)

Bug Fixes

  • Fixed a limitation whereby the sqlite backend could not be used by multiple threads. One important consequence of this limitation was that it broke the ability to insert documents generated by “monitoring” in bluesky.

  • Removed accidental call to print.

v0.11.0 (2018-05-14)

Enhancements

  • Broker objects now have a db.name attribute which is the name passed into Broker.named or None.

  • Header objects now have an ext attribute, containing a SimpleNamespace. By default, it is empty. It is intended to be used to pull metadata from external data sources, such as sample metadata, comments or tags, or proposal information. To register a data source, add an item to the dictionary Broker.external_fetchers. The value, which should be a callable, will be passed two arguments, the RunStart document and the RunStop document, and the result will be added to ext under that key. The callable is expected to handle all special cases (errors, etc.) internally and return None if it has nothing to add.
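
    The fetcher contract described above can be sketched without databroker installed (external_fetchers and ext follow the text; fetch_sample_info and build_ext are hypothetical names used only for illustration):

```python
from types import SimpleNamespace

external_fetchers = {}  # stand-in for Broker.external_fetchers

def fetch_sample_info(run_start, run_stop):
    # A fetcher receives the RunStart and RunStop documents and returns
    # the value to attach to ext, or None if it has nothing to add.
    sample = run_start.get('sample')
    return {'sample': sample} if sample is not None else None

external_fetchers['sample_info'] = fetch_sample_info

def build_ext(run_start, run_stop):
    # Sketch of how Header.ext could be assembled from the fetchers.
    ext = SimpleNamespace()
    for key, fetcher in external_fetchers.items():
        result = fetcher(run_start, run_stop)
        if result is not None:
            setattr(ext, key, result)
    return ext

ext = build_ext({'uid': 'abc', 'sample': 'Ni foil'}, {'uid': 'def'})
print(ext.sample_info)  # {'sample': 'Ni foil'}
```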

  • Accept Resource and Datum documents via the generic insert method. To facilitate the “asset refactor” transition in bluesky and ophyd, ignore duplicate attempts to insert a document with the same uid. (This is controllable by a new flag ignore_duplicate_error on the Registry insert methods.)

Bug Fixes

  • The Header.fields() method wrongly ignored its stream_name argument.

v0.10.0 (2018-02-20)

Enhancements

  • Add special name Broker.named('temp') which creates new, temporary storage each time it is called. This is convenient for testing and teaching.

Deprecations

  • The Broker.__call__() method for searching headers by metadata accepted special keyword arguments, start_time and/or end_time, which filtered results by RunStart time. These names proved to be confusing, so they have been renamed to since and until (terminology inspired by git log). The original names still work but warn if used.
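
    The renaming can be illustrated with a simplified keyword shim (not databroker's actual code; normalize_time_kwargs is a hypothetical helper):

```python
import warnings

def normalize_time_kwargs(**kwargs):
    """Accept the deprecated start_time/end_time names but warn,
    mapping them onto since/until (simplified illustration)."""
    renames = {'start_time': 'since', 'end_time': 'until'}
    out = {}
    for key, value in kwargs.items():
        if key in renames:
            warnings.warn(f"{key} is deprecated; use {renames[key]} instead",
                          DeprecationWarning)
            key = renames[key]
        out[key] = value
    return out

print(normalize_time_kwargs(start_time='2018-01-01', plan_name='scan'))
# {'since': '2018-01-01', 'plan_name': 'scan'}
```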

Bug Fixes

  • The mongoquery backend returned identical references (i.e. the same dictionary) on subsequent queries, meaning that mutations could propagate across results sets.

  • Ensure there is only one definition of a DuplicateHandler exception.

  • Remove invalid keyword argument from get_images.

v0.9.4 (2017-12-06)

This release contains bug fixes and experimental new features.

Enhancements

  • Add experimental integration with glue.

  • The HDF5 handlers have been refactored, and a new HDF5 handler returning dask objects has been added.

Bug Fixes

  • Rendering the HTML repr (_repr_html_) of a Header produced an unnecessary warning.

  • Headers without a stop document wrongly produced an error and could not be created. This was a regression.

v0.9.3 (2017-09-13)

This release contains one bug fix for a feature that was new in v0.9.0.

Bug Fixes

  • Properly implement “filling” of external data in the case of multiple event streams with different data keys. This case generated a KeyError in v0.9.2.

v0.9.2 (2017-09-11)

This release contains one bug fix for a feature that was new in v0.9.0.

Bug Fixes

  • Allow handlers to be registered via a configuration file. This feature was intended to be added in v0.9.0, but it was broken and unusable.

v0.9.1 (2017-09-06)

This release contains small but important bug fixes. It is recommended that all users upgrade from v0.9.0.

Bug Fixes

  • Respect the fill kwarg in Header.table() and Broker.get_table(). In v0.9.0, a regression was introduced that always set it to True regardless of the user input.

  • Omit the special value '_legacy_config' from the results returned by list_configs() because it is a (private) synonym for one of the other values.

  • Make document retrieval lazy (as it was intended to be) by removing an internal call to check_fields_exist.

  • Do not attempt to fill external data that has already been filled.

v0.9.0 (2017-08-22)

Overview

This is a major update to databroker.

  • The packages metadatastore, filestore, portable-mds, portable-fs, metadataservice, and metadataclient have all been merged into databroker itself. The individual packages have been deprecated; all future development will occur in databroker.

  • In response to feedback, new convenience functions and methods have been added.

  • The configuration management has been completely overhauled.

Enhancements

See the new Tutorial and the API Documentation.

User-facing API Changes

The following changes may break old user code.

  • DataBroker used to rely indirectly on configuration files located at:

    • /etc/metadatastore.yml or ~/.config/metadatastore/connection.yml

    • /etc/filestore.yml or ~/.config/filestore/connection.yml

    These configuration files are now completely ignored. Users must adopt the new configuration system.

  • The order of parameters to these methods has been rearranged to be mutually consistent.

    We judge that the short-term pain of updating some user code now is less than the long-term pain of asking everyone to keep mental track of random, inconsistent parameter orderings forever.

  • The option handler_override, which overrode handlers by field name, has been removed from all methods and functions that formerly supported it. Use the option handler_registry instead, which overrides handlers by handler spec name—a less complex, less error-prone operation.

  • The method Header.events() defaults to returning only events from the ‘primary’ stream, not all events.

  • Documents refer to other documents by a uid. In past versions of databroker they were dereferenced. That is:

    # Assume run_start, run_stop, descriptor, and event are documents.
    
    # True for databroker version < 0.9.0:
    event['descriptor'] == descriptor
    descriptor['run_start'] == run_start
    run_stop['run_start'] == run_start
    
    # True for databroker version 0.9.0:
    event['descriptor'] == descriptor['uid']
    descriptor['run_start'] == run_start['uid']
    run_stop['run_start'] == run_start['uid']
    
  • The type of db.filters changed from list to dict.

Deprecations

The following changes to recommended usage may produce warnings in user code, but they will not break user code in this release. They may break user code in a future release, so the warnings should be heeded during this cycle if possible.

  • The following usages are deprecated and will stop being supported in a future release:

    # THIS USAGE IS DEPRECATED
    
    from databroker import db
    # or, equivalently:
    from databroker import DataBroker
    

    Instead, do:

    from databroker import Broker
    db = Broker('example')
    

    where example is the name of some configuration. This new approach makes it possible to connect to multiple Brokers in the same process.

    db1 = Broker('laptop')
    db2 = Broker('beamline')
    

    This is useful for transferring data, among other things.

    Likewise, the top-level functions for fetching data are deprecated and will be removed in a future release:

    # THIS USAGE IS DEPRECATED
    
    from databroker import db, get_table, get_events, get_images
    
    h = db[-1]
    get_table(h)
    

    See the new Tutorial for the recommended usage. The short version is:

    from databroker import Broker
    db = Broker.named('example')
    
    h = db[-1]
    h.table()
    
  • The method Broker.get_images and the class Images are deprecated and may be removed in a future release. See issue for the motivating discussion. Use the new method Header.data(), as illustrated in the Tutorial.

  • Databroker uses a custom dictionary subclass that supports dot access like event.data as a synonym for item lookup like event['data']. Employing a custom dictionary subclass has downsides, including performance and complexity. We are considering defaulting to plain dictionaries in a future release, which would break any user code that relies on dot access. To prepare for this possible change, the usage event.data now produces a warning advising users to switch to event['data']. More detail is available in the section Advanced: Controlling the Return Type.
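
    The transitional behavior can be sketched with a small dict subclass (a simplified illustration, not databroker's actual class):

```python
import warnings

class DotDict(dict):
    """Dict that also supports dot access, warning on each use
    (simplified sketch of the transitional behavior described above)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for keys.
        try:
            value = self[name]
        except KeyError:
            raise AttributeError(name)
        warnings.warn(f"dot access like event.{name} may stop working; "
                      f"use event[{name!r}] instead")
        return value

event = DotDict(data={'motor1': 1.0})
assert event['data'] == {'motor1': 1.0}   # preferred item lookup, no warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    assert event.data == {'motor1': 1.0}  # dot access still works, but warns
assert len(caught) == 1
```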

  • The method Header.stream() has been renamed Header.documents(). The old name issues a warning in this release; it will be removed in a future one.

  • The modules databroker.broker and databroker.core have been combined, and all public members are importable from just databroker. The old modules will be maintained as shims to avoid breaking user code.

Internal API Changes

The following API changes affect the libraries that have been merged into databroker in this release (metadatastore, filestore, portable-mds, portable-fs, metadataclient, metadataservice). These changes are internal to databroker and will only affect advanced users.

  • All FileStore classes have been renamed to Registry.

  • The method change_root, which is implemented on various Registry classes, has been renamed to move_files.

  • The method get_datum on Registry classes is fully removed. Use retrieve instead.

  • The version keyword argument has been removed from all Registry classes and MDS classes. It is now part of the config dictionary.

  • A script for launching the “metadataservice” server has been moved to a CLI named start_md_server.

  • The “writers” formerly in filestore now require a Registry as an argument.

  • The modules filestore.commands and filestore.api have been removed. Same for metadatastore.commands and metadatastore.api.

  • Registry.correct_root now accepts uids as arguments, verify is now optional and defaults to False, and the argument resource has been renamed resource_or_uid.

v0.8.4 (2017-05-24)

(TO DO)

v0.8.3 (2017-05-23)

(TO DO)

v0.8.2 (2017-05-22)

(TO DO)

v0.8.1 (2017-05-22)

(TO DO)

v0.8.0

API Changes

databroker.core

This module is semi-private.

  • Removed process, stream, and restream as top-level functions. The implementation now lives in databroker.broker.BrokerES. These functions knew too much about the internals of the databroker to remain as separate functions.

  • Broker.__call__ returns an iterable Results object, akin to a generator, instead of a list. This means that queries with large results sets return quickly. Iterating through the Headers in the result set is up to the caller.

Header.from_run_start

Takes a Broker object instead of a MetadataStore object. The Broker is now attached to the Header object.

Changes to functions in databroker.core

Explicitly passed mds/fs arguments have been removed; the functions instead rely on the DataBroker instance included in the header.

Break up internal structure of databroker

  • The core functions that touch events have a new required argument, es. This does not affect the API of the Broker object, only the functions in the core module.

Top level insert

Broker now has an insert method; use it instead of db.mds.insert.

v0.7.0 (2016-12-21)

Enhancements

  • Add convenience method for exporting from one Broker instance into another.

  • Experimental: support regex-based field selection in Broker methods.

Bug Fixes

  • Fix handling of timezones. To summarize: all times are stored as a float number that is a UNIX timestamp (number of seconds since 1970 in Greenwich). The get_events method simply returns this raw number. The get_table method provides the option (on by default) to convert these float numbers to datetime objects, which can be more convenient in some circumstances. There are two flags for controlling this feature: convert_times and localize_times. By default, convert_times=True and localize_times=True. This returns pandas datetime64 objects that are “naive” (meaning they don’t have a timezone attached) and are in the local time. This tells you the wall clock time when the experiment was performed, in the timezone configured in db.mds.config['timezone']. If localize_times=False, the datetime objects are again “naive” but in UTC time. This tells you the wall clock time of a clock in Greenwich when the experiment was performed.
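
    The two localize_times settings can be reproduced with the standard library (a sketch; databroker itself returns pandas datetime64 values):

```python
from datetime import datetime, timezone

ts = 1482303600.0  # raw UNIX timestamp as stored (seconds since 1970, UTC)

# convert_times=True, localize_times=False: a naive datetime in UTC.
utc_naive = datetime.fromtimestamp(ts, tz=timezone.utc).replace(tzinfo=None)

# convert_times=True, localize_times=True: a naive datetime in local wall-clock time.
local_naive = datetime.fromtimestamp(ts)

print(utc_naive)  # 2016-12-21 07:00:00
```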

v0.6.2 (2016-09-28)

(TO DO)

v0.6.1 (2016-09-23)

Enhancements

  • Remove hard dependency on metadatastore and filestore packages so that other providers of metadatastore and filestore interface may be used instead.

v0.6.0 (2016-09-07)

Bug Fixes

  • Make get_table properly respect its stream_name argument. (Previously any value but the default returned an empty result.)

Enhancements

  • Allow Broker to be imported without configuration present.

  • Add stateful “filters” to restrict searches.

  • Add aliases to save searches during a session.

  • Overhaul documents and tests.

API Changes

  • The default value of stream_name in get_events is now ALL, a sentinel value. (The default for get_table is still 'primary', but it now also accepts ALL.)

v0.5.0 (2016-07-25)

API Changes

  • Renamed the kwarg name to stream_name for selecting a descriptor by name

  • Requires filestore >= v0.5.0

Enhancements

  • Learned how to get all of the FS resource documents from a header

v0.4.1 (2016-05-09)

(TO DO)

v0.4.0 (2016-05-02)

(TO DO)

v0.3.3 (2016-02-26)

(TO DO)

v0.3.2 (2016-02-23)

(TO DO)

v0.3.1 (2016-02-04)

(TO DO)

v0.3.1 (2015-09-29)

(TO DO)

dataportal v0.2.2

Bug Fixes

  • Times, as returned by pandas-aware functions, are now reported correctly. Previously, these times were being reported as UTC, which is 4 or 5 hours different from US/Eastern time, depending on the time of year. (GH209)

dataportal v0.2.1

API Changes

  • get_images added (as an alias for Images) for consistency with other function names

  • SubtractedImages removed; prefer PIMS pipeline feature.

dataportal v0.2.0 (2015-09-15)

API Changes

  • DataBroker[] for slicing by scan ID(s) or recency

  • DataBroker() for building queries from keyword arguments

  • get_events returns a generator of Events

  • get_table returns a DataFrame

  • Header, vastly simplified: it is merely a Document with a dedicated constructor that accepts a Run Start Document

Dataportal v0.0.6

Enhancements

  • A new StepScan interface acts like DataBroker but immediately returns tabular data as a DataFrame in one step. (GH136)

  • Look up scans by the name of a detector or motor used. For example, to get all scans that measured ‘Tsam’, use DataBroker.find_headers(data_key='Tsam'). (GH88, GH107)

  • Look up scans using the first few characters of their unique IDs, like DataBroker['aow23oif']. To be clear, this is the ophyd-provided uid, not the mongo _id. (GH130, GH131)

  • Replay remembers settings when flipping between scans, and it retains these settings between sessions. (GH114)

  • DataMuxer.to_sparse_dataframe returns all data with one Event per row. (GH134)

  • DataMuxer.plan.bin_on and DataMuxer.plan.bin_by_edges explain the planned operation of DataMuxer.bin_on and DataMuxer.bin_by_edges for a given data set and given arguments. (GH134)

API Changes

  • The Event documents are reorganized to be more intuitive and require less typing. Formerly, event.data returned a dictionary of (value, timestamp) tuples.

    event.data = {'motor1': (value, timestamp),
                  'motor2': (value, timestamp)}
    

    Now, event.data is a dictionary of the data

    event.data = {'motor1': value, 'motor2': value}
    

    and event.timestamps is a dictionary of the timestamps.

    event.timestamps = {'motor1': timestamp, 'motor2': timestamp}
    

    (GH129, GH132)

  • All functions that return Documents, including Headers, Events, and everything stored in metadatastore and filestore, now return Python generators, iterable objects that load data one element at a time. To convert these to normal Python lists, simply use list(gen). (GH127)

  • The output of DataMuxer.bin_* functions is indexed by bin number (0, 1, 2…). The Event time is given as a column.

  • Metadatastore and filestore require configuration settings. They look in the following locations, in increasing order of precedence. Use #3 or #4 to customize your own metadatastore and filestore.

    1. CONDA_ENV/etc/name.yaml (if CONDA_ETC_ env is defined)

    2. /etc/name.yaml

    3. ~/.config/name/connection.yml

    4. reading environmental variables formatted like MDS_DATABASE or FS_DATABASE

    For example, in ~/.config/metadatastore/connection.yaml

    host: localhost
    port: 27017
    database: my_metadatastore
    timezone: US/Eastern
    

    and likewise with filestore/connection.yaml. (Filestore does not need a timezone field, however.) If no configuration can be found, they will raise an error on import. We avoid defaults so that experimental data cannot be accidentally saved to an unsafe destination.

  • Metadatastore configuration also requires a timezone field, which it uses to interpret human-friendly datetimes.

Bug Fixes

  • All DataMuxer output is sorted by Event time. (GH134)