******************
User Documentation
******************

.. important::

   DataBroker release 1.0 includes support for old-style "v1" usage and
   new-style "v2" usage. This section addresses databroker's new "v2" usage.
   It is still under development and subject to change in response to user
   feedback. For the stable "v1" usage, see :ref:`v1_index`. See
   :ref:`transition_plan` for more information.

.. ipython:: python
   :suppress:

   import os
   os.makedirs('data', exist_ok=True)
   from bluesky import RunEngine
   RE = RunEngine()
   from bluesky.plans import scan
   from ophyd.sim import img, motor, motor1, motor2
   from suitcase.jsonl import Serializer
   from bluesky.preprocessors import SupplementalData
   sd = SupplementalData(baseline=[motor1, motor2])
   RE.preprocessors.append(sd)
   RE.md['proposal_id'] = 12345
   for _ in range(5):
       with Serializer('data') as serializer:
           uid, = RE(scan([img], motor, -1, 1, 3), serializer)
   RE.md['proposal_id'] = 6789
   for _ in range(7):
       with Serializer('data') as serializer:
           RE(scan([img], motor, -1, 1, 3), serializer)
   serializer.close()
   from intake.catalog.local import YAMLFileCatalog
   csx = YAMLFileCatalog('source/_catalogs/csx.yml')
   # Work around intake#545.
   csx._container = None
   import databroker
   # Monkey-patch to override databroker.catalog so we can directly
   # add examples instead of taking the trouble to create and then clean up
   # config files or Python packages of catalogs.
   from intake.catalog.base import Catalog
   databroker.catalog = Catalog()
   databroker.catalog._entries['csx'] = csx
   for name in ('chx', 'isr', 'xpd', 'sst', 'bmm', 'lix'):
       databroker.catalog._entries[name] = Catalog()

Walkthrough
===========

Find a Catalog
--------------

When databroker is first imported, it searches for Catalogs on your system,
typically provided by a Python package or configuration file that you or an
administrator installed.

.. ipython:: python

   from databroker import catalog
   list(catalog)

Each entry is a Catalog that databroker discovered on our system. In this
example, we find Catalogs corresponding to different instruments/beamlines.
We can access a subcatalog with square brackets, like accessing an item in a
dictionary.

.. ipython:: python

   catalog['csx']

List the entries in the 'csx' Catalog.

.. ipython:: python

   list(catalog['csx'])

We see Catalogs for raw data and processed data. Let's access the raw one and
assign it to a variable for convenience.

.. ipython:: python

   raw = catalog['csx']['raw']

This Catalog contains all the raw data taken at CSX. It contains many
entries, as we can see by checking ``len(raw)``, so listing it would take a
while. Instead, we'll look up entries by name or by search.

.. note::

   As an alternative to ``list(...)``, try using tab-completion to view your
   options. Typing ``catalog['`` and then hitting the TAB key will list the
   available entries.

   Also, these shortcuts can save a little typing.

   .. code:: python

      # These three lines are equivalent.
      catalog['csx']['raw']
      catalog['csx', 'raw']
      catalog.csx.raw  # only works if the entry names are valid Python identifiers

Look up a Run by ID
-------------------

Suppose you know the unique ID of a run (a.k.a. "scan") that you want to
access. Note that the first several characters will do; usually 6-8 are
enough to uniquely identify a given run.

.. ipython:: python

   run = raw[uid]  # where uid is some string like '17531ace'

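As a small illustration, if you have a handful of IDs noted down from earlier
sessions, you can collect the corresponding runs in one pass. The IDs below
are hypothetical placeholders.

.. code:: python

   # Hypothetical (partial) unique IDs noted down from earlier sessions.
   uids_of_interest = ['17531ace', '8f2b41c0']

   # Square-bracket lookup works the same for each entry; partial IDs are
   # fine as long as they are unambiguous.
   runs = {uid: raw[uid] for uid in uids_of_interest}
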
Each run also has a ``scan_id``. The ``scan_id`` is usually easier to
remember (it's a counting number, not a random string) but it may not be
globally unique. If there are collisions, you'll get the most recent match,
so the unique ID is better as a long-term reference.

.. ipython:: python

   run = raw[1]

Search for Runs
---------------

Suppose you want to sift through multiple runs to examine a range of
datasets.

.. ipython:: python

   query = {'proposal_id': 12345}  # or, equivalently, dict(proposal_id=12345)
   search_results = raw.search(query)

The result, ``search_results``, is itself a Catalog.

.. ipython:: python

   search_results

We can quickly check how many results it contains

.. ipython:: python

   len(search_results)

and, if we want, list them.

.. ipython:: python

   list(search_results)

Because searching on a Catalog returns another Catalog, we can refine our
search by searching ``search_results``. In this example we'll use a helper,
:class:`~databroker.queries.TimeRange`, to build our query.

.. ipython:: python

   from databroker.queries import TimeRange
   query = TimeRange(since='2019-09-01', until='2040')
   search_results.search(query)

Other sophisticated queries are possible, such as filtering for runs that
include *greater than* 50 points.

.. code:: python

   search_results.search({'num_points': {'$gt': 50}})

See MongoQuerySelectors_ for more.

Once we have a result catalog that we are happy with, we can list the entries
via ``list(search_results)``, access them individually by name as in
``search_results[SOME_UID]``, or loop through them:

.. ipython:: python

   for uid, run in search_results.items():
       # Do stuff
       ...

Access Data
-----------

Suppose we have a run of interest.

.. ipython:: python

   run = raw[uid]

A given run contains multiple logical tables. The number of these tables and
their names varies by the particular experiment, but two common ones are

* 'primary', the main data of interest, such as a time series of images
* 'baseline', readings taken at the beginning and end of the run for
  alignment and sanity-check purposes (see the sketch at the end of this
  section)

To explore a run, we can open its entry by calling it like a function with no
arguments:

.. ipython:: python

   run()  # or, equivalently, run.get()

We can also use tab-completion, as in ``run['`` TAB, to see the contents.
That is, the Run is yet another Catalog, and its contents are the logical
tables of data. Finally, let's get one of these tables.

.. ipython:: python

   ds = run.primary.read()
   ds

This is an xarray.Dataset. You can access specific columns

.. ipython:: python

   ds['img']

do mathematical operations

.. ipython:: python

   ds.mean()

make quick plots

.. ipython:: python

   @savefig ds_motor_plot.png
   ds['motor'].plot()

and much more. See the documentation on xarray_.

If the data is large, it can be convenient to access it lazily, deferring the
actual network or disk I/O. To do this, replace ``read()`` with
``to_dask()``. You still get back an xarray.Dataset, but it contains
placeholders that will fetch the data in chunks and only as needed, rather
than greedily pulling all the data into memory from the start.

.. ipython:: python

   ds = run.primary.to_dask()
   ds

See the documentation on dask_.

.. TODO: This is displaying numpy arrays, not dask. Illustrating dask here
   might require standing up a server.

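The other logical tables are read the same way. For example, the 'baseline'
readings mentioned above could be read with a sketch like the following,
assuming the run actually recorded a 'baseline' stream.

.. code:: python

   # Read the 'baseline' table: typically one reading taken at the start of
   # the run and one at the end, useful for sanity-check comparisons.
   baseline = run.baseline.read()
   baseline
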
Explore Metadata
----------------

Everything recorded at the start of the run is in ``run.metadata['start']``.

.. ipython:: python

   run.metadata['start']

Information only knowable at the end, like the exit status (success, abort,
fail), is stored in ``run.metadata['stop']``.

.. ipython:: python

   run.metadata['stop']

The v1 API stored metadata about the devices involved and their
configuration, accessed via ``descriptors``. This is roughly equivalent to
what is available in ``run.primary.metadata``. It is quite large.

.. ipython:: python

   run.primary.metadata

It is a little flatter, with a different layout than the v1 API returned.

Replay Document Stream
----------------------

Bluesky is built around a streaming-friendly representation of data and
metadata. (See event-model_.) To access the run in this representation,
effectively replaying the chronological stream of documents that were emitted
during data acquisition, use the ``documents()`` method.

.. versionchanged:: 1.2.0

   The ``documents`` method was formerly named ``canonical``. The old name is
   still supported but deprecated.

.. ipython:: python

   run.documents(fill='yes')

This generator yields ``(name, doc)`` pairs and can be fed into streaming
visualization, processing, and serialization tools that consume this
representation, such as those provided by bluesky.

The keyword argument ``fill`` is required. Its allowed values are ``'yes'``
(numpy arrays), ``'no'`` (Datum IDs), and ``'delayed'`` (dask arrays, still
under development).

.. _MongoQuerySelectors: https://docs.mongodb.com/v3.2/reference/operator/query/#query-selectors
.. _xarray: https://xarray.pydata.org/en/stable/
.. _dask: https://docs.dask.org/en/latest/
.. _event-model: https://blueskyproject.io/event-model/

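As a rough sketch of how that composes, the document stream can be replayed
into a serialization callback. The example below assumes the suitcase-jsonl
package is installed and uses an arbitrary output directory name, 'export';
it is one illustration, not the only way to consume the stream.

.. code:: python

   from suitcase.jsonl import Serializer

   # Re-export this run as newline-delimited JSON by feeding each
   # (name, doc) pair to the serializer callback.
   with Serializer('export') as serializer:
       for name, doc in run.documents(fill='yes'):
           serializer(name, doc)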