===================================================
 Promote Resource / Datum to first-class documents
===================================================

.. contents::
   :local:


Status
======

**Discussion**


Branches and Pull requests
==========================


Abstract
========

Currently *Resource* and *Datum* are directly inserted into the
*AssetRegistry* by *ophyd*.  This breaks the document abstractions
by making a specific consumer 'special'.


Detailed description
====================

An odd asymmetry in how databroker works is that the documents for
*HeaderSource* and *EventSource* are emitted by the *RunEngine* and
can be subscribed to by one or more consumers.  Each consumer is
notionally independent, receives all of the documents, and does not
need to coordinate with the others in any way (or even be aware of
their existence).  In contrast, the *Resource* and *Datum* documents are
inserted directly into an *AssetRegistry* by the *ophyd* objects.
This breaks the separation we have between the data collection process
/ hardware, the generation of the documents, and the consumers of
those documents and leads to several unfortunate situations:

 - *ophyd* objects hold an instance of an *AssetRegistry*
 - we need to keep track of **which** *AssetRegistry* things were
   inserted into
 - consumers that want access to the asset documents need to also have
   a handle to the database that the objects are inserting into

The proposed solution is to promote *Resource* and *Datum* documents
to be peer documents with *Start*, *Stop*, *Descriptor* and *Event*.
They will appear in the document stream and be inserted into
*DataBroker* via ``db.insert``.  This eliminates the 'special'
side-band communication and brings all consumers back to the same
footing.  This will require coordinated changes to *event-model*,
*databroker*, *bluesky*, and *ophyd*.


Implementation
==============

Currently, *ophyd* is responsible for collecting all of the values for
the *Resource* and *Datum* documents except for the uids.  The uids
are generated by calls to ``reg.register_*`` and the datum uids are
subsequently returned to the *RunEngine* via ``obj.read``.  The
proposed change is:

 1. *ophyd* objects would be responsible for generating the full *Resource*
    and *Datum* documents and providing them to the *RunEngine* to be
    emitted.  *ophyd* may provide some helpers to make generating compliant
    documents easy.

    a. Similar to the current documents, a *Resource* must be emitted
       before any *Datum* that refers to it.  A *Datum* can only refer
       to a *Resource* that has been emitted within the current run,
       that is, after the most recent *Start* and before its *Stop*.

    b. An identical (including uid) *Resource* or *Datum* may be
       emitted more than once; consumers will need to handle this.

    c. The *Datum* documents must be yielded only in the first
       ``collect_asset_docs`` for which their uid is in ``read``.

    d. The *Resource* documents must only be yielded in the first
       ``collect_asset_docs`` which includes a *Datum* that refers to
       it.


    e. Calls to ``read`` and ``collect_asset_docs`` must be
       idempotent.

    Allowing identical *Resource* and *Datum* documents to be emitted
    more than once supports a single *Resource* that spans many runs,
    such as background images, while still ensuring that within the
    scope of a *Start* / *Stop* pair a consumer will see all of the
    documents required.


 2. In ``save``, before the *Event* document is emitted, the *RunEngine*
    will acquire and emit any *AssetRegistry* documents.

    a. In ``save`` the *RunEngine* knows which objects are in the bundle
       and calls the ``collect_asset_docs`` method on each of them ::

            def collect_asset_docs(self) -> Iterator[Tuple[str, Dict[str, Any]]]:
                ...

       which will yield the ``(name, doc)`` pairs for anything that
       was just read.

    b. these documents will be emitted **before** the *Event*

 3. consumers will now have access to all relevant documents and can
    do whatever they want with them (insert into an asset registry,
    live processing / display, copy files elsewhere)
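
As a minimal sketch of rules 1.c through 1.e (an illustration, not the
*ophyd* implementation), a device can stage its asset documents on each
trigger and hand them out without consuming them.  ``FakeDetector``, its
``trigger`` method, the ``'image'`` key, and every document field other
than the uids are hypothetical names chosen for this example::

    import time
    import uuid
    from typing import Any, Dict, Iterator, List, Tuple

    Doc = Dict[str, Any]


    class FakeDetector:
        """Hypothetical device that records one frame per trigger."""

        def __init__(self, root: str, path: str) -> None:
            # The Resource is built once; the field names are illustrative.
            self._resource: Doc = {
                "uid": str(uuid.uuid4()),
                "root": root,
                "resource_path": path,
            }
            self._resource_staged = False
            self._frame = 0
            self._staged: List[Tuple[str, Doc]] = []
            self._reading: Dict[str, Doc] = {}

        def trigger(self) -> None:
            """Acquire a frame and stage the documents that describe it."""
            datum: Doc = {
                "datum_id": f"{self._resource['uid']}/{self._frame}",
                "resource": self._resource["uid"],
                "datum_kwargs": {"frame": self._frame},
            }
            self._frame += 1
            staged: List[Tuple[str, Doc]] = []
            if not self._resource_staged:
                # Rule 1.d: the Resource accompanies only its first Datum.
                staged.append(("resource", self._resource))
                self._resource_staged = True
            # Rule 1.c: a Datum is staged only with the reading that cites it.
            staged.append(("datum", datum))
            self._staged = staged
            self._reading = {
                "image": {"value": datum["datum_id"], "timestamp": time.time()}
            }

        def read(self) -> Dict[str, Doc]:
            """Return the latest reading; repeated calls give equal results."""
            return {key: dict(val) for key, val in self._reading.items()}

        def collect_asset_docs(self) -> Iterator[Tuple[str, Doc]]:
            """Yield ``(name, doc)`` pairs for what was just read (rule 1.e)."""
            yield from list(self._staged)

Because ``collect_asset_docs`` copies the staged list rather than draining
it, calling it twice between triggers yields the same documents, which is
one way to meet the idempotency requirement.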

event-model
-----------

 1. add schema for *Resource* and *Datum*
 2. assert that datum_id must be of the form ``{resource_id}/{N}``.
    This is required to support columnar stores where the *Datum*
    documents are grouped by *Resource* id.
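
The ``{resource_id}/{N}`` constraint can be illustrated with a pair of
helpers.  This is a sketch under two assumptions that the text above does
not state: that ``N`` is a non-negative decimal integer, and that the id is
split at its last ``/``.  The function names are hypothetical and are not
part of *event-model*::

    from collections import defaultdict
    from typing import Dict, Iterable, List, Tuple


    def make_datum_id(resource_id: str, n: int) -> str:
        """Build a datum_id of the form ``{resource_id}/{N}``."""
        if not resource_id:
            raise ValueError("resource_id must be non-empty")
        if n < 0:
            raise ValueError("N must be non-negative")
        return f"{resource_id}/{n}"


    def split_datum_id(datum_id: str) -> Tuple[str, int]:
        """Recover ``(resource_id, N)``; raise ValueError if malformed."""
        resource_id, sep, n = datum_id.rpartition("/")
        if not sep or not resource_id or not (n.isascii() and n.isdigit()):
            raise ValueError(f"malformed datum_id: {datum_id!r}")
        return resource_id, int(n)


    def group_by_resource(datum_ids: Iterable[str]) -> Dict[str, List[int]]:
        """Group the N values by Resource id, as a columnar store might."""
        groups: Dict[str, List[int]] = defaultdict(list)
        for datum_id in datum_ids:
            resource_id, n = split_datum_id(datum_id)
            groups[resource_id].append(n)
        return dict(groups)

Encoding the *Resource* id in the ``datum_id`` lets a store find the group
a *Datum* belongs to from the id alone, without a separate lookup.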


databroker
----------

 1. teach ``insert`` how to deal with the additional documents.
 2. revert API changes to use ``register_*`` which generate the uids.
 3. helper tools for generating *Resource* and *Datum* documents
    (maybe in ophyd?)
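
A sketch of how an ``insert`` that takes ``(name, doc)`` might route the
two new document types while tolerating identical repeats.  ``ToyBroker``
is a hypothetical in-memory stand-in, not the *databroker* API, and
rejecting a conflicting repeat is an assumption of this example rather
than part of the proposal::

    from typing import Any, Dict, List, Tuple

    Doc = Dict[str, Any]


    class ToyBroker:
        """Hypothetical in-memory store used only for illustration."""

        def __init__(self) -> None:
            self.resources: Dict[str, Doc] = {}
            self.datums: Dict[str, Doc] = {}
            self.others: List[Tuple[str, Doc]] = []

        def insert(self, name: str, doc: Doc) -> None:
            """Route a document by name; asset documents are keyed by uid."""
            if name == "resource":
                self._insert_once(self.resources, doc["uid"], doc)
            elif name == "datum":
                self._insert_once(self.datums, doc["datum_id"], doc)
            else:
                # Start, Descriptor, Event and Stop handling is elided here.
                self.others.append((name, doc))

        @staticmethod
        def _insert_once(table: Dict[str, Doc], key: str, doc: Doc) -> None:
            """Accept an identical repeat silently; reject a conflicting one."""
            existing = table.setdefault(key, dict(doc))
            if existing != doc:
                raise ValueError(f"conflicting document for uid {key!r}")

Keying on the uid makes a second insert of the same document a no-op,
which is the behavior a consumer needs when a *Resource* spans several
runs and is therefore emitted in each of them.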

ophyd
-----

 1. implement new document generation methods on all devices that have
    external data.


bluesky
-------

 1. implement above logic in ``RunEngine._save``
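
The ordering required of ``save`` can be sketched as follows.  This is not
the ``RunEngine`` code: ``save_bundle``, ``emit`` and ``StubDevice`` are
hypothetical names, and the *Event* document is reduced to its ``data``
key for brevity::

    from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

    Doc = Dict[str, Any]


    class StubDevice:
        """Hypothetical device exposing ``read`` and ``collect_asset_docs``."""

        def __init__(self, key: str, datum_id: str) -> None:
            self._key = key
            self._datum_id = datum_id

        def read(self) -> Dict[str, Doc]:
            return {self._key: {"value": self._datum_id, "timestamp": 0.0}}

        def collect_asset_docs(self) -> Iterator[Tuple[str, Doc]]:
            resource_uid = self._datum_id.rpartition("/")[0]
            yield "datum", {"datum_id": self._datum_id, "resource": resource_uid}


    def save_bundle(
        devices: Iterable[StubDevice], emit: Callable[[str, Doc], None]
    ) -> None:
        """Emit the asset documents of every device, then the Event."""
        data: Dict[str, Any] = {}
        for dev in devices:
            # Asset documents go out first so consumers can resolve the
            # datum ids that the Event is about to reference.
            for name, doc in dev.collect_asset_docs():
                emit(name, doc)
            for key, reading in dev.read().items():
                data[key] = reading["value"]
        emit("event", {"data": data})

A consumer subscribed through ``emit`` therefore always sees a *Datum*
before the *Event* that refers to it.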

Backward Compatibility
======================

This will break all of the devices that currently use *AssetRegistry*;
however, it will not change anything on the retrieval side.  The
constraints on the *datum_id* cannot be applied retroactively, but
can be applied to all future data.

This excludes the option of having IOCs directly insert *Resource* and
*Datum* documents and expose *datum_id* values to the EPICS layer.  We
only have one experimental use of this (GeRM caproto IOC).  This level
of flexibility is not worth non-uniformity at the document level.  If
we want to have the IOC generate all of the values (including the
uids), then it should expose those values to EPICS and the *ophyd*
object will only be responsible for marshaling those values.

Alternatives
============

Eliminate *Resource* and *Datum* as standalone documents
--------------------------------------------------------

An alternative considered was to eliminate the *Resource* and *Datum*
documents altogether by merging *Resource* into *Descriptor* and
*Datum* into *Event*.  However, this would break several long-standing
design principles:

  - all values in ``ev['data']`` are unstructured (scalars, strings, arrays)
  - *Descriptors* are immutable

In addition to breaking the insert side, this would also be a major
change on the retrieval side and would require either maintaining two
implementations forever or migrating all existing data.

This would also require a way for the *ophyd* objects to notify the
``RunEngine`` that their configuration / resource was stale so that the
*Descriptor* cache could be invalidated (this is probably a good
idea anyway).

Despite being superficially simpler, the fallout from this alternative
would be far greater.