===================================================
Promote Resource / Datum to first-class documents
===================================================

.. contents::
   :local:

Status
======

**Discussion**

Branches and Pull requests
==========================

Abstract
========

Currently, *Resource* and *Datum* documents are inserted directly into the
*AssetRegistry* by *ophyd*. This breaks the document abstraction by making a
specific consumer 'special'.

Detailed description
====================

An odd asymmetry in how databroker works is that the documents for
*HeaderSource* and *EventSource* are emitted by the *RunEngine* and can be
subscribed to by one or more consumers. Each consumer is notionally
independent: each receives all of the documents, and consumers do not need to
coordinate in any way (or even be aware of one another's existence).

In contrast, the *Resource* and *Datum* documents are inserted directly into
an *AssetRegistry* by the *ophyd* objects. This breaks the separation between
the data collection process / hardware, the generation of the documents, and
the consumers of those documents, and it leads to several unfortunate
situations:

- *ophyd* objects hold an instance of an *AssetRegistry*
- we need to keep track of **which** *AssetRegistry* things were inserted into
- consumers that want access to the asset documents also need a handle to the
  database that the objects are inserting into

The proposed solution is to promote *Resource* and *Datum* documents to be
peers of the *Start*, *Stop*, *Descriptor*, and *Event* documents. They will
appear in the document stream and be inserted into *DataBroker* via
``db.insert``. This eliminates the 'special' side-band communication and puts
all consumers back on the same footing.

This will require coordinated changes to *event-model*, *databroker*,
*bluesky*, and *ophyd*.
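
A minimal sketch, under stated assumptions, of what "the same footing" means
for a consumer: a plain callable that receives every ``(name, doc)`` pair and
can resolve a *Datum* back to its *Resource* without holding a handle to any
*AssetRegistry*. The class ``AssetAwareConsumer``, the document names
``'resource'`` and ``'datum'``, and the trimmed-down document fields are
illustrative assumptions, not the final *event-model* schema; the
``datum_id`` value follows the ``{resource_id}/{N}`` form proposed for
*event-model*.

.. code-block:: python

    from collections import defaultdict
    from typing import Any, Dict, List, Tuple

    Doc = Dict[str, Any]


    class AssetAwareConsumer:
        """Hypothetical consumer that sees every (name, doc) pair in the stream.

        Because Resource and Datum arrive alongside Start/Descriptor/Event/Stop,
        it can resolve a datum_id to its Resource without an AssetRegistry.
        """

        def __init__(self) -> None:
            self.received: Dict[str, List[Doc]] = defaultdict(list)
            self.resources: Dict[str, Doc] = {}  # keyed by Resource uid
            self.datums: Dict[str, Doc] = {}  # keyed by datum_id

        def __call__(self, name: str, doc: Doc) -> None:
            self.received[name].append(doc)
            if name == "resource":
                # An identical Resource (same uid) may arrive more than once;
                # keep the first copy.
                self.resources.setdefault(doc["uid"], doc)
            elif name == "datum":
                self.datums.setdefault(doc["datum_id"], doc)

        def resource_for(self, datum_id: str) -> Doc:
            """Return the Resource document that the given datum_id refers to."""
            return self.resources[self.datums[datum_id]["resource"]]


    # Illustrative stream of trimmed-down documents (not complete schemas).
    resource = {"uid": "r1", "spec": "AD_HDF5", "root": "/data",
                "resource_path": "scan.h5", "resource_kwargs": {}}
    stream: List[Tuple[str, Doc]] = [
        ("start", {"uid": "s1", "time": 0.0}),
        ("descriptor", {"uid": "d1", "run_start": "s1", "data_keys": {}}),
        ("resource", resource),
        ("datum", {"datum_id": "r1/0", "resource": "r1",
                   "datum_kwargs": {"point_number": 0}}),
        ("event", {"uid": "e1", "descriptor": "d1", "seq_num": 1,
                   "data": {"img": "r1/0"}, "timestamps": {"img": 0.0}}),
        ("stop", {"uid": "x1", "run_start": "s1", "exit_status": "success"}),
    ]

    consumer = AssetAwareConsumer()
    for name, doc in stream:
        consumer(name, doc)

    event = consumer.received["event"][0]
    print(consumer.resource_for(event["data"]["img"])["resource_path"])  # scan.h5

In *bluesky*, a callable like this could be attached with
``RE.subscribe(consumer)``. Nothing in it touches an *AssetRegistry*, and
because it de-duplicates by uid it also tolerates an identical *Resource*
being emitted more than once.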

Implementation
==============

Currently, *ophyd* is responsible for collecting all of the values for the
*Resource* and *Datum* documents except for the uids. The uids are generated
by calls to ``reg.register_*``, and the datum uids are subsequently returned
to the *RunEngine* via ``obj.read``.

The proposed change is:

1. *ophyd* objects would be responsible for generating the full *Resource* and
   *Datum* documents and providing them to the *RunEngine* to be emitted.
   *ophyd* may provide some helpers to make generating compliant documents
   easy.

   a. As with the current documents, a *Resource* must be emitted before any
      *Datum* that refers to it. A *Datum* can only refer to a *Resource* that
      has been emitted after the most recent *Start* and before the *Stop*
      for that *Start*.
   b. An identical (including uid) *Resource* or *Datum* may be emitted more
      than once; consumers will need to handle this.
   c. A *Datum* document must be yielded only by the first
      ``collect_asset_docs`` call for which its uid appears in ``read``.
   d. A *Resource* document must be yielded only by the first
      ``collect_asset_docs`` call that includes a *Datum* referring to it.
   e. Calls to ``read`` and ``collect_asset_docs`` must be idempotent.

   Identical *Resource* and *Datum* documents are allowed in order to support
   a single *Resource* that spans many runs (such as background images) while
   still ensuring that, within the scope of a *Start* / *Stop* pair, a
   consumer will see all of the documents it requires.

2. In ``save``, before the *Event* document is emitted, the *RunEngine* will
   acquire and emit any *AssetRegistry* documents.

   a. In ``save`` the *RunEngine* knows which objects are in the bundle and
      calls their ``collect_asset_docs`` method ::

         from typing import Any, Dict, Iterator, Tuple

         def collect_asset_docs(self) -> Iterator[Tuple[str, Dict[str, Any]]]:
             ...  # e.g. yields ('resource', doc) and ('datum', doc) pairs

      which will yield the ``(name, doc)`` pairs for anything that was just
      read.

   b. These documents will be emitted **before** the *Event*.

3. Consumers will now have access to all of the relevant documents and can do
   whatever they want with them (insert them into an asset registry, live
   processing / display, copy files elsewhere).

event-model
-----------

1. Add schemas for *Resource* and *Datum*.
2. Require that ``datum_id`` be of the form ``{resource_id}/{N}``. This is
   needed to support columnar stores where the *Datum* documents are grouped
   by *Resource* id.

databroker
----------

1. Teach ``insert`` how to deal with the additional documents.
2. Revert the API changes under which ``register_*`` generates the uids.
3. Provide helper tools for generating *Resource* and *Datum* documents
   (maybe in *ophyd*?).

ophyd
-----

1. Implement the new document generation methods on all devices that have
   external data.

bluesky
-------

1. Implement the above logic in ``RunEngine._save``.

Backward Compatibility
======================

This will break all of the devices that currently use *AssetRegistry*;
however, it will not change anything on the retrieval side.

The constraints on *datum_id* cannot be applied retroactively, but can be
applied to all future data.

Constraining *datum_id* this way excludes the option of having IOCs directly
insert *Resource* and *Datum* documents and expose *datum_id* values to the
EPICS layer. We only have one experimental use of this (the GeRM caproto
IOC). This level of flexibility is not worth the non-uniformity it would
introduce at the document level. If we want the IOC to generate all of the
values (including the uids), then it should expose those values to EPICS, and
the *ophyd* object will only be responsible for marshaling those values.

Alternatives
============

Eliminate *Resource* and *Datum* as stand-alone documents
---------------------------------------------------------

An alternative considered was to eliminate the *Resource* and *Datum*
documents altogether by merging *Resource* into *Descriptor* and *Datum* into
*Event*.
However, this would break several long-standing design principles:

- all values in ``ev['data']`` are unstructured (scalars, strings, arrays)
- *Descriptors* are immutable

In addition to breaking the insert side, this would also be a major change on
the retrieval side and would require either maintaining two implementations
forever or migrating all existing data.

This would also require the *ophyd* objects to have a way to notify the
``RunEngine`` that their configuration / resource was stale so that the
*Descriptor* cache could be invalidated (this is probably a good idea anyway).

Despite being superficially simpler, this alternative would cause far greater
fallout.