API and Design Documentation¶
Design¶
Large Catalogs in Intake¶
A simple intake Catalog populates an internal dictionary, Catalog._entries,
mapping entry names to LocalCatalogEntry objects. This approach does
not scale to large catalogs. Instead, we override
Catalog._make_entries_container() and return a dict-like object. This
object must support iteration (looping through part of all of the catalog in
order) and random access (requesting a specific entry by name) by implementing
__iter__ and __getitem__ respectively.
It should also implement __contains__ because, similarly, if
__contains__ is specifically implemented, Python will iterate through all the
entries and check each in turn. In this case, it is likely more efficient to
implement a __contains__ method that uses __getitem__ to determine
whether a given key is contained.
Finally, the Catalog itself should implement __len__. If it is not
implemented, intake may obtain a Catalog’s length by iterating through it
entirely, which may be costly. If a more efficient approach is possible (e.g. a
COUNT query) it should be implemented.
Integration with BlueskyRun¶
For each file format / backend (MongoDB, newline-delimited JSON, directory of
TIFFs, etc.) one needs to write a single custom catalog. Its entries, created
dynamically as described above, should be LocalCatalogEntry objects that
identify intake_bluesky.core.BlueskyRun as their driver. See below for the
arguments that they would provide to the entry so that it can instantiate the
catalog when called upon.
Core¶
-
class
intake_bluesky.core.BlueskyRun(get_run_start, get_run_stop, get_event_descriptors, get_event_pages, get_event_count, get_resource, lookup_resource_for_datum, get_datum_pages, filler, **kwargs)[source]¶ Catalog representing one Run.
- Parameters
- get_run_start: callable
Expected signature
get_run_start() -> RunStart- get_run_stopcallable
Expected signature
get_run_stop() -> RunStop- get_event_descriptorscallable
Expected signature
get_event_descriptors() -> List[EventDescriptors]- get_event_pagescallable
Expected signature
get_event_pages(descriptor_uid) -> generatorwheregeneratoryields Event documents- get_event_countcallable
Expected signature
get_event_count(descriptor_uid) -> int- get_resourcecallable
Expected signature
get_resource(resource_uid) -> Resource- lookup_resource_for_datumcallable
Expected signature
lookup_resource_for_datum(datum_id) -> resource_uid- get_datum_pagescallable
Expected signature
get_datum_pages(resource_uid) -> generatorwheregeneratoryields Datum documents- fillerevent_model.Filler
- **kwargs :
Additional keyword arguments are passed through to the base class, Catalog.
-
class
intake_bluesky.core.RemoteBlueskyRun(url, http_args, name, parameters, metadata=None, **kwargs)[source]¶ Catalog representing one Run.
This is a client-side proxy to a BlueskyRun stored on a remote server.
- Parameters
- url: str
Address of the server
- headers: dict
HTTP headers to sue in calls
- name: str
handle to reference this data
- parameters: dict
To pass to the server when it instantiates the data source
- metadata: dict
Additional info
- kwargs: ignored
-
class
intake_bluesky.core.BlueskyEventStream(get_run_start, stream_name, get_run_stop, get_event_descriptors, get_event_pages, get_event_count, get_resource, lookup_resource_for_datum, get_datum_pages, filler, metadata, include=None, exclude=None, **kwargs)[source]¶ Catalog representing one Event Stream from one Run.
- Parameters
- get_run_start: callable
Expected signature
get_run_start() -> RunStart- stream_namestring
Stream name, such as ‘primary’.
- get_run_stopcallable
Expected signature
get_run_stop() -> RunStop- get_event_descriptorscallable
Expected signature
get_event_descriptors() -> List[EventDescriptors]- get_event_pagescallable
Expected signature
get_event_pages(descriptor_uid) -> generatorwheregeneratoryields event_page documents- get_event_countcallable
Expected signature
get_event_count(descriptor_uid) -> int- get_resourcecallable
Expected signature
get_resource(resource_uid) -> Resource- lookup_resource_for_datumcallable
Expected signature
lookup_resource_for_datum(datum_id) -> resource_uid- get_datum_pagescallable
Expected signature
get_datum_pages(resource_uid) -> generatorwheregeneratoryields datum_page documents- fillerevent_model.Filler
- metadatadict
passed through to base class
- includelist, optional
Fields (‘data keys’) to include. By default all are included. This parameter is mutually exclusive with
exclude.- excludelist, optional
Fields (‘data keys’) to exclude. By default none are excluded. This parameter is mutually exclusive with
include.- **kwargs :
Additional keyword arguments are passed through to the base class.
-
intake_bluesky.core.documents_to_xarray(*, start_doc, stop_doc, descriptor_docs, get_event_pages, filler, get_resource, lookup_resource_for_datum, get_datum_pages, include=None, exclude=None)[source]¶ Represent the data in one Event stream as an xarray.
- Parameters
- start_doc: dict
RunStart Document
- stop_docdict
RunStop Document
- descriptor_docslist
EventDescriptor Documents
- fillerevent_model.Filler
- get_resourcecallable
Expected signature
get_resource(resource_uid) -> Resource- lookup_resource_for_datumcallable
Expected signature
lookup_resource_for_datum(datum_id) -> resource_uid- get_datum_pagescallable
Expected signature
get_datum_pages(resource_uid) -> generatorwheregeneratoryields datum_page documents- get_event_pagescallable
Expected signature
get_event_pages(descriptor_uid) -> generatorwheregeneratoryields event_page documents- includelist, optional
Fields (‘data keys’) to include. By default all are included. This parameter is mutually exclusive with
exclude.- excludelist, optional
Fields (‘data keys’) to exclude. By default none are excluded. This parameter is mutually exclusive with
include.
- Returns
- datasetxarray.Dataset
-
intake_bluesky.core.parse_handler_registry(handler_registry)[source]¶ Parse mapping of spec name to ‘import path’ into mapping to class itself.
- Parameters
- handler_registrydict
Values may be string ‘import paths’ to classes or actual classes.
Examples
Pass in name; get back actual class.
>>> parse_handler_registry({'my_spec': 'package.module.ClassName'}) {'my_spec': <package.module.ClassName>}
Backend-Specific Catalogs¶
Note
These drivers are currently being developed in intake-bluesky itself, but will eventually be split out into separate repositories to isolate dependencies and release cycles. This will be done once the interface with core is deemed stable.