API and Design Documentation¶
Design¶
Large Catalogs in Intake¶
A simple intake Catalog populates an internal dictionary, Catalog._entries
,
mapping entry names to LocalCatalogEntry
objects. This approach does
not scale to large catalogs. Instead, we override
Catalog._make_entries_container()
and return a dict-like object. This
object must support iteration (looping through part of all of the catalog in
order) and random access (requesting a specific entry by name) by implementing
__iter__
and __getitem__
respectively.
It should also implement __contains__
because, similarly, if
__contains__
is specifically implemented, Python will iterate through all the
entries and check each in turn. In this case, it is likely more efficient to
implement a __contains__
method that uses __getitem__
to determine
whether a given key is contained.
Finally, the Catalog itself should implement __len__
. If it is not
implemented, intake may obtain a Catalog’s length by iterating through it
entirely, which may be costly. If a more efficient approach is possible (e.g. a
COUNT query) it should be implemented.
Integration with BlueskyRun¶
For each file format / backend (MongoDB, newline-delimited JSON, directory of
TIFFs, etc.) one needs to write a single custom catalog. Its entries, created
dynamically as described above, should be LocalCatalogEntry
objects that
identify intake_bluesky.core.BlueskyRun
as their driver. See below for the
arguments that they would provide to the entry so that it can instantiate the
catalog when called upon.
Core¶
-
class
intake_bluesky.core.
BlueskyRun
(get_run_start, get_run_stop, get_event_descriptors, get_event_pages, get_event_count, get_resource, lookup_resource_for_datum, get_datum_pages, filler, **kwargs)[source]¶ Catalog representing one Run.
- Parameters
- get_run_start: callable
Expected signature
get_run_start() -> RunStart
- get_run_stopcallable
Expected signature
get_run_stop() -> RunStop
- get_event_descriptorscallable
Expected signature
get_event_descriptors() -> List[EventDescriptors]
- get_event_pagescallable
Expected signature
get_event_pages(descriptor_uid) -> generator
wheregenerator
yields Event documents- get_event_countcallable
Expected signature
get_event_count(descriptor_uid) -> int
- get_resourcecallable
Expected signature
get_resource(resource_uid) -> Resource
- lookup_resource_for_datumcallable
Expected signature
lookup_resource_for_datum(datum_id) -> resource_uid
- get_datum_pagescallable
Expected signature
get_datum_pages(resource_uid) -> generator
wheregenerator
yields Datum documents- fillerevent_model.Filler
- **kwargs :
Additional keyword arguments are passed through to the base class, Catalog.
-
class
intake_bluesky.core.
RemoteBlueskyRun
(url, http_args, name, parameters, metadata=None, **kwargs)[source]¶ Catalog representing one Run.
This is a client-side proxy to a BlueskyRun stored on a remote server.
- Parameters
- url: str
Address of the server
- headers: dict
HTTP headers to sue in calls
- name: str
handle to reference this data
- parameters: dict
To pass to the server when it instantiates the data source
- metadata: dict
Additional info
- kwargs: ignored
-
class
intake_bluesky.core.
BlueskyEventStream
(get_run_start, stream_name, get_run_stop, get_event_descriptors, get_event_pages, get_event_count, get_resource, lookup_resource_for_datum, get_datum_pages, filler, metadata, include=None, exclude=None, **kwargs)[source]¶ Catalog representing one Event Stream from one Run.
- Parameters
- get_run_start: callable
Expected signature
get_run_start() -> RunStart
- stream_namestring
Stream name, such as ‘primary’.
- get_run_stopcallable
Expected signature
get_run_stop() -> RunStop
- get_event_descriptorscallable
Expected signature
get_event_descriptors() -> List[EventDescriptors]
- get_event_pagescallable
Expected signature
get_event_pages(descriptor_uid) -> generator
wheregenerator
yields event_page documents- get_event_countcallable
Expected signature
get_event_count(descriptor_uid) -> int
- get_resourcecallable
Expected signature
get_resource(resource_uid) -> Resource
- lookup_resource_for_datumcallable
Expected signature
lookup_resource_for_datum(datum_id) -> resource_uid
- get_datum_pagescallable
Expected signature
get_datum_pages(resource_uid) -> generator
wheregenerator
yields datum_page documents- fillerevent_model.Filler
- metadatadict
passed through to base class
- includelist, optional
Fields (‘data keys’) to include. By default all are included. This parameter is mutually exclusive with
exclude
.- excludelist, optional
Fields (‘data keys’) to exclude. By default none are excluded. This parameter is mutually exclusive with
include
.- **kwargs :
Additional keyword arguments are passed through to the base class.
-
intake_bluesky.core.
documents_to_xarray
(*, start_doc, stop_doc, descriptor_docs, get_event_pages, filler, get_resource, lookup_resource_for_datum, get_datum_pages, include=None, exclude=None)[source]¶ Represent the data in one Event stream as an xarray.
- Parameters
- start_doc: dict
RunStart Document
- stop_docdict
RunStop Document
- descriptor_docslist
EventDescriptor Documents
- fillerevent_model.Filler
- get_resourcecallable
Expected signature
get_resource(resource_uid) -> Resource
- lookup_resource_for_datumcallable
Expected signature
lookup_resource_for_datum(datum_id) -> resource_uid
- get_datum_pagescallable
Expected signature
get_datum_pages(resource_uid) -> generator
wheregenerator
yields datum_page documents- get_event_pagescallable
Expected signature
get_event_pages(descriptor_uid) -> generator
wheregenerator
yields event_page documents- includelist, optional
Fields (‘data keys’) to include. By default all are included. This parameter is mutually exclusive with
exclude
.- excludelist, optional
Fields (‘data keys’) to exclude. By default none are excluded. This parameter is mutually exclusive with
include
.
- Returns
- datasetxarray.Dataset
-
intake_bluesky.core.
parse_handler_registry
(handler_registry)[source]¶ Parse mapping of spec name to ‘import path’ into mapping to class itself.
- Parameters
- handler_registrydict
Values may be string ‘import paths’ to classes or actual classes.
Examples
Pass in name; get back actual class.
>>> parse_handler_registry({'my_spec': 'package.module.ClassName'}) {'my_spec': <package.module.ClassName>}
Backend-Specific Catalogs¶
Note
These drivers are currently being developed in intake-bluesky itself, but will eventually be split out into separate repositories to isolate dependencies and release cycles. This will be done once the interface with core is deemed stable.