API and Design Documentation

Design

Large Catalogs in Intake

A simple intake Catalog populates an internal dictionary, Catalog._entries, mapping entry names to LocalCatalogEntry objects. This approach does not scale to large catalogs. Instead, we override Catalog._make_entries_container() and return a dict-like object. This object must support iteration (looping through part or all of the catalog in order) and random access (requesting a specific entry by name) by implementing __iter__ and __getitem__ respectively.

It should also implement __contains__. If __contains__ is not specifically implemented, Python falls back to iterating through all the entries and checking each in turn. It is likely more efficient to implement a __contains__ method that uses __getitem__ to determine whether a given key is present.

Finally, the Catalog itself should implement __len__. If it is not implemented, intake may obtain a Catalog’s length by iterating through it entirely, which may be costly. If a more efficient approach is possible (e.g. a COUNT query) it should be implemented.
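The requirements above can be sketched as a small dict-like class. This is a minimal illustration, not intake's or intake-bluesky's implementation: the names LazyEntries, store, and make_entry are hypothetical, and a plain dict stands in for a database or file index. A Catalog's own __len__ could delegate to a cheap count like the one shown here.

```python
import collections.abc


class LazyEntries(collections.abc.Mapping):
    """Dict-like container that builds entries on demand (hypothetical)."""

    def __init__(self, store, make_entry):
        # `store` stands in for a database or file index: name -> raw record.
        self._store = store
        # `make_entry` turns one (name, raw record) pair into a catalog entry.
        self._make_entry = make_entry

    def __getitem__(self, name):
        # Random access: a single lookup, no scan. Raises KeyError if missing.
        return self._make_entry(name, self._store[name])

    def __iter__(self):
        # Iteration yields entry names lazily, in store order.
        yield from self._store

    def __contains__(self, name):
        # Membership via one lookup instead of iterating over every entry.
        try:
            self[name]
        except KeyError:
            return False
        return True

    def __len__(self):
        # Stand-in for a cheap count (e.g. a COUNT query on a real backend).
        return len(self._store)


store = {"run_a": {"uid": "a"}, "run_b": {"uid": "b"}}
entries = LazyEntries(store, lambda name, record: (name, record["uid"]))
```

An object like this is the kind of thing an overridden Catalog._make_entries_container() might return, with make_entry building a LocalCatalogEntry instead of a tuple.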

Integration with BlueskyRun

For each file format / backend (MongoDB, newline-delimited JSON, directory of TIFFs, etc.) one needs to write a single custom catalog. Its entries, created dynamically as described above, should be LocalCatalogEntry objects that identify intake_bluesky.core.BlueskyRun as their driver. See below for the arguments that they would provide to the entry so that it can instantiate the catalog when called upon.
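As a rough illustration of what a backend has to supply, the sketch below builds the callables for one run from in-memory documents. Everything here is an assumption made for the example: the run dictionary layout, the document contents, and the entry_args name are hypothetical, and the filler argument (an event_model.Filler) is left out to keep the block self-contained.

```python
from functools import partial

# Hypothetical in-memory "backend" holding the documents for one run.
run = {
    "start": {"uid": "run-1", "time": 0.0},
    "stop": {"uid": "stop-1", "run_start": "run-1", "exit_status": "success"},
    "descriptors": [{"uid": "desc-1", "run_start": "run-1", "name": "primary"}],
    "event_pages": {
        "desc-1": [
            {"descriptor": "desc-1", "seq_num": [1, 2], "data": {"x": [0.1, 0.2]}},
        ],
    },
    "resources": {},    # resource_uid -> Resource document
    "datum_pages": {},  # resource_uid -> list of datum_page documents
}


def get_event_pages(run, descriptor_uid):
    # Generator over the pages recorded for one descriptor.
    yield from run["event_pages"].get(descriptor_uid, [])


def get_event_count(run, descriptor_uid):
    # Each page holds several events; count them via seq_num.
    pages = run["event_pages"].get(descriptor_uid, [])
    return sum(len(page["seq_num"]) for page in pages)


def get_resource(run, resource_uid):
    return run["resources"][resource_uid]


def lookup_resource_for_datum(run, datum_id):
    # Scan the datum pages for the resource that owns this datum.
    for resource_uid, pages in run["datum_pages"].items():
        for page in pages:
            if datum_id in page["datum_id"]:
                return resource_uid
    raise KeyError(datum_id)


def get_datum_pages(run, resource_uid):
    yield from run["datum_pages"].get(resource_uid, [])


# Keyword arguments matching the callable parameters of BlueskyRun.
entry_args = {
    "get_run_start": lambda: run["start"],
    "get_run_stop": lambda: run["stop"],
    "get_event_descriptors": lambda: run["descriptors"],
    "get_event_pages": partial(get_event_pages, run),
    "get_event_count": partial(get_event_count, run),
    "get_resource": partial(get_resource, run),
    "lookup_resource_for_datum": partial(lookup_resource_for_datum, run),
    "get_datum_pages": partial(get_datum_pages, run),
}
```

Binding the backend with functools.partial keeps each callable's public signature identical to the expected one, so the same functions could serve many runs.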

Core

class intake_bluesky.core.BlueskyRun(get_run_start, get_run_stop, get_event_descriptors, get_event_pages, get_event_count, get_resource, lookup_resource_for_datum, get_datum_pages, filler, **kwargs)

Catalog representing one Run.

Parameters
get_run_start: callable

Expected signature get_run_start() -> RunStart

get_run_stop: callable

Expected signature get_run_stop() -> RunStop

get_event_descriptors: callable

Expected signature get_event_descriptors() -> List[EventDescriptors]

get_event_pages: callable

Expected signature get_event_pages(descriptor_uid) -> generator where generator yields Event documents

get_event_count: callable

Expected signature get_event_count(descriptor_uid) -> int

get_resource: callable

Expected signature get_resource(resource_uid) -> Resource

lookup_resource_for_datum: callable

Expected signature lookup_resource_for_datum(datum_id) -> resource_uid

get_datum_pages: callable

Expected signature get_datum_pages(resource_uid) -> generator where generator yields Datum documents

filler: event_model.Filler

**kwargs :

Additional keyword arguments are passed through to the base class, Catalog.

read(self)

Load entire dataset into a container and return it

read_partition(self, index)

Fetch one chunk of documents.

read_partition_unfilled(self, i)

Fetch one chunk of documents.

to_dask(self)

Return a dask container for this data source

class intake_bluesky.core.RemoteBlueskyRun(url, http_args, name, parameters, metadata=None, **kwargs)

Catalog representing one Run.

This is a client-side proxy to a BlueskyRun stored on a remote server.

Parameters
url: str

Address of the server

headers: dict

HTTP headers to use in calls

name: str

Handle used to reference this data

parameters: dict

To pass to the server when it instantiates the data source

metadata: dict

Additional info

kwargs: ignored

read(self)

Load entire dataset into a container and return it

to_dask(self)

Return a dask container for this data source

class intake_bluesky.core.BlueskyEventStream(get_run_start, stream_name, get_run_stop, get_event_descriptors, get_event_pages, get_event_count, get_resource, lookup_resource_for_datum, get_datum_pages, filler, metadata, include=None, exclude=None, **kwargs)

Catalog representing one Event Stream from one Run.

Parameters
get_run_start: callable

Expected signature get_run_start() -> RunStart

stream_name: string

Stream name, such as ‘primary’.

get_run_stop: callable

Expected signature get_run_stop() -> RunStop

get_event_descriptors: callable

Expected signature get_event_descriptors() -> List[EventDescriptors]

get_event_pages: callable

Expected signature get_event_pages(descriptor_uid) -> generator where generator yields event_page documents

get_event_count: callable

Expected signature get_event_count(descriptor_uid) -> int

get_resource: callable

Expected signature get_resource(resource_uid) -> Resource

lookup_resource_for_datum: callable

Expected signature lookup_resource_for_datum(datum_id) -> resource_uid

get_datum_pages: callable

Expected signature get_datum_pages(resource_uid) -> generator where generator yields datum_page documents

filler: event_model.Filler

metadata: dict

passed through to base class

include: list, optional

Fields (‘data keys’) to include. By default all are included. This parameter is mutually exclusive with exclude.

exclude: list, optional

Fields (‘data keys’) to exclude. By default none are excluded. This parameter is mutually exclusive with include.

**kwargs :

Additional keyword arguments are passed through to the base class.
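The include and exclude semantics described above can be sketched as a small helper. This is an illustrative assumption about how mutually exclusive field selection might behave, not the library's code; the function name select_fields and the choice of ValueError are hypothetical.

```python
def select_fields(data_keys, include=None, exclude=None):
    """Return the data keys to keep, preserving their original order."""
    if include is not None and exclude is not None:
        # The two parameters are documented as mutually exclusive.
        raise ValueError("include and exclude are mutually exclusive")
    if include is not None:
        wanted = set(include)
        return [key for key in data_keys if key in wanted]
    if exclude is not None:
        unwanted = set(exclude)
        return [key for key in data_keys if key not in unwanted]
    # Default: all fields are included, none excluded.
    return list(data_keys)
```

For example, select_fields(["x", "y", "z"], exclude=["y"]) keeps "x" and "z", while passing both include and exclude raises an error rather than guessing which one should win.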

intake_bluesky.core.documents_to_xarray(*, start_doc, stop_doc, descriptor_docs, get_event_pages, filler, get_resource, lookup_resource_for_datum, get_datum_pages, include=None, exclude=None)

Represent the data in one Event stream as an xarray.

Parameters
start_doc: dict

RunStart Document

stop_doc: dict

RunStop Document

descriptor_docs: list

EventDescriptor Documents

filler: event_model.Filler

get_resource: callable

Expected signature get_resource(resource_uid) -> Resource

lookup_resource_for_datum: callable

Expected signature lookup_resource_for_datum(datum_id) -> resource_uid

get_datum_pages: callable

Expected signature get_datum_pages(resource_uid) -> generator where generator yields datum_page documents

get_event_pages: callable

Expected signature get_event_pages(descriptor_uid) -> generator where generator yields event_page documents

include: list, optional

Fields (‘data keys’) to include. By default all are included. This parameter is mutually exclusive with exclude.

exclude: list, optional

Fields (‘data keys’) to exclude. By default none are excluded. This parameter is mutually exclusive with include.

Returns
dataset: xarray.Dataset

intake_bluesky.core.parse_handler_registry(handler_registry)

Parse mapping of spec name to ‘import path’ into mapping to class itself.

Parameters
handler_registry: dict

Values may be string ‘import paths’ to classes or actual classes.

Examples

Pass in name; get back actual class.

>>> parse_handler_registry({'my_spec': 'package.module.ClassName'})
{'my_spec': <package.module.ClassName>}
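The behavior described above could be approximated as follows. This is a sketch under the assumption that an import path is a dotted module path followed by a class name; resolve_handlers is a hypothetical name and not the library function itself.

```python
import importlib
from collections import OrderedDict


def resolve_handlers(handler_registry):
    """Map spec names to classes, importing any string 'import paths'."""
    resolved = {}
    for spec, handler in handler_registry.items():
        if isinstance(handler, str):
            # Split 'package.module.ClassName' into module path and class name.
            module_name, _, class_name = handler.rpartition(".")
            module = importlib.import_module(module_name)
            handler = getattr(module, class_name)
        # Actual classes pass through unchanged.
        resolved[spec] = handler
    return resolved


# A standard-library class stands in for a real handler here.
registry = resolve_handlers({"by_path": "collections.OrderedDict", "by_class": dict})
```

Accepting both strings and classes lets a catalog be configured from a text file (where only strings are available) as well as from Python code.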

Backend-Specific Catalogs

Note

These drivers are currently being developed in intake-bluesky itself, but will eventually be split out into separate repositories to isolate dependencies and release cycles. This will be done once the interface with core is deemed stable.

class intake_bluesky.mongo_normalized.BlueskyMongoCatalog(metadatastore_db, asset_registry_db, *, handler_registry=None, query=None, **kwargs)

search(self, query)

Return a new Catalog with a subset of the entries in this Catalog.

Parameters
query: dict

MongoDB query.
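Because search returns a new Catalog, searches can in principle be chained, each one narrowing the previous result. One way such narrowing could be expressed is with MongoDB's $and operator, as sketched below. This illustrates the idea only and is an assumption, not a description of BlueskyMongoCatalog internals; narrow_query and the field names in the example queries are hypothetical.

```python
def narrow_query(existing, new):
    """Combine a catalog's current query with a further search query."""
    if not existing:
        # No prior restriction: the new query stands alone.
        return dict(new)
    # Both conditions must hold, so join them with MongoDB's $and operator.
    return {"$and": [existing, new]}


by_plan = narrow_query(None, {"plan_name": "scan"})
by_plan_and_time = narrow_query(by_plan, {"time": {"$gt": 1500000000}})
```

The original query dicts are left untouched, so a parent catalog is unaffected by searches performed on it.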

class intake_bluesky.jsonl.BlueskyJSONLCatalog(paths, *, handler_registry=None, query=None, **kwargs)

search(self, query)

Return a new Catalog with a subset of the entries in this Catalog.

Parameters
query: dict