Intake Bluesky

Search for data, and retrieve it as SciPy/PyData data structures for interactive data exploration or in a representation suitable for streaming applications.

Intake loads data from a growing variety of formats into familiar SciPy/PyData data structures. Bluesky is a suite of co-developed Python packages for data acquisition and management. It is designed to drive experiments and to capture data and metadata from experiments and simulations in a way that interfaces naturally with open-source software in general and the scientific Python ecosystem in particular. Intake-Bluesky applies intake to bluesky.

Its Catalogs’ search functionality uses the MongoDB query language rather than plain-text search.
Intake-Bluesky ships Catalogs that embody the semantics of bluesky’s data model. A bluesky “run” (e.g. one scan) is represented by a BlueskyRun. Each logical table of data within a given run is represented by a BlueskyEventStream.
The methods read() and to_dask() provide the data in SciPy/PyData structures and their “lazy” dask-backed counterparts, as with any other intake data source.
The additional method read_canonical() returns a generator suitable for streaming. Its elements satisfy bluesky’s data model and can be fed into streaming visualization, processing, and serialization tools that consume this representation.
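
As a rough illustration of the streaming representation, the sketch below consumes a sequence of (name, document) pairs of the kind bluesky's data model describes. The generator fake_read_canonical is a hypothetical stand-in for read_canonical() so that the example runs without a catalog. The document contents are invented and trimmed to a few fields, and collect_columns is an illustrative helper, not part of intake-bluesky.

```python
import time


def fake_read_canonical():
    """Hypothetical stand-in for read_canonical() (an assumption, not the real API).

    Yields (name, document) pairs in the order start, descriptor,
    event(s), stop. Real documents carry more fields than shown here.
    """
    now = time.time()
    yield "start", {"uid": "run-1", "time": now, "plan_name": "scan"}
    yield "descriptor", {
        "uid": "desc-1",
        "run_start": "run-1",
        "name": "primary",
        # Real data_keys entries also describe dtype, shape, and source.
        "data_keys": {"motor": {}, "det": {}},
    }
    readings = [(0.0, 10), (0.5, 14), (1.0, 9)]
    for seq_num, (motor, det) in enumerate(readings, start=1):
        yield "event", {
            "uid": f"event-{seq_num}",
            "descriptor": "desc-1",
            "seq_num": seq_num,
            "time": now + seq_num,
            "data": {"motor": motor, "det": det},
        }
    yield "stop", {"uid": "stop-1", "run_start": "run-1", "exit_status": "success"}


def collect_columns(documents):
    """Accumulate event data into one list per data key."""
    columns = {}
    for name, doc in documents:
        if name == "descriptor":
            # The descriptor announces which keys the events will carry.
            for key in doc["data_keys"]:
                columns.setdefault(key, [])
        elif name == "event":
            for key, value in doc["data"].items():
                columns.setdefault(key, []).append(value)
    return columns


columns = collect_columns(fake_read_canonical())
print(columns)
```

Because the consumer sees one document at a time, the same loop could update a live plot or write to disk instead of building lists, which is the kind of tool this representation is meant to feed.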

Bluesky is unopinionated about file formats. It provides a variety of serializers for encoding the stream of acquired data to persistent storage. (Note that large detectors may write directly to disk, in which case bluesky records, in effect, a pointer.) Different formats and storage may be appropriate for different scientific domains and scales. A single graduate student might dump their data into local files, whereas a lab or facility might use a MongoDB instance. Intake-Bluesky will address the range of possible use cases by implementing an intake driver for each serializer. Currently supported:

BlueskyMongoCatalog: Backed by MongoDB
BlueskyJSONLCatalog: Backed by a set of newline-delimited JSON files, illustrating the “minimal deployment overhead” use case
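
To make the newline-delimited JSON idea concrete, here is a minimal sketch that writes (name, document) pairs one per line and reads them back. The line layout (one JSON array of [name, document] per line), the file name, and the helpers write_jsonl and read_jsonl are assumptions for illustration; they are not necessarily the exact layout that BlueskyJSONLCatalog expects.

```python
import json
import os
import tempfile


def write_jsonl(path, documents):
    """Write each (name, doc) pair as one JSON array per line (assumed layout)."""
    with open(path, "w") as f:
        for name, doc in documents:
            f.write(json.dumps([name, doc]) + "\n")


def read_jsonl(path):
    """Yield (name, doc) pairs back from a newline-delimited JSON file."""
    with open(path) as f:
        for line in f:
            if line.strip():  # tolerate a trailing blank line
                name, doc = json.loads(line)
                yield name, doc


# Invented, trimmed-down documents for one run.
documents = [
    ("start", {"uid": "run-1", "plan_name": "count"}),
    ("descriptor", {"uid": "desc-1", "run_start": "run-1", "name": "primary"}),
    ("event", {"uid": "event-1", "descriptor": "desc-1", "seq_num": 1, "data": {"det": 5}}),
    ("stop", {"uid": "stop-1", "run_start": "run-1", "exit_status": "success"}),
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "run-1.jsonl")
    write_jsonl(path, documents)
    round_tripped = list(read_jsonl(path))

print([name for name, _ in round_tripped])
```

Appending one line per document means a file can be written incrementally while data is being acquired, and no database server is needed to read it back.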

Intake-Bluesky will also address the use case of reading files not produced by bluesky, retrofitting the semantics of its data model. Thus, a “bucket of files” such as a directory of TIFFs could be fed through tools that consume the representation returned by read_canonical().
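
The retrofitting idea can be sketched as a generator that walks a directory and emits one event per file, wrapped in start, descriptor, and stop documents. Everything here is a hypothetical illustration: the function name documents_from_directory, the "filename" data key, the "source_directory" field, and the choice to record file names rather than decode image contents are all assumptions, and the documents are trimmed to a few fields.

```python
import os
import tempfile
import time
import uuid


def documents_from_directory(directory, suffix=".tiff"):
    """Yield (name, doc) pairs describing the matching files in a directory.

    Hypothetical helper: one run per directory, one event per file.
    """
    run_uid = str(uuid.uuid4())
    desc_uid = str(uuid.uuid4())
    yield "start", {"uid": run_uid, "time": time.time(), "source_directory": directory}
    yield "descriptor", {
        "uid": desc_uid,
        "run_start": run_uid,
        "name": "primary",
        "time": time.time(),
        "data_keys": {
            "filename": {"dtype": "string", "shape": [], "source": directory},
        },
    }
    # Sort so that the event order is deterministic.
    filenames = sorted(f for f in os.listdir(directory) if f.endswith(suffix))
    for seq_num, filename in enumerate(filenames, start=1):
        mtime = os.path.getmtime(os.path.join(directory, filename))
        yield "event", {
            "uid": str(uuid.uuid4()),
            "descriptor": desc_uid,
            "seq_num": seq_num,
            "time": mtime,
            "data": {"filename": filename},
            "timestamps": {"filename": mtime},
        }
    yield "stop", {
        "uid": str(uuid.uuid4()),
        "run_start": run_uid,
        "time": time.time(),
        "exit_status": "success",
    }


with tempfile.TemporaryDirectory() as tmpdir:
    # Empty placeholder files stand in for real images.
    for name in ["b.tiff", "a.tiff", "notes.txt"]:
        with open(os.path.join(tmpdir, name), "wb") as f:
            f.write(b"")
    docs = list(documents_from_directory(tmpdir))

names = [name for name, _ in docs]
print(names)
```

A real driver would presumably load the image arrays as well; the point of the sketch is only that files with no bluesky provenance can still be presented in the same document sequence.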

Note

These drivers are currently being developed in intake-bluesky itself, but will eventually be split out into separate repositories to isolate dependencies and release cycles. This will be done once the interface with core is deemed stable.