Intake Bluesky
Search for data and retrieve it either as SciPy/PyData data structures for interactive data exploration or in a representation suitable for streaming applications.
Intake loads data from a growing variety of formats into familiar SciPy/PyData data structures. Bluesky is a suite of co-developed Python packages for data acquisition and management, designed to drive experiments and to capture data and metadata from experiments and simulations in a way that interfaces naturally with open-source software in general and the scientific Python ecosystem in particular. Intake-Bluesky applies intake to bluesky.
Its Catalogs’ search functionality leverages the MongoDB query language instead of performing plain-text search.
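To illustrate how a structured query differs from plain-text search, the sketch below applies a MongoDB-style query dictionary to some run metadata. The matches helper and the sample metadata are hypothetical, written only for this illustration. The helper handles just equality and the $gte and $lt operators, and it is not how Intake-Bluesky or MongoDB evaluates queries.

```python
# Toy illustration of MongoDB-style queries; NOT Intake-Bluesky's implementation.
# `matches` and the sample metadata below are hypothetical.

def matches(doc, query):
    """Return True if `doc` satisfies a small subset of MongoDB query syntax."""
    for key, condition in query.items():
        if key not in doc:
            return False
        value = doc[key]
        if isinstance(condition, dict):
            # Operator form, e.g. {"$gte": 20}
            for op, operand in condition.items():
                if op == "$gte":
                    if not value >= operand:
                        return False
                elif op == "$lt":
                    if not value < operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {op}")
        elif value != condition:
            # Plain form: exact equality on the field
            return False
    return True


runs = [
    {"plan_name": "scan", "num_points": 10, "sample": "Cu"},
    {"plan_name": "count", "num_points": 3, "sample": "Cu"},
    {"plan_name": "scan", "num_points": 50, "sample": "Ni"},
]

# "All scans with at least 20 points": a condition on two fields at once,
# which a plain-text search could not express.
query = {"plan_name": "scan", "num_points": {"$gte": 20}}
results = [run for run in runs if matches(run, query)]
```

A query dictionary of this shape is what one would expect to hand to a Catalog's search functionality. The exact call and the set of supported operators should be checked against the Intake-Bluesky API documentation.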
Intake-Bluesky ships Catalogs that embody the semantics of bluesky’s data model. A bluesky “run” (e.g. one scan) is represented by a BlueskyRun. Each logical table of data within a given run is represented by a BlueskyEventStream.
The methods read() and to_dask() provide the data in SciPy/PyData structures and their “lazy” dask-backed counterparts, as with any other intake data source.
The additional method read_canonical() returns a generator suitable for streaming. Its elements satisfy bluesky’s data model and can be fed into streaming visualization, processing, and serialization tools that consume this representation.
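The streaming representation can be sketched without a live catalog. The example below assumes that the generator yields (name, doc) pairs, where name is one of bluesky's document types such as "start", "descriptor", "event", or "stop". The fake_read_canonical generator is a hypothetical stand-in that emits hand-written, heavily simplified documents, and running_total is an illustrative consumer, not part of any library.

```python
# Hypothetical stand-in for a run's read_canonical(); the documents are
# simplified and omit many fields that real bluesky documents carry.

def fake_read_canonical():
    yield "start", {"uid": "run-1", "plan_name": "scan"}
    yield "descriptor", {"uid": "desc-1", "run_start": "run-1", "name": "primary"}
    for seq_num, reading in enumerate([1.0, 1.5, 2.5], start=1):
        yield "event", {
            "descriptor": "desc-1",
            "seq_num": seq_num,
            "data": {"det": reading},
        }
    yield "stop", {"run_start": "run-1", "exit_status": "success"}


def running_total(documents, field):
    """Consume (name, doc) pairs one at a time, summing `field` over events."""
    total = 0.0
    for name, doc in documents:
        if name == "event":
            total += doc["data"][field]
    return total


total = running_total(fake_read_canonical(), "det")
names = [name for name, _ in fake_read_canonical()]
```

Because the consumer only ever holds one document at a time, the same loop works whether the documents come from storage or from an experiment that is still in progress.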
Bluesky is unopinionated about file formats. It provides a variety of serializers for encoding the stream of acquired data to persistent storage. (Note that large detectors may write directly to disk, in which case bluesky records, in effect, a pointer.) Different formats and storage may be appropriate for different scientific domains and scales. A single graduate student might dump their data into local files, whereas a lab or facility might use a MongoDB instance. Intake-Bluesky will address the range of possible use cases by implementing an intake driver for each serializer. Currently supported:
BlueskyMongoCatalog: backed by MongoDB
BlueskyJSONLCatalog: backed by a set of newline-delimited JSON files, illustrating the “minimal deployment overhead” use case
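Newline-delimited JSON stores one JSON value per line, which is why it needs no server or schema. The sketch below round-trips a few documents through that format using only the standard library. It assumes that each line holds a [name, doc] pair. That layout, the helper names, and the sample documents are assumptions made for illustration and are not confirmed by this page.

```python
import io
import json


def write_jsonl(documents, buffer):
    """Write each (name, doc) pair as one JSON array per line."""
    for name, doc in documents:
        buffer.write(json.dumps([name, doc]) + "\n")


def read_jsonl(buffer):
    """Yield (name, doc) pairs back from newline-delimited JSON."""
    for line in buffer:
        line = line.strip()
        if line:  # skip blank lines
            name, doc = json.loads(line)
            yield name, doc


# Hypothetical, simplified documents.
documents = [
    ("start", {"uid": "run-1"}),
    ("event", {"seq_num": 1, "data": {"det": 1.0}}),
    ("stop", {"run_start": "run-1"}),
]

buffer = io.StringIO()  # stands in for a file on disk
write_jsonl(documents, buffer)
buffer.seek(0)
round_tripped = list(read_jsonl(buffer))
```

Since each line is independent, a file can be appended to as documents arrive and read back incrementally.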
Intake-Bluesky will also address the use case of reading files not produced
by bluesky, retrofitting the semantics of its data model. Thus, a “bucket
of files” such as a directory of TIFFs could be fed through tools that consume
the representation returned by read_canonical().
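As a rough sketch of what such retrofitting could look like, the generator below wraps a list of file names in a run-shaped sequence of (name, doc) pairs. Everything here is hypothetical: the function name, the image_path field, and the simplified documents. In particular, it records each file path directly in the event, whereas a real implementation would likely reference large files indirectly.

```python
import uuid


def documents_from_files(filenames):
    """Yield (name, doc) pairs describing a list of files as a single run."""
    run_uid = str(uuid.uuid4())
    descriptor_uid = str(uuid.uuid4())
    yield "start", {"uid": run_uid}
    yield "descriptor", {
        "uid": descriptor_uid,
        "run_start": run_uid,
        "name": "primary",
    }
    # One event per file, in sorted order so the sequence is reproducible.
    for seq_num, filename in enumerate(sorted(filenames), start=1):
        yield "event", {
            "uid": str(uuid.uuid4()),
            "descriptor": descriptor_uid,
            "seq_num": seq_num,
            "data": {"image_path": filename},
        }
    yield "stop", {"run_start": run_uid, "exit_status": "success"}


# Hypothetical file names standing in for a directory listing.
docs = list(documents_from_files(["frame_002.tiff", "frame_001.tiff"]))
paths = [doc["data"]["image_path"] for name, doc in docs if name == "event"]
```

Downstream tools that iterate over (name, doc) pairs could then treat this sequence like any other run.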
Note
These drivers are currently being developed in intake-bluesky itself, but will eventually be split out into separate repositories to isolate dependencies and release cycles. This will be done once the interface with core is deemed stable.