***********************
Write Your Own Suitcase
***********************

Scope of a Suitcase
===================

Suitcases are Highly Specific
-----------------------------

Suitcase translates documents generated by bluesky (or anything that follows
its "event model" schema) into file formats. Suitcase's design philosophy is to
make many well-tailored suitcases rather than try to fit a large range of
functionality into one suitcase.

Each file format is implemented in a separate Python package, named
``suitcase-``. As support for new formats is added over time, there may someday
be hundreds of suitcase packages. This modular approach keeps the number of
dependencies manageable: there is no need to install heavy I/O libraries that
you don't plan to use. It also allows each suitcase to be updated and released
on its own schedule and maintained by the specific communities, facilities, or
users who care about a particular format.

Even "one suitcase per file format" is too broad. Some formats, such as HDF5,
enable a huge variety of layouts, too many to configure via a reasonable number
of parameters. Therefore, there will never be a "suitcase-hdf5" package, but
rather multiple suitcases, each tuned to a specific HDF5 layout such as NeXus or
Data Exchange.

Categories of Suitcases
-----------------------

The :doc:`list of existing and planned suitcases ` groups them into three
categories.

* "One-offs" --- These are tailored to one specific application, writing files
  to the requirements of a particular software program or user.
* "Generics" --- These write commonly requested formats such as TIFF or CSV.
  There is often room for interpretation in how exactly to lay out the data in
  a given file format. (One TIFF file per detector? Per Event? Per exposure?)
  The design process can devolve into tricky judgment calls or a confusing
  array of options for the user. When in doubt, we encourage you to steer
  toward writing one or more "one-offs".
* "Backends" --- These are less user-facing than the other two categories. They
  write data meant to be read back by a programmatic interface. For example,
  suitcase-mongo inserts documents into MongoDB.

Creating a New Suitcase Package
===============================

Create the package with cookiecutter
------------------------------------

#. Install cookiecutter. This is a tool for generating a new Python package
   from a template.

   .. code-block:: bash

      pip install --upgrade cookiecutter

#. Use cookiecutter to create a new suitcase package. Just follow the prompts.

   .. code-block:: bash

      cookiecutter https://github.com/NSLS-II/suitcase-cookiecutter
      subproject_name [ex: tiff, spec, pizza-box]: my-special-format
      subpackage_name [my_special_format]:

   This creates a new directory named ``suitcase-my-special-format`` with all
   the "scaffolding" of a working Python package for suitcase.

#. Initialize the directory as a git repository.

   .. code-block:: bash

      cd suitcase-my-special-format
      git init
      git add .
      git commit -m "Initial commit"

#. Install the package and its development requirements.

   .. code-block:: bash

      pip install -e .
      pip install -r requirements-dev.txt

Write the Serializer
--------------------

Before reading this section, read to the end of :doc:`usage`.

All suitcase packages must contain a :class:`Serializer` class with the
interface outlined below. They should also contain an :func:`export` function.
Both belong in ``suitcase/my_special_format/__init__.py``.

Here is a sketch of a :class:`Serializer`:

.. code-block:: python

    import event_model
    from pathlib import Path
    import suitcase.utils


    class Serializer(event_model.DocumentRouter):
        def __init__(self, directory, file_prefix='{uid}', **kwargs):
            self._file_prefix = file_prefix
            self._kwargs = kwargs
            self._templated_file_prefix = ''  # set when we get a 'start' document

            if isinstance(directory, (str, Path)):
                # The user has given us a filepath; they want files.
                # Set up a MultiFileManager for them.
                self._manager = suitcase.utils.MultiFileManager(directory)
            else:
                # The user has given us their own Manager instance. Use that.
                self._manager = directory

            # Finally, we usually need some state related to stashing file
            # handles/buffers. For a Serializer that only needs *one* file
            # this may be:
            #
            #     self._output_file = None
            #
            # For a Serializer that writes a separate file per stream:
            #
            #     self._files = {}

        @property
        def artifacts(self):
            # The 'artifacts' are the manager's way of exposing to the user
            # the resources that were created. For `MultiFileManager`, the
            # artifacts are filenames. For `MemoryBuffersManager`, the
            # artifacts are the buffer objects themselves. The Serializer, in
            # turn, exposes that to the user here.
            #
            # This must be a property, not a plain attribute, because the
            # manager's `artifacts` attribute is also a property, and we must
            # access it anew each time to be sure to get the latest contents.
            return self._manager.artifacts

        def close(self):
            self._manager.close()

        # These methods enable the Serializer to be used as a context manager:
        #
        #     with Serializer(...) as serializer:
        #         ...
        #
        # which always calls close() on exit from the with block.

        def __enter__(self):
            return self

        def __exit__(self, *exception_details):
            self.close()

        # Each of the methods below corresponds to a document type. As
        # documents flow in through Serializer.__call__, the DocumentRouter
        # base class will forward them to the method with the name
        # corresponding to the document's type: RunStart documents go to the
        # 'start' method, etc.
        #
        # In each of these methods:
        #
        # - If needed, obtain a new file/buffer from the manager and stash it
        #   on instance state (self._files, etc.) if you will need it again
        #   later. Example:
        #
        #       filename = f'{self._templated_file_prefix}-primary.csv'
        #       file = self._manager.open('stream_data', filename, 'xt')
        #       self._files['primary'] = file
        #
        #   See the manager documentation below for more about the
        #   arguments to open().
        #
        # - Write data into the file, usually something like:
        #
        #       content = my_function(doc)
        #       file.write(content)
        #
        #   or
        #
        #       my_function(doc, file)

        def start(self, doc):
            # Fill in the file_prefix with the contents of the RunStart document.
            # As in, '{uid}' -> 'c1790369-e4b2-46c7-a294-7abfa239691a'
            # or 'my-data-from-{plan_name}' -> 'my-data-from-scan'
            self._templated_file_prefix = self._file_prefix.format(**doc)
            ...

        def descriptor(self, doc):
            ...

        def event_page(self, doc):
            # There are other representations of Event data: 'event' and
            # 'bulk_events' (deprecated). That does not concern us, because
            # DocumentRouter converts these representations to 'event_page'
            # and then routes them through here.
            ...

        def stop(self, doc):
            ...

See the API Documentation below for more information about
:class:`~event_model.DocumentRouter` and
:class:`~suitcase.utils.MultiFileManager`.

Any of the existing suitcases may be useful as a reference. We recommend these
in particular:

* `suitcase-csv `_ is a good introductory example.
* `suitcase-jsonl `_ generates a straightforward, single-file format.
* `suitcase-tiff `_ generates many separate binary files.

.. note::

    Why not put the boilerplate code above into a base class, like
    ``BaseSerializer``, and use inheritance? The amount of boilerplate is not
    large, and it may be easier to simply copy it than to cross-reference
    between a subclass and a base class. Additionally, the details can vary
    enough from one :class:`Serializer` to another that inheritance tends to
    get messy.

Add an export function
----------------------

This is just a simple wrapper around the :class:`Serializer`. It takes a
generator of ``(name, doc)`` pairs and pushes them through the
:class:`Serializer`.

.. code-block:: python

    def export(gen, directory, file_prefix='{uid}', **kwargs):
        # The with block guarantees serializer.close() runs, even on error.
        with Serializer(directory, file_prefix, **kwargs) as serializer:
            for name, doc in gen:
                serializer(name, doc)

        return serializer.artifacts

Test the Serializer
-------------------

The suitcase-utils package provides a parametrized pytest fixture,
``example_data``, for generating test data. Tests should go in
``suitcase/my_special_format/tests/tests.py``.

.. code-block:: python

    from suitcase.my_special_format import export


    def test_export(tmp_path, example_data):
        # Exercise the exporter on the myriad cases parametrized in example_data.
        documents = example_data()
        artifacts = export(documents, tmp_path)
        # For extra credit, read back the data
        # and check that it looks right.

Run the tests with pytest:

.. code-block:: bash

    pytest

API Documentation
=================

The :class:`DocumentRouter` is typically useful as a base class for a
:class:`Serializer`.

.. autoclass:: event_model.DocumentRouter

There are "manager" classes for files and memory buffers. The user may provide
their own manager class implementing a different transport mechanism. It need
only implement these same methods.

.. autoclass:: suitcase.utils.MultiFileManager
    :members:

.. autoclass:: suitcase.utils.MemoryBuffersManager
    :members:

These classes are used by the :class:`~suitcase.utils.MemoryBuffersManager`.

.. autoclass:: suitcase.utils.PersistentStringIO
    :members:

.. autoclass:: suitcase.utils.PersistentBytesIO
    :members:
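The manager documentation says a user-supplied manager "need only implement
these same methods." The following is a minimal sketch that assumes, for the
Serializer sketch on this page, those methods amount to
``open(label, postfix, mode)``, ``close()``, and an ``artifacts`` property,
because those are the only manager calls the sketch makes. The real managers
may expose more, so check their API documentation above before relying on
this. ``DictManager`` and ``ToySerializer`` are hypothetical names.
``ToySerializer`` dispatches by document name with ``getattr`` as a simplified
stand-in for :class:`~event_model.DocumentRouter`, so the example runs without
event_model or suitcase-utils installed. It also exercises the
``'my-data-from-{plan_name}'`` templating from the ``start`` method.

.. code-block:: python

    import io
    from collections import defaultdict


    class DictManager:
        """Hypothetical in-memory manager (not part of suitcase.utils).

        It implements only the calls the Serializer sketch makes:
        open(label, postfix, mode), close(), and an ``artifacts`` property.
        """

        def __init__(self):
            self._artifacts = defaultdict(list)  # label -> list of buffers
            self._names = set()  # postfixes already handed out

        @property
        def artifacts(self):
            # Build a fresh dict on each access so callers see current contents.
            return {label: list(bufs) for label, bufs in self._artifacts.items()}

        def open(self, label, postfix, mode):
            # Mimic exclusive-creation ('x') semantics: refuse to reuse a name.
            if mode not in ('xt', 'xb'):
                raise ValueError(f'unsupported mode: {mode!r}')
            if postfix in self._names:
                raise FileExistsError(postfix)
            self._names.add(postfix)
            buffer = io.BytesIO() if mode == 'xb' else io.StringIO()
            self._artifacts[label].append(buffer)
            return buffer

        def close(self):
            # Deliberately a no-op: closing a plain StringIO would discard its
            # contents, and the artifacts must stay readable after close().
            pass


    class ToySerializer:
        """Hypothetical stand-in for the Serializer sketch. It dispatches on
        the document name with getattr() instead of subclassing
        DocumentRouter."""

        def __init__(self, manager, file_prefix='{uid}'):
            self._manager = manager
            self._file_prefix = file_prefix
            self._output_file = None

        def __call__(self, name, doc):
            # Route 'start' documents to self.start, 'stop' to self.stop, etc.
            return getattr(self, name)(doc)

        def start(self, doc):
            prefix = self._file_prefix.format(**doc)
            self._output_file = self._manager.open(
                'stream_data', f'{prefix}.txt', 'xt')
            self._output_file.write(f"start {doc['uid']}\n")

        def stop(self, doc):
            self._output_file.write(f"stop {doc['exit_status']}\n")

        def close(self):
            self._manager.close()


    manager = DictManager()
    serializer = ToySerializer(manager, file_prefix='my-data-from-{plan_name}')
    serializer('start', {'uid': 'abc', 'plan_name': 'scan'})
    serializer('stop', {'exit_status': 'success'})
    serializer.close()
    (buffer,) = manager.artifacts['stream_data']
    print(buffer.getvalue(), end='')  # prints "start abc" then "stop success"

The no-op ``close()`` sidesteps the fact that a plain ``io.StringIO`` discards
its contents when closed; this appears to be the same problem that
:class:`~suitcase.utils.PersistentStringIO` and
:class:`~suitcase.utils.PersistentBytesIO` address for
:class:`~suitcase.utils.MemoryBuffersManager`.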