XMM Pipelines

class Pipeline(config=None, name=None)

A task pipeline.

A pipeline will construct and transform data as defined in a pipeline configuration.

The XMM project contains an easily configurable Pipeline runner. It can perform any task for which the necessary steps are implemented and compatible with one another.

Pipeline format

Pipelines can be configured using JSON, as a list of [step name, options] pairs:

 [
     ["FirstStep", {
         "options": "go here"
     }],
     ["SecondStep", {
         "can": [
             "have",
             "different",
             "options"
         ]
     }],
     ["NoOptionStep", {}]
 ]
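As a rough illustration of the format, the sketch below parses such a configuration with Python's standard json module and checks that each entry is a [name, options] pair. The function name parse_pipeline_config and the validation rules are assumptions made for this example; they are not part of the XMM API.

```python
import json

# Hypothetical configuration text mirroring the format shown above.
CONFIG_TEXT = """
[
    ["FirstStep", {"options": "go here"}],
    ["SecondStep", {"can": ["have", "different", "options"]}],
    ["NoOptionStep", {}]
]
"""


def parse_pipeline_config(text):
    """Return a list of (step_name, options) tuples from a JSON string."""
    entries = json.loads(text)
    parsed = []
    for entry in entries:
        # Each entry is expected to be a two-element list: [name, options].
        if not (isinstance(entry, list) and len(entry) == 2):
            raise ValueError(f"expected [name, options], got {entry!r}")
        name, options = entry
        if not isinstance(name, str) or not isinstance(options, dict):
            raise ValueError(f"invalid step entry: {entry!r}")
        parsed.append((name, options))
    return parsed


steps = parse_pipeline_config(CONFIG_TEXT)
for name, options in steps:
    print(name, options)
```

Note that strict JSON does not allow a trailing comma after the last member of an object or list, so json.loads rejects a configuration that contains one.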

Steps

Every pipeline consists of one or more steps. A step takes one of two forms:

  1. A function that accepts a state and a context variable. This function must return a tuple with the modified state and context.

  2. A class that may have its __init__ options configured by the JSON pipeline configuration. Such Step classes should inherit from the base PipelineStep.
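The two forms of step can be sketched as follows. This is a minimal illustration, not the XMM implementation: the PipelineStep class here is a hypothetical stand-in for the real base class, and the assumption that class-based steps are invoked through __call__ is ours. The names uppercase_names, LimitStep and run_steps are invented for the example.

```python
import itertools


class PipelineStep:
    """Hypothetical stand-in for XMM's base PipelineStep; the real one may differ."""

    def __call__(self, state, context):
        raise NotImplementedError


def uppercase_names(state, context):
    """Function-style step: returns the modified state and context as a tuple."""
    new_state = ({**item, "name": item["name"].upper()} for item in state)
    return new_state, context


class LimitStep(PipelineStep):
    """Class-style step whose __init__ options would come from the JSON config."""

    def __init__(self, limit=10):
        self.limit = limit

    def __call__(self, state, context):
        # Record the option in the context and truncate the state lazily.
        context["limit"] = self.limit
        return itertools.islice(state, self.limit), context


def run_steps(steps, state, context):
    """Feed the state and context through each step in order."""
    for step in steps:
        state, context = step(state, context)
    return state, context


# Options as they might arrive from a JSON entry such as ["LimitStep", {"limit": 2}].
steps = [uppercase_names, LimitStep(**{"limit": 2})]
data = iter([{"name": "a"}, {"name": "b"}, {"name": "c"}])
state, context = run_steps(steps, data, {})
result = list(state)
print(result, context)
```

Because both forms share the same (state, context) signature, a runner can treat them uniformly once the class has been instantiated with its options.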

The state variable should always take the form of a Python iterable. This way, each step can walk over the datasets in the state lazily, without using up too many system resources.
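To make the benefit of a lazy state concrete, here is a small sketch using generators. The step names read_numbers and double_values are hypothetical; the log list exists only to show when each dataset is actually produced.

```python
def read_numbers(limit, log):
    """Yield datasets one at a time, recording when each one is produced."""
    for i in range(1, limit + 1):
        log.append(i)
        yield {"value": i}


def double_values(state, context):
    """Step that wraps the incoming state in a generator instead of a list."""
    return ({"value": item["value"] * 2} for item in state), context


log = []
state, context = double_values(read_numbers(1_000_000, log), {})

# Nothing has been read yet: the step only wrapped the iterable.
produced_before = list(log)

# Pulling one item reads exactly one dataset from the source.
first = next(state)
produced_after = list(log)
print(produced_before, first, produced_after)
```

Had double_values returned a list, all one million datasets would have been built in memory before the next step could start.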

At the start of every pipeline, the context is populated with a datetime instance holding the current date and time, stored under the start key.
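A step could use this start value to report how long the pipeline has been running, for example. The sketch below assumes the context behaves like a dict and that the start value is a naive datetime; neither detail is confirmed by this page, and new_context and add_elapsed are hypothetical names.

```python
from datetime import datetime, timedelta


def new_context():
    """Build an initial context with the start time under the "start" key."""
    return {"start": datetime.now()}


def add_elapsed(state, context):
    """Step that stores the time elapsed since the pipeline started."""
    context["elapsed"] = datetime.now() - context["start"]
    return state, context


context = new_context()
state, context = add_elapsed(iter([]), context)
print(context["start"].isoformat(), context["elapsed"])
```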

There are currently a number of steps available. See the table below for requirements and configuration options.

Step                                              Description

PipelineStep                                      A single step in a pipeline.
ReadStep(load_class[, options])                   Read a raw data source from somewhere.
ImportStep(load_class[, options])                 Load raw data as a known format.
ValueMapStep(model, attribute_map[, defaults])    Map values in datasets of an import pipeline to correct attributes in our model.
DbWriteStep(model[, create_new, id_field, …])     Write instances to the database or update existing ones.
ExportStep(model[, query, search, object_ids])    Fetch the requested data from the database.
ConvertStep(attributes[, model])                  Prepare a list of raw attribute values for all documents.
FlattenStep([depth])                              Convert nested dicts to flat dicts.
FormatStep([target, depth, collapse, value_map])  Format data in a flat or nested hierarchical structure.
FileWriteStep(load_class[, options])              Write the data acquired so far to a file.
RenameStep(template)                              Rename an exported file according to some formatting options.
XSLTExportStep([filepath, file])                  XSLT step that takes the XSLT as a string for XML transformation.
XSLTImportStep([filepath, file])                  XSLT step that takes the XSLT as a string for XML transformation.
ZipStep                                           Zip the exported file.
TransferStep(load_class[, options])               Transfer the file via e-mail, FTP, etc.
NotifyStep([template, template_error])            Send a notification.
ProgressStep([step_size])                         Progress step.
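As an illustration of what a description such as "Convert nested dicts to flat dicts" can mean in practice, the following sketch flattens nested dictionaries up to an optional depth. It is not FlattenStep's actual code: the dotted key separator, the meaning given to depth, and the function name flatten are all assumptions made for the example.

```python
def flatten(data, depth=None, sep=".", _prefix=""):
    """Flatten nested dicts into one dict with joined keys.

    depth=None flattens every level; depth=N descends at most N levels
    (an assumed interpretation of the depth option).
    """
    flat = {}
    for key, value in data.items():
        full_key = f"{_prefix}{sep}{key}" if _prefix else str(key)
        can_descend = depth is None or depth > 0
        if isinstance(value, dict) and value and can_descend:
            next_depth = None if depth is None else depth - 1
            flat.update(flatten(value, next_depth, sep, full_key))
        else:
            # Leaf value, empty dict, or depth limit reached: keep as-is.
            flat[full_key] = value
    return flat


nested = {"a": {"b": {"c": 1}}, "d": 2}
fully_flat = flatten(nested)
one_level = flatten(nested, depth=1)
print(fully_flat)
print(one_level)
```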