Export steps

Steps required to export data to various destinations.

ExportStep

class ExportStep(model, query=None, search=None, object_ids=None)

Fetch the requested data from the database.

This step will query the database and pass on the raw returned query results.

__init__(model, query=None, search=None, object_ids=None)

Export data directly from MongoDB.

Either with a MongoEngine query or via an ElasticSearch result.

Parameters
  • model (str) – The model name to act upon.

  • query (dict) – A mongoengine query dict to perform.

  • search (dict) – An ElasticSearch query to be performed.

In an export pipeline, the first step will usually be the ExportStep itself. This step gathers data from the database to be converted and formatted further at a later point in the pipeline.

You can either supply a query argument containing a MongoEngine query or a search argument with options used to construct an ElasticSearch query.

Arguments:

  • str model: A model name or identifier, as used in the generic RESTful API endpoints.

  • list query: A list of actions to perform on the model class directly as queries. Example:

    "query": [
        ["objects", {
            "name__startswith": "A",
        }],
        ["order_by", ["last_login"]]
    ]
    
  • dict search: An ElasticSearch query object

Added context:

  • str mode: Will be set to 'export'

  • dict export
    • type model: Will be set to the model class

    • str model_name: Will be set to the model name (str)
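
The query list above reads like a chain of calls on the model class. As a rough sketch of how such a list could be interpreted (this is an assumption, not the library's actual implementation), each entry can be treated as a member name plus its arguments, with a dict passed as keyword arguments and a list as positional arguments. FakeModel, FakeQuerySet and apply_query are hypothetical names used only for illustration.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a queryset; it only records the calls made."""

    def __init__(self, calls=()):
        self.calls = list(calls)

    def __call__(self, *args, **kwargs):
        # Mirrors calling `Model.objects(**filters)` in MongoEngine.
        return FakeQuerySet(self.calls + [("objects", args, kwargs)])

    def order_by(self, *keys):
        return FakeQuerySet(self.calls + [("order_by", keys, {})])


class FakeModel:
    """Hypothetical model class exposing an `objects` manager."""

    objects = FakeQuerySet()


def apply_query(model, actions):
    """Apply a list of [name, arguments] actions one after another."""
    target = model
    for name, arguments in actions:
        member = getattr(target, name)
        if isinstance(arguments, dict):
            target = member(**arguments)  # dict -> keyword arguments
        else:
            target = member(*arguments)   # list -> positional arguments
    return target


query = [
    ["objects", {"name__startswith": "A"}],
    ["order_by", ["last_login"]],
]
result = apply_query(FakeModel, query)
print(result.calls)
```

With a real MongoEngine model, the same chain would correspond to Model.objects(name__startswith="A").order_by("last_login").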

ConvertStep

class ConvertStep(attributes, model=None)

Prepare a list of raw attribute values for all documents.

__init__(attributes, model=None)

Prepare this step.

ConvertStep receives a list of attributes to be formatted for export.

Parameters

attributes (list[str]) – List of dot-notation attributes.

The converter step is responsible for transforming the acquired data to a more appropriate format, only containing fields we actually requested.

Arguments:

  • list attributes: A list of attributes to extract. An entry can be a dotted path to fetch values from a related model field, e.g. ['foo', 'user.name'].

Added context:

  • dict convert
    • list attributes: Will be a copy of the supplied arguments list.
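
To illustrate the dotted-path attributes described above, here is a minimal sketch that resolves paths such as 'user.name' against plain dicts or objects. The helper names resolve_attribute and convert are hypothetical, and returning None for a missing link in the path is an assumption rather than documented behaviour.

```python
def resolve_attribute(document, path):
    """Follow a dotted path such as 'user.name' through dicts or objects."""
    current = document
    for part in path.split("."):
        if current is None:
            # Assumption: a missing relation yields None for the whole path.
            return None
        if isinstance(current, dict):
            current = current.get(part)
        else:
            current = getattr(current, part, None)
    return current


def convert(documents, attributes):
    """Build one list of raw attribute values per document."""
    return [
        [resolve_attribute(document, attribute) for attribute in attributes]
        for document in documents
    ]


documents = [
    {"foo": 1, "user": {"name": "Ada"}},
    {"foo": 2, "user": None},
]
rows = convert(documents, ["foo", "user.name"])
print(rows)
```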

FormatStep

class FormatStep(target=None, depth=None, collapse=False, value_map=None, **options)

Format data in a flat or nested hierarchical structure.

TODO: Infinite recursion detection and prevention:

To prevent exporting recursive documents, there must be a filter present for the attribute that points to the same class of documents.

Example: Exporting page with attribute parent must set a filter along the lines of parent=None (which only gets applied to the root documents being exported).

__init__(target=None, depth=None, collapse=False, value_map=None, **options)

Configure data conversion for this step.

Parameters
  • target (str) – Currently supported are ‘text’ and ‘json’. Defaults to ‘text’, which will convert everything to str before writing.

  • depth (int) – Depth zero will yield a flat list of values; a positive depth will export selected attributes up to the specified depth. Set depth to a negative value (e.g. -1) for no restriction.

  • collapse (bool) – If True, empty nested trees will just collapse to a single NULL. WARNING: This will potentially remove keys from your output! Do not use if you rely on all columns defined to be present (albeit NULL).

  • value_map (dict) – Direct value mapping, e.g. None to ‘NULL’.

  • options (dict) – Additional options that will be passed to datatype converters.
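
The effect of the collapse flag can be pictured with a small sketch. It assumes a relation is formatted into a dict of its requested sub-fields; the function name format_relation is hypothetical and not part of the documented API.

```python
def format_relation(related, fields, collapse=False):
    """Format one related document into a dict of the requested fields."""
    if related is None:
        if collapse:
            # The whole nested tree collapses to a single None.
            return None
        # Without collapse every requested key is kept, each set to None.
        return {field: None for field in fields}
    return {field: related.get(field) for field in fields}


fields = ["name", "email"]
expanded = format_relation(None, fields, collapse=False)
collapsed = format_relation(None, fields, collapse=True)
print(expanded)
print(collapsed)
```

This also shows why the warning applies: with collapse enabled, the name and email keys disappear from the output whenever the relation is empty.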

Formatting options can be supplied to this step, so data will look just like you requested.

Arguments:

  • target, depth, collapse, value_map, **options

  • str target: (optional) A string identifying a known text format to export to, possible values are:
    • 'text' This will coerce all data with str() before writing

    • 'json' The JSON format will leave some types alone, so they can be exported with their appropriate type still intact in a JSON file.

  • int depth: (optional) Defines how deep nested attributes will be parsed; 0 means no nesting, -1 means unlimited.

  • bool collapse: If the collapse flag is set and a relationship in an object resolves to None, the complete dict will be set to None instead of its individual sub-properties (has no effect when depth=0).

  • dict value_map: (optional) A dict or list of pairs containing replacement values for specific other values found in attributes. Use this to show ‘YES’ for values like True. Example:

    "value_map": [
        [null, "[NULL]"],
        [true, "[TRUE]"],
        [false, "[FALSE]"]
    ],
    
  • **options: Further options can be supplied; they will be passed down to the individual datatype formatters. Currently supported are, for example:

    • dateformat

    • datetimeformat

    • timeformat

Added context:

  • dict format
    • int depth: Depth of data structures

    • bool flat: Will be true if depth is 0
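
As a sketch of how a value_map given as a list of pairs might be applied to a row, the snippet below replaces matching values and leaves everything else untouched. The function name apply_value_map is hypothetical, and the type-aware comparison is a design choice of this sketch, not documented behaviour.

```python
def apply_value_map(value, value_map):
    """Return the replacement for value, or value itself if nothing matches."""
    for original, replacement in value_map:
        # Compare the type as well, because True == 1 and False == 0 in Python.
        if type(value) is type(original) and value == original:
            return replacement
    return value


value_map = [
    [None, "[NULL]"],
    [True, "[TRUE]"],
    [False, "[FALSE]"],
]
row = [None, True, False, 1, "text"]
formatted = [apply_value_map(value, value_map) for value in row]
print(formatted)
```

A list of pairs is used here instead of a dict because the dict keys True and 1 (and likewise False and 0) would collide in Python.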

FileWriteStep

class FileWriteStep(load_class, options=None)

Writes the data acquired so far to a file.

__init__(load_class, options=None)

Load the configurable class and configure it.

This step will write the current state to a file using a writer of your choice.

See also

Writer classes

Arguments:

  • str load_class: The name of a writer class to use. Can also be a fully qualified import path.

  • dict options: Options to be passed down to the writer class
    • str filename: Filename to save to (relative path)

Added context:

  • dict write:
    • Writer writer: The writer class that was used

    • str filename: The absolute path of the file that was written to

    • str mime_type: MIME type of the written file
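
Resolving load_class from either a short name or a fully qualified import path could look roughly like the sketch below. The KNOWN_WRITERS registry and the use of csv.DictWriter are placeholders chosen so the example runs on the standard library alone; the real writer classes and the lookup mechanism are not described here.

```python
import importlib

# Hypothetical registry mapping short names to import paths.
KNOWN_WRITERS = {"DictWriter": "csv.DictWriter"}


def load_class(name):
    """Resolve a short name or a dotted import path to a class object."""
    path = KNOWN_WRITERS.get(name, name)
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        raise ValueError("Unknown writer class: %r" % name)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


writer_class = load_class("csv.DictWriter")
same_class = load_class("DictWriter")
print(writer_class.__name__)
```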

RenameStep

class RenameStep(template)

Rename an exported file according to some formatting options.

This step can usually be applied directly after the FileWriteStep and it will rename the written file according to a template.

__init__(template)

Set a template string to use as a new filename.

Parameters

template (str) – The filename template.

The template can contain any strftime format identifier as well as these special placeholders:

  • {model}: Model name

  • {model_lower}: Model name (lowercase)

  • {model_upper}: Model name (uppercase)

  • {model_key}: Model key (lowercase)
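
A template that mixes strftime identifiers with the placeholders above could be expanded as in this sketch. The function name render_filename is hypothetical, the order of the two substitutions is an assumption, and {model_key} is left out because its derivation is not described here.

```python
from datetime import datetime


def render_filename(template, model_name, when):
    """Expand strftime identifiers first, then the model placeholders."""
    dated = when.strftime(template)
    return dated.format(
        model=model_name,
        model_lower=model_name.lower(),
        model_upper=model_name.upper(),
    )


filename = render_filename(
    "{model_lower}_%Y-%m-%d.csv", "User", datetime(2024, 1, 31)
)
print(filename)  # user_2024-01-31.csv
```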

Added context:

This step adds no context.

ZipStep

class ZipStep

Zip the exported file.

TODO: The original file will not be deleted in the export directory.

__init__

This step will zip the file found at the current state.

Arguments:

This step takes no arguments.

Added context:

  • dict zip:
    • str filename: The new filename for the zipped file.
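
The zipping behaviour can be approximated with the standard zipfile module. This is a sketch under assumptions: the function name zip_file and the ".zip" suffix naming are hypothetical, and the original file is deliberately left in place, matching the TODO note above.

```python
import os
import tempfile
import zipfile


def zip_file(path):
    """Write path into a new archive next to it and return the archive path."""
    zip_path = path + ".zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the base name so the archive has no directory prefix.
        archive.write(path, arcname=os.path.basename(path))
    return zip_path


with tempfile.TemporaryDirectory() as directory:
    source = os.path.join(directory, "export.csv")
    with open(source, "w") as handle:
        handle.write("name\nAda\n")
    zipped = zip_file(source)
    with zipfile.ZipFile(zipped) as archive:
        names = archive.namelist()
    original_still_exists = os.path.exists(source)

print(names, original_still_exists)
```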

XSLTExportStep

class XSLTExportStep(filepath=None, file=None)

XSLT step that gets xslt as string for XML transformation.

__init__(filepath=None, file=None)

This step receives an XSLT template as a string and uses it for XML transformation.

Arguments:

  • str xslt: XSLT transformation template.

Added context:

This step adds no context.

FilterStep

class FilterStep(filter_func)

Filter rows.

__init__(filter_func)

This step filters documents based on a filter function given to it.

Arguments:

  • func filter_func: The function that is used to filter documents.

Added context:

This step adds no context.
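
A filter function for the FilterStep might look like the following sketch. It assumes filter_func is called once per document and that a truthy return value keeps the document; neither detail is spelled out above. The names filter_documents and is_active are hypothetical.

```python
def filter_documents(documents, filter_func):
    """Keep only the documents for which filter_func returns a truthy value."""
    return [document for document in documents if filter_func(document)]


def is_active(document):
    """Example predicate: keep documents flagged as active."""
    return document.get("active", False)


documents = [
    {"name": "Ada", "active": True},
    {"name": "Bob", "active": False},
]
kept = filter_documents(documents, is_active)
print(kept)
```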