Topology definition YAML syntax

Options whose default is not specified on this page use the same default as the corresponding Apache Storm configuration option.

Topology level options

  • name(str)[mandatory]

    Name assigned to your topology.

  • topology(seq)[mandatory]

    Sequence containing all components of the topology, where each component is a map. Allowed components: spout, bolt.

  • workers(int)

    Number of workers to be spawned.

  • ackers(int)

    Number of executors for ackers to be spawned. Corresponds to Storm TOPOLOGY_ACKER_EXECUTORS.

  • max_spout_pending(int)

    Maximum number of tuples that can be pending on a spout task at any given time.

  • message_timeout_secs(int)

    Maximum amount of time given to the topology to fully process a message emitted by a spout.

  • max_shellbolt_pending(int)

    Maximum pending tuples in one ShellBolt. Default: 1

  • logging_conf(str)

    Path to the logging configuration file. Default: <yaml_file_dir>/pyleus_logging.conf. Specify none to ignore a file that exists at the default path.

  • requirements_filename(str)

    Path to the file listing topology requirements. Default: <yaml_file_dir>/requirements.txt. Specify none to ignore a file that exists at the default path.

  • python_interpreter(str)

    The Python interpreter to use to create the topology virtualenv (exposes virtualenv --python option). Default: the interpreter that virtualenv was installed with (/usr/bin/python).

  • serializer(str)

    Serializer used by Pyleus for Storm multilang messages. Allowed: msgpack, json. Default: msgpack.

    Note

    If you use JSON as the encoding format for Storm multilang messages, you can switch from the Python standard library json module to the simplejson module by listing simplejson in the requirements for your topology.

    Tip

    If you are on Python 2.6, we strongly recommend simplejson over json for better performance.
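As a rough illustration of how the topology level options fit together, the following sketch sets a few of them at the top of a topology definition file. All values, the topology name, and the spout entry are hypothetical placeholders chosen for the example, not recommended settings.

```yaml
# Minimal sketch of topology level options (illustrative values only).
name: my_topology              # mandatory; hypothetical topology name

workers: 2                     # number of workers to spawn
ackers: 1                      # executors for ackers
max_spout_pending: 1000        # max pending tuples per spout task
message_timeout_secs: 30       # time allowed to fully process a message
logging_conf: none             # ignore any file at the default path
serializer: json               # allowed: msgpack, json

topology:                      # mandatory; sequence of components
    - spout:
        name: my-spout                   # hypothetical component name
        module: my_topology.my_spout     # hypothetical module path
```

Any option left out of the file keeps its default, which is either the one listed above or the default of the corresponding Apache Storm configuration option.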

Component level options

These options belong to the block associated either with a spout or a bolt component.

  • name(str)[mandatory]

    Name assigned to the component.

  • module(str)[mandatory]

    Python module containing the code for that component (e.g. my_topology.my_spout). Every valid module should contain a class inheriting either from Spout or from Bolt. The module should also call the component run() method when __name__ == '__main__'.

  • type(str)

    Ad-hoc option to be used instead of module to specify the Storm Kafka Spout component. Allowed: kafka, python.

    Note

    You can specify type: kafka instead of module only inside a spout block.

    See also

    Refer to this example for all kafka related options.

  • parallelism_hint(int)

    Initial number of executors per component.

  • tasks(int)

    Number of tasks per component.

  • tick_freq_secs(float)[only for bolt]

    Interval in seconds between two consecutive tick tuples.

  • options(map)

    Block containing options to be passed to the component.

  • groupings(seq)[mandatory only for bolt]

    Sequence of groupings specifying the input streams for the component.

    See also

    For grouping-specific syntax, please refer to Output streams and groupings.
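The component level options can be sketched in the same way. The example below assumes a hypothetical topology with one spout and one bolt. The component names, module paths, and the contents of the options block are invented for illustration, and the shuffle_grouping entry is only an assumed form of the grouping syntax, which is defined in Output streams and groupings.

```yaml
# Minimal sketch of component level options (hypothetical names and values).
name: my_topology

topology:

    - spout:
        name: my-spout                   # mandatory
        module: my_topology.my_spout     # mandatory unless type: kafka is used
        parallelism_hint: 2              # initial number of executors
        tasks: 4                         # number of tasks

    - bolt:
        name: my-bolt
        module: my_topology.my_bolt
        parallelism_hint: 3
        tick_freq_secs: 2.5              # bolt only; seconds between tick tuples
        options:                         # passed to the component
            threshold: 10                # hypothetical option
        groupings:                       # mandatory for bolts
            - shuffle_grouping: my-spout # assumed syntax; see the groupings page
```

Each entry of the topology sequence is a map with a single spout or bolt key, and the options described in this section sit inside that block.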