.. _reliability:

Guaranteeing message processing
===============================

You can find more info about Storm reliability features in the `Apache Storm Documentation`_.

Track tuples in your Spout
--------------------------
In order to make Storm track your tuples, you need to pass a *tuple identifier* as ``tup_id`` when emitting each tuple:

.. code-block:: python

   self.emit((value,), tup_id=a_unique_id)

In addition, you should also implement methods :meth:`~pyleus.storm.spout.Spout.ack` and :meth:`~pyleus.storm.spout.Spout.fail`, in order to ack back to your input source whether your tuple has been fully processed or not and to re-emit it, if that is the case.

.. seealso:: For complete API documentation, see :ref:`spout`.

Anchor tuples in your Bolt
--------------------------
At the :class:`~pyleus.storm.bolt.Bolt` level, if you want to extend tracking to newly emitted tuples, you should specify as ``anchors`` a ``list`` containing all the parent tuples for the tuple you are emitting: 

.. code-block:: python

   self.emit((word,), anchors=[parent_tuple])

.. warning:: 

   All tuples need to be acked or failed, independently whether you are using Storm reliability features or not.
   If you are directly using :class:`~.Bolt` instead of :class:`~.SimpleBolt`, you must call this method or your topology will eventually run out of memory or hang.

.. seealso:: For complete API documentation, see :ref:`bolt`.

Tune your topology
------------------
If you are using Storm reliability features, you also need to run **at least one** Storm acker, otherwise your topology will hang. You can specify the number of ackers for your topology in the topology definition YAML file using the ``ackers`` option.

We strongly encourage you to tune the maximum number of tuples that can be pending on a spout task at any given time, too. You can do that using the topology level option ``max_spout_pending``. 

Finally, you may also want to specify the maximum amount of time given to the topology to fully process a message emitted by a spout before failing it. This can be done with option ``message_timeout_secs``. 

.. code-block:: yaml

   name: reliable_topology

   ackers: 3
   max_spout_pending: 100
   message_timeout_secs: 300

.. seealso::

   For a detailed explanation of those settings and for their default values, please take a look at `Apache Storm Config`_, `Apache Storm FAQ`_ and `Apache Storm defaults.yaml`_. 

.. _Apache Storm Documentation: https://storm.apache.org/documentation/Guaranteeing-message-processing.html  
.. _Apache Storm FAQ: https://storm.apache.org/documentation/FAQ.html
.. _Apache Storm Config: https://storm.incubator.apache.org/apidocs/backtype/storm/Config.html
.. _Apache Storm defaults.yaml: https://github.com/apache/storm/blob/master/conf/defaults.yaml