airflow-commits mailing list archives

From "James Meickle (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-3219) Graph displays are non-deterministic
Date Tue, 16 Oct 2018 19:18:00 GMT
James Meickle created AIRFLOW-3219:
--------------------------------------

             Summary: Graph displays are non-deterministic
                 Key: AIRFLOW-3219
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3219
             Project: Apache Airflow
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: James Meickle


In Airflow, tasks are stored in a dictionary (self.task_dict), and this dictionary is unordered.
Its values, which are likewise unordered, are used to build the task list (self.tasks, https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L3568).
Therefore, the task list is unordered as well. This has a variety of downstream effects; for example,
Airflow's topological sort takes this unordered list as input, so the relative order it produces for
tasks that do not depend on each other is not fixed either.
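One way to make the sorted order independent of the input order is to break ties explicitly. The sketch below is a plain Kahn's algorithm that keeps ready tasks in a min-heap keyed on task_id, so whenever several tasks are eligible, the lexicographically smallest one is emitted first. The function name stable_topo_sort and its input shape (a dict mapping each task_id to the set of task_ids it depends on, with every task present as a key) are hypothetical assumptions for illustration, not Airflow's actual topological_sort implementation.

```python
import heapq
from typing import Dict, List, Set


def stable_topo_sort(upstream: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm with ties broken by task_id (hypothetical helper).

    `upstream` maps every task_id to the set of task_ids it depends on.
    Every task must appear as a key, even if it has no dependencies.
    """
    # Number of not-yet-emitted upstream dependencies for each task.
    remaining = {task: len(deps) for task, deps in upstream.items()}
    # Reverse edges: task -> tasks that depend on it.
    downstream: Dict[str, List[str]] = {task: [] for task in upstream}
    for task, deps in upstream.items():
        for dep in deps:
            downstream[dep].append(task)
    # Min-heap of tasks with no pending dependencies; smallest task_id pops first.
    ready = [task for task, count in remaining.items() if count == 0]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        task = heapq.heappop(ready)
        order.append(task)
        for child in downstream[task]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, child)
    if len(order) != len(upstream):
        raise ValueError("cycle detected in task dependencies")
    return order
```

For example, with extract feeding transform_a and transform_b, which both feed load, the result is always ['extract', 'transform_a', 'transform_b', 'load'], no matter what order the dictionary yields its entries in. Sorting by task_id is only one possible tie-breaker; any total order that does not depend on hashing would work.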

Because the task list order ultimately depends on Python's hash randomization, which picks a new
random seed each time the interpreter starts, the order is reshuffled whenever the server restarts.
Consequently, Airflow's sorts are not stable across restarts. This is particularly irritating for
graph layouts, because a server restart can make a graph render differently even though no code
has been deployed.
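The restart effect can be reproduced outside Airflow by launching fresh interpreters with different PYTHONHASHSEED values. String hashes are randomized per process by default in Python 3, and in Python 3.5 and earlier a dict's iteration order followed those hashes; from CPython 3.6 onward dicts keep insertion order. The sketch below therefore uses a set of task ids as a stand-in, because set iteration order still depends on the hash seed on every Python 3 version. The helper name task_order and the sample task ids are hypothetical.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a fixed set of task ids.
SNIPPET = "print(','.join({'extract', 'transform', 'load', 'report', 'notify', 'cleanup'}))"


def task_order(hash_seed: str) -> str:
    """Run a fresh interpreter with the given PYTHONHASHSEED and return the order it prints."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Each seed simulates one "server restart"; the printed orders typically differ.
    for seed in ("1", "2", "3"):
        print(seed, task_order(seed))
```

The same seed always reproduces the same order, which is why pinning PYTHONHASHSEED can mask the problem, but it does not fix the underlying dependence on hashing.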

We should consider storing tasks in an OrderedDict or some other structure whose iteration order
is deterministic.
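A minimal sketch of the proposed fix, assuming tasks are keyed by task_id: back task_dict with collections.OrderedDict so that the derived task list follows insertion order on every Python version. DagSketch and Task are hypothetical stand-ins, not Airflow's actual DAG and operator classes.

```python
from collections import OrderedDict
from typing import List


class Task:
    """Hypothetical minimal task with only an id."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id


class DagSketch:
    """Hypothetical stand-in for a DAG; not Airflow's DAG class."""

    def __init__(self) -> None:
        # OrderedDict iterates in insertion order regardless of hash seed.
        self.task_dict: "OrderedDict[str, Task]" = OrderedDict()

    def add_task(self, task: Task) -> None:
        if task.task_id in self.task_dict:
            raise ValueError("duplicate task_id: " + task.task_id)
        self.task_dict[task.task_id] = task

    @property
    def tasks(self) -> List[Task]:
        # The task list inherits the dict's deterministic order.
        return list(self.task_dict.values())
```

Insertion order is fixed by the order in which the DAG file defines its tasks, so it stays the same across restarts as long as the code is unchanged. If an order that is also independent of definition order is preferred, returning sorted(self.task_dict.values(), key=lambda t: t.task_id) is an alternative.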



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
