airflow-commits mailing list archives

From Jarosław Bojar (JIRA) <j...@apache.org>
Subject [jira] [Created] (AIRFLOW-2198) Heuristic in dag_processing list_py_file_paths sometimes ignores files containing DAG definitions
Date Thu, 08 Mar 2018 12:06:00 GMT
Jarosław Bojar created AIRFLOW-2198:
---------------------------------------

             Summary: Heuristic in dag_processing list_py_file_paths sometimes ignores files containing DAG definitions
                 Key: AIRFLOW-2198
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2198
             Project: Apache Airflow
          Issue Type: Bug
    Affects Versions: 1.8.2
            Reporter: Jarosław Bojar


The list_py_file_paths function in the dag_processing module uses a heuristic that checks
whether a file contains the strings 'airflow' and 'DAG'. A file that does not contain both
is excluded from further processing:
{code:java}
# Heuristic that guesses whether a Python file contains an
# Airflow DAG definition.
might_contain_dag = True
if safe_mode and not zipfile.is_zipfile(file_path):
    with open(file_path, 'rb') as f:
        content = f.read()
        might_contain_dag = all(
            [s in content for s in (b'DAG', b'airflow')])

if not might_contain_dag:
    continue
{code}
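If a DAG is instantiated in a different file from the one that defines how it is built (for
example, when the DAG is constructed inside a factory function), the instantiating file can
fail this heuristic, because it need not mention 'DAG' or 'airflow' itself. The file is then
ignored and the DAG is not processed.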

For example:

dag_factory.py:
{code:java}
from airflow import DAG

def create_dag(dag_id, other_params...):
  ...
  return DAG(dag_id, ...){code}
dag_instantiation.py:
{code:java}
from dag_factory import create_dag

first_dag = create_dag('first', other_params...)
second_dag = create_dag('second', other_params...){code}
In this case dag_factory.py is processed, but it does not instantiate any DAG, while
dag_instantiation.py is ignored by the heuristic. Consequently, no DAGs are created.
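
The false negative can be reproduced in isolation. The sketch below re-implements only the substring check quoted earlier as a hypothetical helper, might_contain_dag, and applies it to contents modeled on the two example files (with the elided arguments left out). It is an illustration under those assumptions, not Airflow's actual code path.

```python
def might_contain_dag(content: bytes) -> bool:
    """Hypothetical helper mirroring the quoted check:
    both byte strings must appear somewhere in the file."""
    return all(s in content for s in (b'DAG', b'airflow'))


# Contents modeled on the two example files; elided arguments are omitted.
DAG_FACTORY = b"""from airflow import DAG

def create_dag(dag_id):
    return DAG(dag_id)
"""

DAG_INSTANTIATION = b"""from dag_factory import create_dag

first_dag = create_dag('first')
second_dag = create_dag('second')
"""

factory_kept = might_contain_dag(DAG_FACTORY)
instantiation_kept = might_contain_dag(DAG_INSTANTIATION)

# The factory file passes the check; the file that creates the DAGs does not.
print(factory_kept, instantiation_kept)
```

The check is case-sensitive, so names such as create_dag or first_dag do not count as a match for 'DAG'.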

 

The list_py_file_paths function has a safe_mode parameter that could be used to turn off
this heuristic, but none of the calls to the function pass it, so the heuristic is always
applied.
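
As a rough illustration of what passing the flag would change, here is a simplified stand-in for the file listing. The name list_candidate_files is hypothetical and is not Airflow's function; it skips zip handling and everything else the real function does, and only shows how a safe_mode=False argument would let the instantiating file through.

```python
import os
import tempfile


def list_candidate_files(directory, safe_mode=True):
    """Simplified, hypothetical stand-in for the listing step (no zip handling)."""
    paths = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            if not name.endswith('.py'):
                continue
            path = os.path.join(root, name)
            if safe_mode:
                # Same substring check as in the quoted heuristic.
                with open(path, 'rb') as f:
                    content = f.read()
                if not all(s in content for s in (b'DAG', b'airflow')):
                    continue
            paths.append(path)
    return paths


def demo():
    """Write two minimal files and list them with and without the heuristic."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, 'dag_factory.py'), 'w') as f:
            f.write("from airflow import DAG\n")
        with open(os.path.join(d, 'dag_instantiation.py'), 'w') as f:
            f.write("from dag_factory import create_dag\n")
        safe = [os.path.basename(p) for p in list_candidate_files(d)]
        unsafe = [os.path.basename(p)
                  for p in list_candidate_files(d, safe_mode=False)]
    return safe, unsafe


safe, unsafe = demo()
print(safe)    # files kept with the heuristic on
print(unsafe)  # files kept with the heuristic off
```

With the heuristic on, only dag_factory.py is listed; with it off, both files are returned and the instantiating file would reach DAG processing.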



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
