airflow-commits mailing list archives

From "Maxime Beauchemin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-97) "airflow" "DAG" strings in file necessary to import dag
Date Wed, 12 Apr 2017 16:05:41 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-97?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15966122#comment-15966122 ]

Maxime Beauchemin commented on AIRFLOW-97:
------------------------------------------

The rationale for it was that people may and will dump Python files that do significant work
outside of an `if __name__ == '__main__':` block, and that Airflow, as it crawls and imports
these modules, will trigger that work. We've also seen people dump entire libs in our pipelines
folder, and the DagBag parsing process will import the living hell out of them.

This is a naive attempt at jumping over files that don't look like an Airflow pipeline by
introspecting the code without evaluating it. 
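
A minimal sketch of that kind of check, assuming it amounts to a raw substring test for the
two strings named in the issue title. The helper name `looks_like_dag_file`, the `write_temp`
helper and the `my_dag_builder` module are made up for illustration and are not Airflow's API;
the files are only read, never imported.

```python
import os
import tempfile


# Hypothetical helper: the name and signature are illustrative, not Airflow's API.
def looks_like_dag_file(filepath, markers=(b"DAG", b"airflow")):
    """Return True if every marker occurs somewhere in the file's raw bytes."""
    with open(filepath, "rb") as f:
        content = f.read()
    return all(marker in content for marker in markers)


def write_temp(source):
    """Write source to a temporary .py file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "w") as f:
        f.write(source)
    return path


# A file that builds its DAG through a helper module and mentions neither marker.
helper_based = write_temp(
    "from my_dag_builder import build\npipeline = build('daily')\n"
)
# The same file with a marker comment at the top, as in the issue report.
with_comment = write_temp(
    "#DAG airflow\nfrom my_dag_builder import build\npipeline = build('daily')\n"
)

print(looks_like_dag_file(helper_based))  # False: neither string is present
print(looks_like_dag_file(with_comment))  # True: the comment alone satisfies the check
```

The check is case-sensitive and purely textual, which is why a comment is enough to get a
file past it, and why a real pipeline that hides its imports in a helper module gets skipped.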

It may have been added after the module parsing timeout rule was introduced. Note that
the DagBag timeout logic may be preferable in some ways, but it has limitations. First,
it won't work under LocalExecutor, for a reason I won't get into here. Second, it sucks to
pay the timeout price at every scheduler cycle. Perhaps a better approach would be for the
process to add scripts that time out to a blacklist and surface it in the UI. Then users would
have to re-enable bad actors manually.
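
A rough sketch of the blacklist idea, not something that exists in Airflow. It assumes a
Unix system and a call from the main thread, since it relies on `signal.alarm`; the names
`ImportTimeout` and `import_with_blacklist`, and the in-memory `set` used as the blacklist,
are hypothetical. A real version would need to persist the blacklist and expose it in the UI.

```python
import importlib.util
import os
import signal
import tempfile


class ImportTimeout(Exception):
    """Raised when importing a file exceeds the time budget."""


def _raise_timeout(signum, frame):
    raise ImportTimeout()


# Hypothetical helper: illustrates the proposal, not Airflow's DagBag code.
def import_with_blacklist(filepath, blacklist, timeout_seconds=1):
    """Import filepath unless it is blacklisted; blacklist it if the import times out.

    Returns the imported module, or None if the file was skipped or timed out.
    """
    if filepath in blacklist:
        return None  # skipped without paying the timeout price again
    spec = importlib.util.spec_from_file_location("candidate_module", filepath)
    module = importlib.util.module_from_spec(spec)
    previous = signal.signal(signal.SIGALRM, _raise_timeout)
    signal.alarm(timeout_seconds)
    try:
        spec.loader.exec_module(module)
        return module
    except ImportTimeout:
        blacklist.add(filepath)
        return None
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler


# A file that does slow work at import time.
fd, slow_path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as f:
    f.write("import time\ntime.sleep(5)\n")

blacklist = set()
first = import_with_blacklist(slow_path, blacklist)   # times out after about a second
second = import_with_blacklist(slow_path, blacklist)  # skipped immediately
print(first, second, slow_path in blacklist)
```

Re-enabling a bad actor in this sketch is just `blacklist.discard(slow_path)`; the manual step
described above would sit in front of that call.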

> "airflow" "DAG" strings in file necessary to import dag
> -------------------------------------------------------
>
>                 Key: AIRFLOW-97
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-97
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>            Reporter: Etiene Dalcol
>            Priority: Minor
>
> Hello airflow team! Thanks for the awesome tool!
> We made a small module to automate our DAG building process and we are using this module
> in our DAG definition. Our airflow version is 1.7.0.
> However, airflow will not import this file because it doesn't have the words DAG and
> airflow in it. (The imports etc. are done inside our little module.) Apparently there's a
> safe_mode that skips files without these strings.
> (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L197)
> This safe_mode defaults to True but is not passed to the process_file function, so
> it is always True and there's no apparent way to disable it.
> (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L177)
> (https://github.com/apache/incubator-airflow/blob/1.7.0/airflow/models.py#L313)
> Putting this comment at the top of the file makes it work for the moment and brought
> me a good laugh today 👯
> #DAG airflow --> DO NOT REMOVE. the world will explode



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
