airflow-commits mailing list archives

From "Bence Nagy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-29) Decrease the default `dagbag_import_timeout`
Date Tue, 03 May 2016 12:15:13 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15268601#comment-15268601 ]

Bence Nagy edited comment on AIRFLOW-29 at 5/3/16 12:14 PM:
------------------------------------------------------------

We're using dynamically generated DAGs as well; our most complex DAG file defines 5 DAGs with
4 tasks each, and I already feel like that makes me enough of a power user that it would be
reasonable to require me to raise this setting from its default manually in the config. But
even that file's import time doesn't come close to 1s: my benchmarks put it at ~15ms (assuming
the {{airflow}} package is already cached, which it surely is when the scheduler imports DAGs).
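
A minimal sketch of how such a measurement could be taken with only the standard library (this
is not Airflow's own code; {{time_module_import}} and the example path below are hypothetical).
It executes a DAG file as a fresh module and times only that execution, so packages already in
{{sys.modules}} are reused, as they would be inside a running scheduler:

```python
import importlib.util
import time


def time_module_import(path: str, module_name: str = "dag_under_test") -> float:
    """Execute the Python file at `path` as a new module and return elapsed seconds.

    Only the execution of the file itself is timed. Any package it imports that is
    already in sys.modules (e.g. `airflow` in a long-running scheduler) is reused
    from the cache rather than imported again.
    """
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path!r}")
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    spec.loader.exec_module(module)  # runs the top-level code of the DAG file
    return time.perf_counter() - start
```

Calling it twice on the same file (e.g. a hypothetical {{dags/generated_dags.py}}) and keeping
the second number approximates the warm-cache case, since the first call has already pulled
{{airflow}} and the file's other dependencies into {{sys.modules}}.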

I'd argue that a decrease to 5s wouldn't really accomplish my goal here, which is to prevent
people from wasting resources by unknowingly writing operators like the [S3FileTransformOperator
in 1.7.0|https://github.com/airbnb/airflow/blob/1.7.0/airflow/operators/s3_file_transform_operator.py#L60-L61].
I still think 1s is a setting that would help far more people than it would (very slightly)
inconvenience.
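
To illustrate the general anti-pattern (a hypothetical sketch, not the actual
S3FileTransformOperator code: {{fetch_connection_settings}}, {{EagerTransformOperator}} and
{{LazyTransformOperator}} are made-up names, and the 50ms sleep merely simulates a database
round trip): anything an operator does in its constructor runs every time the DAG file is
imported, whereas work done in {{execute()}} only runs when the task itself runs.

```python
import time


def fetch_connection_settings(conn_id: str) -> dict:
    """Hypothetical stand-in for a metadata-DB lookup such as resolving a connection."""
    time.sleep(0.05)  # simulate a ~50ms database/network round trip
    return {"conn_id": conn_id}


class EagerTransformOperator:
    """Anti-pattern: the lookup happens in __init__, i.e. on every DAG file import."""

    def __init__(self, conn_id: str):
        self.conn = fetch_connection_settings(conn_id)

    def execute(self) -> str:
        return f"transforming with {self.conn['conn_id']}"


class LazyTransformOperator:
    """Preferred: __init__ only stores configuration; the lookup is deferred to execute()."""

    def __init__(self, conn_id: str):
        self.conn_id = conn_id

    def execute(self) -> str:
        conn = fetch_connection_settings(self.conn_id)
        return f"transforming with {conn['conn_id']}"


def time_construction(cls, n: int) -> float:
    """Measure how long it takes just to construct `n` operators, as an import would."""
    start = time.perf_counter()
    for i in range(n):
        cls(f"conn_{i}")
    return time.perf_counter() - start
```

With 20 operators (5 DAGs × 4 tasks, as in the file described above), {{time_construction}}
reports at least 1s for the eager version under these simulated costs, while the lazy version
is effectively free at import time.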


was (Author: underyx):
We're using dynamically generated DAGs as well; the most complex one is defining 5 DAGs with
4 tasks each, and I already feel like this makes me enough of a power user that it would be
reasonable to require me to raise this setting from the default in the config manually. But
the import time doesn't come close at all actually, my benchmarks show it at ~15ms (if the
{{airflow}} package is cached already, but it surely is when the scheduler is importing DAGs).

I'd argue that a decrease to 5s wouldn't really accomplish what my goal was here — to prevent
people from wasting resources by obliviously writing operators like the [S3FileTransformOperator
in 1.7.0|https://github.com/airbnb/airflow/blob/1.7.0/airflow/operators/s3_file_transform_operator.py#L60-L61].
I stand by my stance that 1s is a setting that would help a lot more people than it would
very slightly inconvenience.

> Decrease the default `dagbag_import_timeout`
> --------------------------------------------
>
>                 Key: AIRFLOW-29
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-29
>             Project: Apache Airflow
>          Issue Type: Improvement
>            Reporter: Bence Nagy
>            Priority: Minor
>
> The default setting as of 1.7.0 is:
> {code}
> dagbag_import_timeout = 30
> {code}
> I don't think there's any reason for a DAG import to take over 1 second. I didn't always
> know this, though: I once had a DAG file that made DB queries every time the scheduler
> loaded it, which made the scheduler a lot slower than it should have been. I feel like a
> really low default, coupled with helpful error reporting, would be a nice way to make sure
> users don't do silly things like I did.
> Original issue: https://github.com/airbnb/airflow/issues/1380
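
On the "helpful error reporting" part: one common way to enforce an import budget in Python is
a SIGALRM-based timeout. The sketch below is stand-alone and hypothetical ({{import_timeout}}
and {{DagImportTimeout}} are illustrative names, not Airflow's API); it only works on Unix and
in the main thread, and it shows the kind of actionable message a low default could come with.

```python
import signal
from contextlib import contextmanager


class DagImportTimeout(Exception):
    """Raised when a DAG file takes longer than the allowed import budget."""


@contextmanager
def import_timeout(seconds: float, filepath: str):
    """Abort the enclosed block after `seconds` via SIGALRM (Unix, main thread only)."""

    def _on_alarm(signum, frame):
        raise DagImportTimeout(
            f"Importing {filepath} took longer than {seconds}s. "
            "Avoid database queries, network calls, or other heavy work at module "
            "level or in operator constructors; move it into execute() instead."
        )

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)  # one-shot timer, fractional seconds OK
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        # Restore whatever handler was installed before (fall back to the default).
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)
```

Wrapping the execution of each DAG file in {{with import_timeout(1, path):}} turns a slow file
into an immediate, self-explanatory failure instead of a silently slow scheduler.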



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
