airflow-dev mailing list archives

From Alex Guziel <alex.guz...@airbnb.com.INVALID>
Subject Re: Ignore Processing DAG Definition Python Files for Paused DAGs
Date Tue, 28 Nov 2017 00:41:05 GMT
Hmm, this may not apply to your implementation, but it sounds like this
approach would not handle cases like:

1) a.py has dag A1 and A2, A1 is paused, A2 is not
2) b.py has dag B1, which is paused. Later B2 is added to b.py but does not
get picked up since B1 is paused.
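
To make the first case concrete, here is a minimal sketch (not Airflow code) of how excluding "any file with a paused DAG" differs from excluding "only files where every DAG is paused". The sample rows and the helper names `naive_paused_files` and `fully_paused_files` are hypothetical; only the `fileloc` and `is_paused` column names come from the proposal quoted below.

```python
from collections import defaultdict

# Hypothetical rows mirroring the dag table: (dag_id, fileloc, is_paused).
rows = [
    ("A1", "dags/a.py", True),
    ("A2", "dags/a.py", False),
    ("B1", "dags/b.py", True),
]


def naive_paused_files(rows):
    """Files that define at least one paused DAG."""
    return {fileloc for _, fileloc, is_paused in rows if is_paused}


def fully_paused_files(rows):
    """Files in which every known DAG is paused."""
    flags_by_file = defaultdict(list)
    for _, fileloc, is_paused in rows:
        flags_by_file[fileloc].append(is_paused)
    return {f for f, flags in flags_by_file.items() if all(flags)}


print(sorted(naive_paused_files(rows)))   # ['dags/a.py', 'dags/b.py']
print(sorted(fully_paused_files(rows)))   # ['dags/b.py']
```

The naive filter would skip a.py and so stop scheduling A2. The stricter filter avoids that, but it still skips b.py, so it would not by itself notice B2 being added later (case 2).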

On Mon, Nov 27, 2017 at 3:29 PM, Andy Huynh <ahuynh@symphonyrm.com> wrote:

> When we updated to Airflow 1.9, we noticed that there was a pretty big
> delay between tasks (somewhere between 2 and 4 minutes, even after playing
> around with the min_file_process_interval and max_threads configs). Our
> thought was that if we reduce the number of files that the scheduler has to
> process, then the scheduler would process files for unpaused DAGs more
> frequently, reducing the delay between tasks.
>
> On 2017-11-27 11:23, Alek Storm <alek.storm@gmail.com> wrote:
> > What's the advantage of this change? Performance?
> >
> > Alek
> >
> > On Mon, Nov 27, 2017 at 1:11 PM, ahuynh@symphonyrm.com <
> > ahuynh@symphonyrm.com> wrote:
> >
> > > Hi all,
> > >
> > > I wanted to gauge community interest in this idea we have. We are
> > > currently running a modified version of Airflow 1.9 RC3 where we ignore
> > > processing DAG definition Python files for paused DAGs. By default,
> > > list_py_file_paths traverses the dags subdirectory to look for Python
> > > files, and the scheduler processes all these files, regardless of whether
> > > the DAGs defined in these files are paused or not. Our proposed
> > > modification was to query the fileloc column in the dag table, filtering
> > > on is_paused=1 and is_active=1 to get a list of file paths for paused DAGs.
> > > Then, we can exclude these files from the known_file_paths, so that the
> > > scheduler does not process these files. This feature can be set on and off
> > > via a scheduler config variable.
> > >
> > > If anyone is interested, we already have the code written, so we'd be
> > > happy to package up our changes and create a PR.
> > >
> > > Thanks!
> > > -Andy
> > >
> >
>
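
The filtering step described in the original message can be sketched roughly as follows. This is not the code from the modified Airflow build: it uses an in-memory SQLite table as a stand-in for the `dag` table, and the function name `paused_file_paths`, the sample DAG ids, and the sample paths are all assumptions for illustration. Only the column names (`fileloc`, `is_paused`, `is_active`) and `known_file_paths` come from the message.

```python
import sqlite3

# Stand-in for the Airflow metadata database (assumption: a simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dag ("
    "dag_id TEXT PRIMARY KEY, fileloc TEXT, is_paused INTEGER, is_active INTEGER)"
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?, ?, ?)",
    [
        ("X1", "dags/x.py", 1, 1),  # paused and active
        ("Y1", "dags/y.py", 0, 1),  # unpaused and active
    ],
)


def paused_file_paths(conn):
    """Return the set of file paths that define a paused, active DAG."""
    cur = conn.execute(
        "SELECT DISTINCT fileloc FROM dag WHERE is_paused = 1 AND is_active = 1"
    )
    return {row[0] for row in cur}


# Hypothetical result of scanning the dags folder.
known_file_paths = ["dags/x.py", "dags/y.py", "dags/z.py"]

paused = paused_file_paths(conn)
files_to_process = [p for p in known_file_paths if p not in paused]
print(files_to_process)  # ['dags/y.py', 'dags/z.py']
```

A file with no row in the table yet (z.py here) is still processed, so brand-new files continue to be discovered under this sketch.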
