airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Davydov <dan.davy...@airbnb.com.INVALID>
Subject Re: dag file processing times
Date Mon, 24 Apr 2017 20:46:32 GMT
One idea to solve this is to use a daemon that uses inotify to watch for
changes in files and then reprocesses just those files. The hard part is
without any kind of dependency/build system for DAGs it can be hard to tell
which DAGs depend on which files.

On Mon, Apr 24, 2017 at 1:21 PM, Gerard Toonstra <gtoonstra@gmail.com>
wrote:

> Hey,
>
> I've seen some people complain about DAG file processing times. An issue
> was raised about this today:
>
> https://issues.apache.org/jira/browse/AIRFLOW-1139
>
> I attempted to provide a good explanation what's going on. Feel free to
> validate and comment.
>
>
> I'm noticing that the file processor is a bit naive in the way it
> reprocesses DAGs. It doesn't look at the DAG interval for example, so it
> looks like it reprocesses all files continuously in one big batch, even if
> we can determine that the next "schedule"  for all its dags are in the
> future?
>
>
> Wondering if a change in the DagFileProcessingManager could optimize things
> a bit here.
>
> In the part where it gets the simple_dags from a file it's currently
> processing:
>
>                 for simple_dag in processor.result:
>                     simple_dags.append(simple_dag)
>
> the file_path is in the context and the simple_dags should be able to
> provide the next interval date for each dag in the file.
>
> The idea is to add files to a sorted deque by "next_schedule_datetime" (the
> minimum next interval date), so that when we build the list
> "files_paths_to_queue", it can remove files that have dags that we know
> won't have a new dagrun for a while.
>
> One gotcha to resolve after that is to deal with files getting updated with
> new dags or changed dag definitions and renames and different interval
> schedules.
>
> Worth a PR to glance over?
>
> Rgds,
>
> Gerard
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message