airflow-dev mailing list archives

From Phil Daly
Subject Airflow Latency Between Task Scheduling and task Execution
Date Mon, 03 Oct 2016 17:50:45 GMT
I posted this on StackOverflow but kind-of need a response “real soon now” …

I'm sure this is a newbie question so apologies in advance. Running Airflow 1.7.x on a virtual
Ubuntu 16.x machine with a MacBookPro host. Resources are not an issue. What I need to know
is how to manipulate the scheduler latency: I have 2 tasks and task 2 follows task 1, nice
and simple. With just this DAG, I noticed that task 2 would typically run 15s after task 1
completes and I'm wondering if I can get that much lower? I have re-configured to use a CeleryExecutor
with 1 worker node and changed the job_heartbeat_sec and scheduler_heartbeat_sec to 1 (each).
These are integers so I can't express sub-second scheduling. Now my task 2 will run ~3s after
task 1 but I'd still like to get it lower, preferably sub-second. The wiki pages suggest
that the scheduler can take 0.05-0.15s which, if that is not a typo, suggests sub-second task
scheduling is possible. I can run this airflow invocation on a dedicated machine, if I have
to, so that nothing else is interfering with it.
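
For reference, a minimal sketch of the settings described above, rendered with Python's standard configparser. The section names ([core] for the executor, [scheduler] for the two heartbeat values) are an assumption based on the Airflow 1.7-era default airflow.cfg layout, and the helper names render_cfg and heartbeat_seconds are hypothetical. The integer parsing only mirrors the observation that these values cannot be fractional; it is not Airflow's own loading code.

```python
import configparser
import io

# Assumed layout of airflow.cfg: executor under [core],
# both heartbeat settings under [scheduler].
SETTINGS = {
    "core": {"executor": "CeleryExecutor"},
    "scheduler": {
        "job_heartbeat_sec": "1",
        "scheduler_heartbeat_sec": "1",
    },
}


def render_cfg(settings):
    """Render the settings dict as INI text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_dict(settings)
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


def heartbeat_seconds(cfg_text, key):
    """Read a [scheduler] value as an integer; a value such as
    "0.5" raises ValueError, so sub-second values are rejected."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getint("scheduler", key)


if __name__ == "__main__":
    print(render_cfg(SETTINGS))
```

With integer-only heartbeats, 1 is the smallest positive interval that can be expressed this way, which matches the ~3s floor reported above being dominated by something other than the configured value alone.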

So, am I pushing airflow too hard? Or can I get task 2 to run pretty much as soon as task
1 has finished? If so, how?

I have added a PS for a bit more contextual information if you need it. Thanks in advance,

Phil Daly

A little bit more information on what I am trying to do (the OCS = observatory / observation
control system):

OCS could use a workflow engine for observation sequencing (and other tasks) with the following
set up:

  *   a firm real-time queue for executing night-time observations for each telescope. In
this sense, we would like a workflow and the tasks it contains to be scheduled and executed
within a very short amount of time, say, < 1s. How much latency we can adopt here is open
to question;
  *   a soft real-time queue for executing day-time calibrations and engineering functions
for each telescope. In this sense, the workflow and the tasks it contains should start promptly
but we can accept scheduling delay between, say, 1–3s and perhaps longer;
  *   a regular queue for cron-like jobs for each telescope (end of night reports etc). In
this sense, we leave it to Airflow to determine the scheduling and accept that these jobs
might not start for up to 30s after they become runnable.
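
The three latency budgets above can be sketched as a small lookup, which makes it easy to check a measured task-to-task delay against each queue. The queue names and the function queues_satisfied are hypothetical labels for illustration; the soft real-time limit is taken as 3s here, although the text allows "perhaps longer".

```python
# Latency budgets in seconds, taken from the three queues described above.
# The boolean marks whether the limit itself is acceptable
# (firm is "< 1s", so its limit is exclusive).
LATENCY_BUDGETS = {
    "firm": (1.0, False),     # night-time observations: strictly under 1s
    "soft": (3.0, True),      # day-time calibrations: assumed upper bound of 3s
    "regular": (30.0, True),  # cron-like jobs: up to 30s
}


def queues_satisfied(observed_latency_s):
    """Return the queues whose budget an observed scheduling delay meets."""
    satisfied = []
    for name, (limit, inclusive) in LATENCY_BUDGETS.items():
        within = observed_latency_s <= limit if inclusive else observed_latency_s < limit
        if within:
            satisfied.append(name)
    return satisfied


if __name__ == "__main__":
    # 15s: delay seen with the default settings; 3s: after tuning the heartbeats.
    for delay in (15.0, 3.0, 0.5):
        print(delay, queues_satisfied(delay))
```

Under these assumptions, the ~3s delay measured with the tuned heartbeats fits the soft real-time and regular queues but not the firm real-time one.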

Clearly Airflow can handle the latter two use cases, but I really need to know if I can make
it fly for the first (the firm real-time queue).
