airflow-dev mailing list archives

From Jason Chen <chingchien.c...@gmail.com>
Subject Re: Time zone used in "Tree view" and task order
Date Tue, 31 May 2016 23:22:37 GMT
Hi Chris,

I see.
I switched to LocalExecutor and the scheduler is working as expected.
Thanks a lot for your help!

Jason
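[Editor's note] The difference Chris describes below (SequentialExecutor runs one task at a time; LocalExecutor runs tasks in parallel) can be sketched with a toy worker pool. This is illustrative stdlib Python, not Airflow's actual executor code, and the task names and durations are hypothetical:

```python
# Toy sketch (NOT Airflow's actual executor code) of the behavior discussed
# in this thread: with one worker slot, a short task from dag2 is stuck
# behind dag1's long-running task; with parallel slots, both DAGs progress.
import time
from concurrent.futures import ThreadPoolExecutor


def run_tasks(max_workers):
    """Dispatch two 'tasks' to a pool and return the order they finish in.

    max_workers=1 mimics SequentialExecutor (one task at a time);
    max_workers=2 mimics LocalExecutor's parallel slots.
    """
    finished = []

    def task(name, duration):
        time.sleep(duration)
        finished.append(name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pool.submit(task, "dag1.task1", 0.3)   # long-running task, submitted first
        pool.submit(task, "dag2.taskA", 0.05)  # short task from another DAG
    return finished


# One worker: dag2.taskA cannot start until dag1.task1 is done.
sequential = run_tasks(max_workers=1)
# Two workers: the short task finishes first, as Jason expected.
parallel = run_tasks(max_workers=2)
```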

On Tue, May 31, 2016 at 3:35 PM, Chris Riccomini <criccomini@apache.org>
wrote:

> Hey Jason,
>
> The SequentialExecutor only ever runs one task at a time. It's meant for
> debugging purposes. Try switching to the LocalExecutor.
>
> Cheers,
> Chris
>
> On Tue, May 31, 2016 at 3:31 PM, Jason Chen <chingchien.chen@gmail.com>
> wrote:
>
>> Chris,
>>  I am running SequentialExecutor.
>>
>> Thanks.
>> Jason
>>
>>
>> On Tue, May 31, 2016 at 1:36 PM, Chris Riccomini <criccomini@apache.org>
>> wrote:
>>
>>> Hey Jason,
>>>
>>> Are you running the SequentialExecutor? This is the default
>>> out-of-the-box executor.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Tue, May 31, 2016 at 12:59 PM, Jason Chen <chingchien.chen@gmail.com>
>>> wrote:
>>>
>>>> Hi Chris,
>>>>
>>>> I made the changes and tried them out, but they do not seem to be
>>>> working as expected. When a dag is running (a particular task inside
>>>> that dag is taking a long time), a task from another dag seems to be
>>>> "blocked".
>>>>
>>>> My setting:
>>>> (1) airflow.cfg
>>>>   max_active_runs_per_dag = 16
>>>>   parallelism = 32
>>>>   dag_concurrency = 16
>>>>
>>>> (2) Part of the dag1 python file is below. Please note that inside
>>>> this DAG, the first task (task1) is a long-running task:
>>>>
>>>> dag1 = DAG('dag1', schedule_interval=timedelta(minutes=15),
>>>> max_active_runs=1, default_args=args)
>>>>
>>>> Then, the tasks are running in the order...
>>>> task1 (long running) --> task2 --> task3
>>>> ...
>>>> (3) Part of another dag (dag2) python file is below:
>>>> dag2 = DAG('dag2', schedule_interval=timedelta(minutes=3),
>>>> max_active_runs=1, default_args=args)
>>>> ...
>>>> Then, the tasks are running in the order...
>>>> taskA (short running task) --> taskB
>>>>
>>>> (4) Inside the upstart script file, this is the main part of how I
>>>> start the airflow scheduler:
>>>>
>>>> env SCHEDULER_RUNS=0
>>>> export SCHEDULER_RUNS
>>>>
>>>> script
>>>>     exec >> ${AIRFLOW_HOME}/scheduler-log/airflow-scheduler.log 2>&1
>>>>     exec /usr/local/bin/airflow scheduler -n ${SCHEDULER_RUNS}
>>>> end script
>>>>
>>>> =========================
>>>>
>>>> What I observed:
>>>> (a) task1 (of dag1) runs for about 20 minutes, and while it is
>>>> running, no other dag1 run is triggered. This is as expected.
>>>>
>>>> (b) taskA (of dag2) should be triggered every 3 minutes. However, it
>>>> is NOT triggered while task1 of dag1 is running. taskA seems to be
>>>> queued/blocked and does not run; it only executes after task1 (of
>>>> dag1) is done, as if it were dispatched into a "gap" between task1 and
>>>> task2 (of dag1). This does not look normal: taskA (of dag2) should run
>>>> regardless of what happens in another dag (dag1).
>>>>
>>>>
>>>> Any suggestions?
>>>> Thanks.
>>>> Jason
>>>>
>>>>
>>>> On Tue, May 31, 2016 at 9:02 AM, Chris Riccomini <criccomini@apache.org>
>>>> wrote:
>>>>
>>>>> Hey Jason,
>>>>>
>>>>> The problem is max_active_runs_per_dag=1. Set it back to 16. You
>>>>> just need max_active_runs=1 for the individual DAGs. This will allow
>>>>> multiple (different) DAGs to run in parallel, but only one run of
>>>>> each DAG at the same time.
>>>>>
>>>>> Cheers,
>>>>> Chris
>>>>>
>>>>> On Fri, May 27, 2016 at 11:42 PM, Jason Chen <chingchien.chen@gmail.com>
>>>>> wrote:
>>>>>
>>>>> > Hi Chris,
>>>>> >  Thanks for your reply. After setting it up, I observed how it
>>>>> > works for a couple of days.
>>>>> >
>>>>> >  I tried setting max_active_runs=1 in the DAG
>>>>> > (dag = DAG(...max_active_runs=1...)) and it worked fine to avoid
>>>>> > two runs at the same time.
>>>>> > However, I noticed that other dags (not the dag that is running)
>>>>> > are also "paused".
>>>>> > My understanding is that "max_active_runs" is basically the
>>>>> > per-DAG version of "max_active_runs_per_dag".
>>>>> > So why can't another dag (with a different dag name) run at the
>>>>> > same time as the first dag?
>>>>> > I want the two dags to be able to run at the same time, with only
>>>>> > one run active per dag.
>>>>> > Thanks.
>>>>> >
>>>>> > Jason
>>>>> >
>>>>> > My other settings in airflow.cfg
>>>>> >
>>>>> > max_active_runs_per_dag=1
>>>>> > parallelism = 32
>>>>> > dag_concurrency = 16
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, May 16, 2016 at 8:57 PM, Chris Riccomini <criccomini@apache.org>
>>>>> > wrote:
>>>>> >
>>>>> > > Hey Jason,
>>>>> > >
>>>>> > > For (2), by default, task1 will start running again. You'll
>>>>> > > have two runs going at the same time. If you want to prevent
>>>>> > > this, you can set max_active_runs to 1 in your DAG.
>>>>> > >
>>>>> > > Cheers,
>>>>> > > Chris
>>>>> > >
>>>>> > > On Mon, May 16, 2016 at 1:09 PM, Jason Chen <chingchien.chen@gmail.com>
>>>>> > > wrote:
>>>>> > >
>>>>> > > > I have two questions
>>>>> > > >
>>>>> > > > (1) In the airflow UI "Tree view", the tasks are listed along
>>>>> > > > with times highlighted at the top (say, 08:30, 09:00, etc.).
>>>>> > > > What is the meaning of that time? It does not look like the UTC
>>>>> > > > time at which the task was running. I know that, overall,
>>>>> > > > airflow uses UTC time.
>>>>> > > > (2) I have a DAG with two tasks: task1 --> task2
>>>>> > > > Task1 runs hourly and could sometimes take longer than one hour
>>>>> > > > to run.
>>>>> > > > In such a setup, task1 will be triggered hourly; what happens if
>>>>> > > > the previous task1 is still running? Will the "new" task1 be
>>>>> > > > queued?
>>>>> > > >
>>>>> > > > Thanks.
>>>>> > > > Jason
>>>>> > > >
>>>>> > >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
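[Editor's note] To summarize the working configuration from this thread: global limits go in airflow.cfg, while the per-DAG cap is set in each DAG file. A sketch using the option names and values quoted above:

```ini
# airflow.cfg -- global scheduler/executor settings (values from the thread)
executor = LocalExecutor          ; SequentialExecutor runs one task at a time
parallelism = 32                  ; max task instances running across all DAGs
dag_concurrency = 16              ; max task instances per DAG
max_active_runs_per_dag = 16      ; global default; do NOT set this to 1

; The per-DAG limit belongs in the DAG file instead, e.g.:
;   dag1 = DAG('dag1', schedule_interval=timedelta(minutes=15),
;              max_active_runs=1, default_args=args)
```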
