airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Time zone used in "Tree view" and task order
Date Wed, 01 Jun 2016 16:52:43 GMT
About time zones, it'd be nice to add an entry to the FAQ in the docs with
recommendations. We do UTC all around here which makes it easy.
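
A minimal sketch of what "UTC all around" could look like in plain Python, using only the standard library. The fixed UTC-7 offset and the helper name `to_utc` are hypothetical and chosen for illustration; they are not taken from Airflow or from this thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offset standing in for a local zone (UTC-7).
LOCAL = timezone(timedelta(hours=-7))


def to_utc(dt):
    """Convert an aware datetime to UTC; reject naive values rather than guess."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone before converting")
    return dt.astimezone(timezone.utc)


# A local wall-clock time, converted once at the boundary and kept in UTC afterwards.
local_run = datetime(2016, 5, 31, 16, 22, tzinfo=LOCAL)
utc_run = to_utc(local_run)
print(utc_run.isoformat())
```

Rejecting naive datetimes is a design choice in this sketch: it forces the time zone to be stated where the value is created instead of being assumed later.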

Max

On Tue, May 31, 2016 at 4:22 PM, Jason Chen <chingchien.chen@gmail.com>
wrote:

> Hi Chris,
>
> I see.
> I switched to LocalExecutor and the scheduler is working as expected.
> Thanks a lot for your help!
>
> Jason
>
> On Tue, May 31, 2016 at 3:35 PM, Chris Riccomini <criccomini@apache.org>
> wrote:
>
> > Hey Jason,
> >
> > The SequentialExecutor only ever runs one task at a time. It's meant for
> > debugging purposes. Try switching to the LocalExecutor.
> >
> > Cheers,
> > Chris
> >
> > On Tue, May 31, 2016 at 3:31 PM, Jason Chen <chingchien.chen@gmail.com>
> > wrote:
> >
> >> Chris,
> >>  I am running SequentialExecutor.
> >>
> >> Thanks.
> >> Jason
> >>
> >>
> >> On Tue, May 31, 2016 at 1:36 PM, Chris Riccomini <criccomini@apache.org>
> >> wrote:
> >>
> >>> Hey Jason,
> >>>
> >>> Are you running the SequentialExecutor? This is the default out-of-the-box
> >>> executor.
> >>>
> >>> Cheers,
> >>> Chris
> >>>
> >>> On Tue, May 31, 2016 at 12:59 PM, Jason Chen <chingchien.chen@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Chris,
> >>>>
> >>>> I made the changes and tried it out.
> >>>> It does not seem to be working as expected.
> >>>> When a dag is running (a particular task inside that dag is taking
> >>>> time), another task from another dag seems "blocked".
> >>>>
> >>>> My setting:
> >>>> (1) airflow.cfg
> >>>>   max_active_runs_per_dag = 16
> >>>>   parallelism = 32
> >>>>   dag_concurrency = 16
> >>>>
> >>>> (2) A dag (dag1) python file is as below partially. Please note that
> >>>> inside this DAG, the first task (task1) is a long running task
> >>>>
> >>>> dag1 = DAG('dag1', schedule_interval=timedelta(minutes=15),
> >>>> max_active_runs=1, default_args=args)
> >>>>
> >>>> Then, the tasks are running in the order...
> >>>> task1 (long running) --> task 2  --> task3
> >>>> ...
> >>>> (3) Another dag (dag2) python file is partially as below.
> >>>> dag2 = DAG('dag2', schedule_interval=timedelta(minutes=3),
> >>>> max_active_runs=1, default_args=args)
> >>>> ...
> >>>> Then, the tasks are running in the order...
> >>>> taskA (short running task) --> taskB
> >>>>
> >>>> (4) Inside the upstart script file. this is the main part how I start
> >>>> airflow scheduler
> >>>>
> >>>> env SCHEDULER_RUNS=0
> >>>> export SCHEDULER_RUNS
> >>>>
> >>>> script
> >>>>     exec >> ${AIRFLOW_HOME}/scheduler-log/airflow-scheduler.log 2>&1
> >>>>     exec /usr/local/bin/airflow scheduler -n ${SCHEDULER_RUNS}
> >>>> end script
> >>>>
> >>>> =========================
> >>>>
> >>>> What I observed is that
> >>>> (a) task1 (of dag1) runs for about 20 mins and during its running
> >>>> time, no other dag1 run is triggered. This is as expected.
> >>>>
> >>>> (b) taskA (of dag2) should be triggered to run every 3 mins. However,
> >>>> it is NOT triggered if task1 of dag1 is running.
> >>>> taskA seems to be queued/blocked and does not run. It is executed after
> >>>> task1 (of dag1) is done. So, it looks like it is dispatched into a "gap"
> >>>> between task1 and task2 (of dag1). This does not look normal, as taskA
> >>>> (of dag2) is expected to run no matter what happens to another dag (dag1).
> >>>>
> >>>>
> >>>> Any suggestions?
> >>>> Thanks.
> >>>> Jason
> >>>>
> >>>>
> >>>> On Tue, May 31, 2016 at 9:02 AM, Chris Riccomini <criccomini@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Hey Jason,
> >>>>>
> >>>>> The problem is max_active_runs_per_dag=1. Set it back to 16. You just
> >>>>> need max_active_runs=1 for the individual DAGs. This will allow multiple
> >>>>> (different) DAGs to run in parallel, but only one run of each DAG can
> >>>>> be active at the same time.
> >>>>>
> >>>>> Cheers,
> >>>>> Chris
> >>>>>
> >>>>> On Fri, May 27, 2016 at 11:42 PM, Jason Chen <
> >>>>> chingchien.chen@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>> > Hi Chris,
> >>>>> >  Thanks for your reply. After setting it up, I observed how it works
> >>>>> > for a couple of days.
> >>>>> >
> >>>>> >  I tried to set max_active_runs=1 in the DAG
> >>>>> > dag = DAG(...max_active_runs=1...) and it worked fine to avoid two runs
> >>>>> > at the same time.
> >>>>> > However, I noticed other dags (not the dag that is running) are also
> >>>>> > "paused".
> >>>>> > My understanding is that "max_active_runs" is basically
> >>>>> > "max_active_runs_per_dag".
> >>>>> > So, why can't another dag (different dag name) run at the same time as
> >>>>> > the first dag?
> >>>>> > I want the two dags to be able to run at the same time, with only one
> >>>>> > run per dag inside each dag.
> >>>>> > Thanks.
> >>>>> >
> >>>>> > Jason
> >>>>> >
> >>>>> > My other settings in airflow.cfg
> >>>>> >
> >>>>> > max_active_runs_per_dag=1
> >>>>> > parallelism = 32
> >>>>> > dag_concurrency = 16
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > On Mon, May 16, 2016 at 8:57 PM, Chris Riccomini <
> >>>>> criccomini@apache.org>
> >>>>> > wrote:
> >>>>> >
> >>>>> > > Hey Jason,
> >>>>> > >
> >>>>> > > For (2), by default, task1 will start running again. You'll have
> >>>>> > > two runs going at the same time. If you want to prevent this, you can
> >>>>> > > set max_active_runs to 1 in your DAG.
> >>>>> > >
> >>>>> > > Cheers,
> >>>>> > > Chris
> >>>>> > >
> >>>>> > > On Mon, May 16, 2016 at 1:09 PM, Jason Chen <
> >>>>> chingchien.chen@gmail.com>
> >>>>> > > wrote:
> >>>>> > >
> >>>>> > > > I have two questions
> >>>>> > > >
> >>>>> > > > (1) For the airflow UI: "Tree view", it lists the tasks along with
> >>>>> > > > the time highlighted at the top (say, 08:30, 09:00, etc.). What's the
> >>>>> > > > meaning of that time? It does not look like the UTC time at which the
> >>>>> > > > task was running. I know that overall, airflow uses UTC time.
> >>>>> > > > (2) I have a DAG with two tasks: task1 --> task2
> >>>>> > > > Task1 runs hourly and could sometimes take longer than one hour to
> >>>>> > > > run.
> >>>>> > > > In such a setup, task1 will be triggered hourly. What happens if the
> >>>>> > > > previous task1 is still running? Will the "new" task1 be queued?
> >>>>> > > > Thanks.
> >>>>> > > > Jason
> >>>>> > > >
> >>>>> > >
> >>>>> >
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >
>
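
The blocking behaviour discussed in the quoted thread (taskA of dag2 waiting behind the long-running task1 of dag1 under the SequentialExecutor, and running on time under the LocalExecutor) can be illustrated with a toy model. This is a rough sketch that does not use Airflow at all: `start_times` is a hypothetical helper, `slots=1` stands in for an executor that runs one task at a time, `slots=2` for one that can run tasks in parallel, and the durations are the approximate figures Jason mentions (20 minutes and a task due at minute 3).

```python
import heapq


def start_times(tasks, slots):
    """Greedy first-come-first-served scheduling on `slots` workers.

    tasks: iterable of (name, ready_at, duration) tuples, times in minutes.
    Returns a dict mapping each task name to the minute it actually starts.
    """
    free_at = [0] * slots  # min-heap of the times at which each worker is free
    heapq.heapify(free_at)
    starts = {}
    for name, ready_at, duration in sorted(tasks, key=lambda t: t[1]):
        worker_free = heapq.heappop(free_at)
        start = max(ready_at, worker_free)  # wait for readiness and for a worker
        starts[name] = start
        heapq.heappush(free_at, start + duration)
    return starts


# Toy workload: task1 runs for 20 minutes, taskA becomes due at minute 3.
tasks = [("dag1.task1", 0, 20), ("dag2.taskA", 3, 1)]

one_slot = start_times(tasks, slots=1)   # one task at a time
two_slots = start_times(tasks, slots=2)  # tasks may overlap

print("one slot: ", one_slot)
print("two slots:", two_slots)
```

In this model taskA cannot start until task1 releases the only worker, which matches the "gap" Jason observed; with a second worker it starts as soon as it is due.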
