airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: airflow (1.7.0): cpu utilization reaches 70% and above
Date Mon, 13 Jun 2016 15:42:28 GMT
Can you confirm that it's the scheduler process using that CPU?

The SCHEDULER_HEARTBEAT_SEC configuration defines a minimum duration for
scheduling cycles, in which the scheduler evaluates all active DagRuns and
attempts to kick off task instances whose dependencies are met. Once a
cycle is done, the scheduler should sleep until the next heartbeat, so CPU
usage should look spiky rather than flat.
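To make the heartbeat behavior concrete, here is a simplified sketch of a loop that enforces a minimum cycle duration. It is not Airflow's actual SchedulerJob code; `run_scheduler` and the `run_cycle` callback are hypothetical names used only for illustration. The sketch also shows one way CPU could stay high: if a cycle takes longer than the heartbeat, there is no time left to sleep and cycles run back to back.

```python
import time
from typing import Callable, List


def run_scheduler(
    run_cycle: Callable[[], None],
    heartbeat_sec: float,
    num_cycles: int,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Run `num_cycles` scheduling cycles, each lasting at least `heartbeat_sec`.

    Returns the sleep duration used after each cycle (0.0 when the cycle
    itself took as long as or longer than the heartbeat).
    """
    sleeps: List[float] = []
    for _ in range(num_cycles):
        started = clock()
        # Hypothetical work: evaluate active DagRuns, queue ready task instances.
        run_cycle()
        elapsed = clock() - started
        # Sleep only for whatever is left of the heartbeat window.
        remaining = max(0.0, heartbeat_sec - elapsed)
        if remaining > 0:
            sleep(remaining)  # idle period: the "quiet" part of a spiky CPU graph
        sleeps.append(remaining)
    return sleeps
```

Passing a fake clock and sleep function (as a test would) lets you inspect the sleep schedule without waiting in real time; with a 5-second heartbeat, a 1-second cycle is followed by a 4-second sleep, while a 7-second cycle gets no sleep at all.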

Max

On Mon, Jun 13, 2016 at 8:26 AM, harish singh <harish.singh22@gmail.com>
wrote:

> Yup, I tried changing the scheduler heartbeat to 60 seconds.
> Apart from not getting any updates for 60 seconds, what are the side effects
> of changing the two heartbeats? It shouldn't impact performance, should it?
>
> Also, I would understand this CPU usage if there were hundreds of DAGs. But
> with just one active DAG, doesn't 70% seem high? Especially in my case,
> where there are only 10 tasks in the DAG, all making curl calls (BashOperators).
>
> Also, a side note: in a different environment where we have 10 active DAGs,
> the CPU usage stays in the same 70-80% range.
>
> On Mon, Jun 13, 2016, 8:14 AM Maxime Beauchemin <maximebeauchemin@gmail.com>
> wrote:
>
> > The scheduler constantly attempts to schedule tasks, interacting with the
> > database and reloading DAG definitions. In most large-ish environments,
> > burning up to a full CPU core to run the scheduler doesn't seem outrageous
> > to me.
> >
> > If you want to reduce the CPU load related to the scheduler, check out
> > SCHEDULER_HEARTBEAT_SEC and MAX_THREADS in the scheduler section of
> > `airflow.cfg`.
> >
> > Max
> >
> > On Sun, Jun 12, 2016 at 1:24 PM, harish singh <harish.singh22@gmail.com>
> > wrote:
> >
> > > Hi guys,
> > >
> > > We are running Airflow (for about 3 months now) inside a Docker
> > > container on AWS.
> > >
> > > I just ran docker stats to check what's going on. The CPU consumption
> > > is huge.
> > > We have around 15 DAGs. Only one DAG is turned ON; the remaining are
> > > OFF. The DAG runs on an HOURLY schedule.
> > >
> > > Right now, Airflow is consuming almost one complete core.
> > > It seems there is some unnecessary spinning?
> > > This doesn't look like the right behavior.
> > > Is there a bug already filed for this? Or is there something
> > > incorrect in the way I am using the Airflow configuration?
> > >
> > > CONTAINER   CPU %    MEM USAGE / LIMIT     MEM %    NET I/O               BLOCK I/O
> > > CCC         68.17%   619.7 MB / 2.147 GB   28.85%   1.408 GB / 939.4 MB   7.856 MB / 0 B
> > > XXX         64.36%   619.4 MB / 2.147 GB   28.84%   1.211 GB / 807.6 MB   7.856 MB / 0 B
> > >
> > >
> > > Airflow version 1.7.0
> > >
> > > Airflow config:
> > >
> > > sql_alchemy_pool_size = 5
> > > sql_alchemy_pool_recycle = 3600
> > > parallelism = 8
> > > dag_concurrency = 8
> > > max_active_runs_per_dag = 8
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Harish
> > >
> >
>
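If you manage airflow.cfg from a script, the two scheduler settings Max suggests checking could be adjusted along these lines. This is a sketch that assumes the keys are spelled `scheduler_heartbeat_sec` and `max_threads` under `[scheduler]`; check your own airflow.cfg, since not every Airflow release necessarily ships both keys. `tune_scheduler` is a hypothetical helper, and Python's configparser drops comments when it rewrites a file.

```python
import configparser
import io


def tune_scheduler(cfg_text: str, heartbeat_sec: int, max_threads: int) -> str:
    """Return a copy of airflow.cfg-style text with the scheduler settings changed.

    Assumption: the keys are `scheduler_heartbeat_sec` and `max_threads` in the
    [scheduler] section. configparser does not preserve comments.
    """
    # interpolation=None leaves any '%' characters in other values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("scheduler"):
        parser.add_section("scheduler")
    # A longer heartbeat means fewer scheduling cycles per minute.
    parser.set("scheduler", "scheduler_heartbeat_sec", str(heartbeat_sec))
    # Fewer threads means less parallel scheduling work (per Max's suggestion).
    parser.set("scheduler", "max_threads", str(max_threads))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

The scheduler reads its configuration at startup, so it would typically need a restart to pick up the new values.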
