airflow-dev mailing list archives

From Jason Chen <chingchien.c...@gmail.com>
Subject Re: High load in CPU of MySQL when running airflow
Date Tue, 07 Mar 2017 19:44:53 GMT
I see.
Thanks.

Airflow team,
I noticed a frequently running SQL query, shown below. It runs without a
proper index on the column task_instance.state.
Shouldn't "state" be indexed, given that there could be millions of rows in
task_instance?

"SELECT task_instance.task_id AS task_instance_task_id,
task_instance.dag_id AS task_instance_dag_id,.... FROM task_instance WHERE
task_instance.state = 'queued'"
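
For reference, here is a minimal sketch of what such an index buys (using
SQLite in place of MySQL purely for illustration; the table and column names
mirror Airflow's task_instance table, and the index name `ti_state` is my own):

```python
import sqlite3

# SQLite stands in for MySQL; the schema is a trimmed-down task_instance.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE task_instance (
        task_id TEXT,
        dag_id  TEXT,
        state   TEXT
    )
""")
# Hypothetical index name; in MySQL this would be
# ALTER TABLE task_instance ADD INDEX ti_state (state);
conn.execute("CREATE INDEX ti_state ON task_instance (state)")

# With the index, the scheduler's frequent state filter becomes an index
# lookup instead of a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT task_id, dag_id FROM task_instance WHERE state = 'queued'"
).fetchall()
print(plan)  # plan mentions the ti_state index rather than a table scan
```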


Also, would it be possible to clean up some "unneeded" entries in the tables
(say, task_instance)?  I mean, for example, removing task states older
than 6 months?
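
A rough sketch of what I mean, again with SQLite standing in for MySQL
(execution_date mirrors the real task_instance column, but the 180-day cutoff
and the sample rows are made up; treat this as an illustration, not a
supported maintenance command):

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance (task_id TEXT, state TEXT, execution_date TEXT)"
)
# Two sample rows: one old, one recent.
conn.execute("INSERT INTO task_instance VALUES ('t1', 'success', '2016-01-01 00:00:00')")
conn.execute("INSERT INTO task_instance VALUES ('t2', 'queued',  '2017-03-01 00:00:00')")

# Delete everything older than ~6 months (180 days) before a reference date.
cutoff = (datetime(2017, 3, 7) - timedelta(days=180)).strftime("%Y-%m-%d %H:%M:%S")
deleted = conn.execute(
    "DELETE FROM task_instance WHERE execution_date < ?", (cutoff,)
).rowcount
print(deleted)  # the single old row is removed; the recent one survives
```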

Feedback is welcome.

Thanks.
-Jason



On Tue, Mar 7, 2017 at 10:48 AM, harish singh <harish.singh22@gmail.com>
wrote:

> It does and does not.
> Say the scheduler heartbeat = 30 sec:
> you will see a spiky CPU consumption graph every 30 seconds.
>
> But we did not go that route and kept the scheduler heartbeat = 5 sec so
> that we do not lose time when a task is ready to run (I think there is
> another known bug here - tasks don't move from the queued -> running state
> even after the "job heartbeat")
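> For anyone tuning this, both knobs live in airflow.cfg under [scheduler]
> (key names as I recall them from the 1.7.x era; double-check against your
> version's default config):
>
> ```ini
> [scheduler]
> # How often the scheduler wakes up to look for runnable tasks.
> # Larger values reduce MySQL load but delay task starts.
> scheduler_heartbeat_sec = 5
>
> # How often running jobs heartbeat back to the metadata DB.
> job_heartbeat_sec = 5
> ```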
>
>
> On Tue, Mar 7, 2017 at 10:41 AM, Jason Chen <chingchien.chen@gmail.com>
> wrote:
>
> > Hi Harish,
> >  Thanks for the fast response and feedback.
> >  Yeah, I'd like to see the fix, or more discussion!
> >
> > BTW, I assume that, given your 30 DAGs, airflow runs fine after you
> > increased the heartbeat?
> > The default is 5 secs.
> >
> >
> > Thanks.
> > Jason
> >
> >
> > On Tue, Mar 7, 2017 at 10:24 AM, harish singh <harish.singh22@gmail.com>
> > wrote:
> >
> > > I had seen similar behavior a year ago, when we were at < 5 DAGs. Even
> > > then the CPU utilization was reaching 100%.
> > > One way to deal with this is to play with the "heartbeat" numbers (i.e.
> > > increase the heartbeat).
> > > But then you are introducing more delay to start jobs that are ready to
> > > run (ready to be queued -> queued -> run).
> > >
> > > Right now, we have more than 30 DAGs (each with ~20-25 tasks) that run
> > > every hour.
> > > We are giving airflow about 5-6 cores (which still seems too little for
> > > airflow).
> > > Also, for so many tasks every hour, our memory consumption is over 16G.
> > > All our tasks are basically doing "curl", so 16G seems too high.
> > >
> > > Having said that, I remember reading somewhere that there was a fix
> > > coming for this.
> > > If not, I would definitely want to see more discussion on this.
> > >
> > > Thanks for opening this. I would love to hear how people are working
> > > around this.
> > >
> > >
> > > On Tue, Mar 7, 2017 at 9:42 AM, Jason Chen <chingchien.chen@gmail.com>
> > > wrote:
> > >
> > > > Hi  team,
> > > >
> > > > We are using airflow v1.7.1.3 and schedule about 50 DAGs (each DAG
> > > > runs at 10-minute to one-hour intervals). It's with LocalExecutor.
> > > >
> > > > Recently, we noticed the RDS (MySQL 5.6.x on AWS) runs at ~100%
> > > > CPU.
> > > > I am wondering if the airflow scheduler and webserver can cause high
> > > > CPU load on MySQL, given ~50 DAGs?
> > > > I feel MySQL should be lightly loaded..
> > > >
> > > > Thanks.
> > > > -Jason
> > > >
> > >
> >
>
