airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: High load in CPU of MySQL when running airflow
Date Tue, 07 Mar 2017 19:45:57 GMT
Hi Jason

I think you need to back it up with more numbers. You assume that a load of 100% is bad and
also that 16GB of mem is a lot.

30x25 = 750 tasks per hour = 12.5 tasks per minute. For every task we launch a couple of processes
(at least 2) that do not share memory; this ensures tasks cannot hurt each other. Curl
tasks are probably launched using a BashOperator, which means another process. Curl is
itself another process. So that is 4 processes per task that cannot share memory. Curl can cache
memory itself as well. You probably have peak times and longer-running tasks, so the load is not
evenly spread and it starts adding up quickly.
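
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. The 30 DAGs, 25 tasks per DAG and 4 processes per task come from this thread; the function names and the 60-second task duration are made up for illustration, so plug in your own measurements.

```python
# Rough estimate only. DAGS, TASKS_PER_DAG and PROCESSES_PER_TASK come from
# the thread; ASSUMED_TASK_SECONDS is a hypothetical value, not a measurement.
DAGS = 30
TASKS_PER_DAG = 25            # upper end of the "~20-25 tasks" range
PROCESSES_PER_TASK = 4        # 2 Airflow processes + bash + curl
ASSUMED_TASK_SECONDS = 60     # hypothetical average task duration


def tasks_per_minute(dags, tasks_per_dag, window_minutes=60):
    """Average task launch rate if runs were spread evenly over the window."""
    return dags * tasks_per_dag / window_minutes


def avg_concurrent_processes(dags, tasks_per_dag, procs_per_task,
                             task_seconds, window_seconds=3600):
    """Average number of live processes: launch rate times task duration."""
    tasks_per_second = dags * tasks_per_dag / window_seconds
    return tasks_per_second * task_seconds * procs_per_task


if __name__ == "__main__":
    print(tasks_per_minute(DAGS, TASKS_PER_DAG))
    print(avg_concurrent_processes(DAGS, TASKS_PER_DAG,
                                   PROCESSES_PER_TASK, ASSUMED_TASK_SECONDS))
```

These are averages under an even spread. With hourly schedules most tasks start at the top of the hour, so the peak process count (and with it the memory use) can be well above the average.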

Bolke.


> On 7 Mar 2017, at 19:41, Jason Chen <chingchien.chen@gmail.com> wrote:
> 
> Hi Harish,
> Thanks for the fast response and feedback.
> Yeah, I want to see the fix or more discussion!
> 
> BTW, I assume that, given your 30 dags, airflow runs fine after you
> increased the heartbeat?
> The default is 5 secs.
> 
> 
> Thanks.
> Jason
> 
> 
> On Tue, Mar 7, 2017 at 10:24 AM, harish singh <harish.singh22@gmail.com>
> wrote:
> 
>> I saw similar behavior a year ago, when we were at < 5 DAGs. Even
>> then the CPU utilization was reaching 100%.
>> One way to deal with this is to play with the "heartbeat" numbers (i.e.
>> increase the heartbeat).
>> But then you are introducing more delay to start jobs that are ready to run
>> (ready to be queued -> queued -> run)
>> 
>> Right now, we have more than 30 dags (each with ~ 20-25 tasks) that run
>> every hour.
>> We are giving airflow about 5-6 cores (which still seems too little for airflow).
>> Also, for so many tasks every hour, our mem consumption is over 16G.
>> All our tasks are basically doing "curl". So 16G seems too high.
>> 
>> Having said that, I remember reading somewhere that there was a fix coming
>> for this.
>> If not, I would definitely want to see more discussion on this.
>> 
>> Thanks for opening this. I would love to hear on how people are working
>> around this.
>> 
>> 
>> 
>> 
>> 
>> On Tue, Mar 7, 2017 at 9:42 AM, Jason Chen <chingchien.chen@gmail.com>
>> wrote:
>> 
>>> Hi  team,
>>> 
>>> We are using airflow v1.7.1.3 and schedule about 50 dags (each dag runs
>>> at an interval of about 10 minutes to one hour). It's with LocalExecutor.
>>> 
>>> Recently, we noticed that the RDS instance (MySQL 5.6.x on AWS) runs at ~100% CPU.
>>> I am wondering if the airflow scheduler and webserver can cause high CPU load
>>> on MySQL, given ~50 dags?
>>> I feel the load on MySQL should be light.
>>> 
>>> Thanks.
>>> -Jason
>>> 
>> 

