airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: airflow (1.7.0): cpu utilization reaches 70% and above
Date Mon, 13 Jun 2016 18:16:03 GMT
This is in the works, but it will indeed require some time to stabilize. I have a test branch for it.

Sent from my iPhone

> On 13 jun. 2016, at 19:39, Ben Tallman <ben@apigee.com> wrote:
> 
> In the past, I have written/seen systems where the pattern is that a task
> runner/worker is in charge of scheduling the next tasks that need to run
> on completion of a task, and the Scheduler only handles "issues" and
> initial kickoffs...
> 
> That opens another can of worms, but I think I've seen discussion of that
> idea on here, and I thought I'd put my vote in for that pattern.
> 
> On Mon, Jun 13, 2016 at 10:35 AM harish singh <harish.singh22@gmail.com>
> wrote:
> 
>> So I changed the scheduler heartbeat to 60 sec
>> 
>> [scheduler]
>> job_heartbeat_sec = 5
>> scheduler_heartbeat_sec = 60
>> 
>> As expected, we did see spiky CPU utilization.
>> 
>> 
>> But I see a different issue (not so much a problem; I'm just putting it
>> here so that it may help someone who needs to do something similar).
>> 
>> Suppose I have a DAG with tasks as below: A --> B --> C
>> 
>> Say A starts and completes its execution in 10 seconds.
>> 
>> The next task, B, cannot start until the next scheduler heartbeat. That
>> means I may have to wait roughly 60 seconds in the worst case.
>> 
>> So one option is to set a heartbeat of around 30 seconds (or the duration
>> of whatever you believe is your least time-consuming task). This is just a
>> rough optimization to make sure we make progress soon enough after the end
>> of a task.
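>> 
>> A minimal sketch of such an A --> B --> C chain (assuming the 1.7-era
>> Python API; the task ids and curl targets here are hypothetical):
>> 
>> from datetime import datetime
>> from airflow import DAG
>> from airflow.operators.bash_operator import BashOperator
>> 
>> # Hypothetical hourly DAG used only to illustrate the latency described above.
>> dag = DAG('abc_example', start_date=datetime(2016, 6, 1),
>>           schedule_interval='0 * * * *')
>> 
>> a = BashOperator(task_id='A', bash_command='curl -s http://example.com/a', dag=dag)
>> b = BashOperator(task_id='B', bash_command='curl -s http://example.com/b', dag=dag)
>> c = BashOperator(task_id='C', bash_command='curl -s http://example.com/c', dag=dag)
>> 
>> a.set_downstream(b)  # B becomes eligible once A succeeds...
>> b.set_downstream(c)  # ...but is only picked up on the next scheduler heartbeat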
>> 
>> 
>> Thanks,
>> Harish
>> 
>> 
>> On Mon, Jun 13, 2016 at 9:50 AM, harish singh <harish.singh22@gmail.com>
>> wrote:
>> 
>>> Yup, it is the scheduler process using the CPU:
>>> 
>>> 
>>> Below is the usage with the default out-of-the-box heartbeat settings.
>>> I will paste more numbers as I play with airflow.cfg.
>>> This is the output of `top`:
>>>  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>> *   65 airflow   20   0  445488 112648  13308 R  61.3  0.7 834:21.23 airflow*
>>>    1 airflow   20   0   17968   1912   1668 S   0.0  0.0   0:00.04 entrypoint.sh
>>>   64 airflow   20   0  407972  74876  13328 S   0.0  0.5   0:01.12 airflow
>>>  154 airflow   20   0  404404  62528   4344 S   0.0  0.4   0:00.11 airflow
>>>  155 airflow   20   0  404404  62532   4344 S   0.0  0.4   0:00.11 airflow
>>>  156 airflow   20   0  404404  62532   4344 S   0.0  0.4   0:00.12 airflow
>>>  157 airflow   20   0  404404  62532   4344 S   0.0  0.4   0:00.11 airflow
>>>  158 airflow   20   0  404404  62532   4344 S   0.0  0.4   0:00.11 airflow
>>>  159 airflow   20   0  404404  62532   4344 S   0.0  0.4   0:00.11 airflow
>>>  160 airflow   20   0  404404  62488   4300 S   0.0  0.4   0:00.12 airflow
>>>  161 airflow   20   0  404404  62468   4280 S   0.0  0.4   0:00.11 airflow
>>>  163 airflow   20   0   50356  16680   5956 S   0.0  0.1   0:08.86 gunicorn: maste
>>>  168 airflow   20   0  457204 119940  13080 S   0.0  0.8   0:18.86 gunicorn: worke
>>>  170 airflow   20   0  463168 126028  13080 S   0.0  0.8   0:13.81 gunicorn: worke
>>>  171 airflow   20   0  464936 127672  13080 S   0.0  0.8   0:08.53 gunicorn: worke
>>>  174 airflow   20   0  467460 130240  13080 S   0.0  0.8   0:08.52 gunicorn: worke
>>> 7627 airflow   20   0   18208   3192   2672 S   0.0  0.0   0:00.04 bash
>>> 8091 airflow   20   0   30312   7556   4784 S   0.0  0.0   0:00.00 python
>>> 28808 airflow   20   0   18208   3352   2836 S   0.0  0.0   0:00.08 bash
>>> 28822 airflow   20   0   19844   2340   2020 R   0.0  0.0   0:00.01 top
>>> 
>>> 
>>> *PID 65 is the scheduler.*
>>> 
>>> 
>>> airflow@80be4e775e55:~$ ps -ef
>>> UID        PID  PPID  C STIME TTY          TIME CMD
>>> airflow      1     0  0 Jun12 ?        00:00:00 /bin/bash ./entrypoint.sh
>>> airflow     64     1  0 Jun12 ?        00:00:01 /usr/bin/python /usr/local/bin/airflow webserver -p 8080
>>> airflow     65     1 62 Jun12 ?        13:54:22 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    154    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    155    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    156    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    157    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    158    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    159    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    160    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    161    65  0 Jun12 ?        00:00:00 /usr/bin/python /usr/local/bin/airflow scheduler
>>> airflow    163    64  0 Jun12 ?        00:00:08 gunicorn: master [airflow.www.app:cached_app()]
>>> airflow    168   163  0 Jun12 ?        00:00:18 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow    170   163  0 Jun12 ?        00:00:13 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow    171   163  0 Jun12 ?        00:00:08 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow    174   163  0 Jun12 ?        00:00:08 gunicorn: worker [airflow.www.app:cached_app()]
>>> airflow   7627     0  0 Jun12 ?        00:00:00 bash
>>> airflow   8091  7627  0 Jun12 ?        00:00:00 python
>>> airflow  28808     0  0 16:44 ?        00:00:00 bash
>>> airflow  28823 28808  0 16:44 ?        00:00:00 ps -ef
>>> 
>>> On Mon, Jun 13, 2016 at 8:42 AM, Maxime Beauchemin <
>>> maximebeauchemin@gmail.com> wrote:
>>> 
>>>> Can you confirm that it's the scheduler process using that CPU?
>>>> 
>>>> The SCHEDULER_HEARTBEAT_SEC configuration defines a minimum duration for
>>>> scheduling cycles, where the scheduler evaluates all active DagRuns and
>>>> attempts to kick off task instances whose dependencies are met. Once the
>>>> cycle is done, the scheduler should sleep until the next heartbeat, so
>>>> CPU should look spiky.
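>>>> 
>>>> As a conceptual sketch (not the actual Airflow source), the loop looks
>>>> something like this, which is why CPU should drop between cycles:
>>>> 
>>>> import time
>>>> 
>>>> def run_scheduling_cycle():
>>>>     # Placeholder for the CPU-bound part of the cycle: evaluate active
>>>>     # DagRuns and queue task instances whose dependencies are met.
>>>>     pass
>>>> 
>>>> def scheduler_loop(heartbeat_sec):
>>>>     while True:
>>>>         start = time.time()
>>>>         run_scheduling_cycle()
>>>>         # Sleep out the rest of the heartbeat -> idle CPU between spikes.
>>>>         time.sleep(max(0, heartbeat_sec - (time.time() - start)))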
>>>> 
>>>> Max
>>>> 
>>>> On Mon, Jun 13, 2016 at 8:26 AM, harish singh <harish.singh22@gmail.com>
>>>> wrote:
>>>> 
>>>>> Yup, I tried changing the scheduler heartbeat to 60 seconds.
>>>>> Apart from not getting any update for 60 seconds, what are the side
>>>>> effects of changing the two heartbeats? It shouldn't impact performance,
>>>>> should it?
>>>>> 
>>>>> Also, I understand this CPU usage if there are hundreds of DAGs. But with
>>>>> just one active DAG, doesn't 70% seem high? Especially in my case, where
>>>>> there are only 10 tasks in the DAG, all of them just making curl calls
>>>>> (BashOperators).
>>>>> 
>>>>> Also, as a side note, in a different environment where we have 10 DAGs
>>>>> active, the CPU usage stays in the same 70-80% range.
>>>>> 
>>>>> On Mon, Jun 13, 2016, 8:14 AM Maxime Beauchemin <
>>>>> maximebeauchemin@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> The scheduler constantly attempts to schedule tasks, interacting with
>>>>>> the database and reloading DAG definitions. In most large-ish
>>>>>> environments, burning up to a CPU to run the scheduler doesn't seem
>>>>>> outrageous to me.
>>>>>> 
>>>>>> If you want to reduce the CPU load related to the scheduler, check out
>>>>>> SCHEDULER_HEARTBEAT_SEC and MAX_THREADS in the scheduler section of
>>>>>> `airflow.cfg`.
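>>>>>> 
>>>>>> A quick way to confirm what the scheduler is actually picking up (a
>>>>>> sketch, assuming the 1.7-era module-level getters in
>>>>>> airflow.configuration; verify against your installed version):
>>>>>> 
>>>>>> from airflow import configuration as conf
>>>>>> 
>>>>>> # Both keys live in the [scheduler] section of airflow.cfg.
>>>>>> print(conf.getint('scheduler', 'scheduler_heartbeat_sec'))
>>>>>> print(conf.getint('scheduler', 'max_threads'))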
>>>>>> 
>>>>>> Max
>>>>>> 
>>>>>> On Sun, Jun 12, 2016 at 1:24 PM, harish singh <harish.singh22@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi guys,
>>>>>>> 
>>>>>>> We are running airflow (for about 3 months now) inside a docker
>>>>>>> container on aws.
>>>>>>> 
>>>>>>> I just did a docker stats to check what's going on. The CPU
>>>>>>> consumption is huge.
>>>>>>> We have around 15 DAGs. Only one DAG is turned ON; the remaining are
>>>>>>> OFF. The DAG runs with an HOURLY schedule.
>>>>>>> 
>>>>>>> Right now, airflow is consuming almost one complete core.
>>>>>>> It seems there is some unnecessary spinning?
>>>>>>> This doesn't look like the right behavior.
>>>>>>> Is there a bug already filed for this? Or is there something incorrect
>>>>>>> in the way I am using the airflow configuration?
>>>>>>> 
>>>>>>> CONTAINER   CPU %     MEM USAGE / LIMIT     MEM %     NET I/O               BLOCK I/O
>>>>>>> CCC         68.17%    619.7 MB / 2.147 GB   28.85%    1.408 GB / 939.4 MB   7.856 MB / 0 B
>>>>>>> XXX         64.36%    619.4 MB / 2.147 GB   28.84%    1.211 GB / 807.6 MB   7.856 MB / 0 B
>>>>>>> 
>>>>>>> 
>>>>>>> Airflow version 1.7.0
>>>>>>> 
>>>>>>> Airflow config:
>>>>>>> 
>>>>>>> sql_alchemy_pool_size = 5
>>>>>>> sql_alchemy_pool_recycle = 3600
>>>>>>> parallelism = 8
>>>>>>> dag_concurrency = 8
>>>>>>> max_active_runs_per_dag = 8
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Harish
>> 
