airflow-dev mailing list archives

From Teresa Fontanella De Santis <tfontanella...@gmail.com>
Subject Re: Scheduler error - cannot allocate memory
Date Tue, 27 Dec 2016 11:36:03 GMT
Bolke,

Thanks for the answer!

You are right that some other process is taking up all the remaining
memory. But sometimes we run out of memory and sometimes we don't.
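From the traceback, the scheduler starts its jobs with multiprocessing's
Process.start(), and the OSError from os.fork() propagates uncaught, so the
whole scheduler dies. For illustration, a minimal sketch of what catching
it could look like (start_with_retry is a hypothetical helper, not
anything Airflow actually does):

    import errno
    import time
    from multiprocessing import Process

    def start_with_retry(proc, retries=3, delay=5.0):
        """Start a child process, retrying briefly when fork() fails
        with ENOMEM. Hypothetical sketch; Airflow does not do this,
        which is why the scheduler bails out on [Errno 12]."""
        for attempt in range(retries):
            try:
                proc.start()  # fork()s; raises OSError (ENOMEM) under memory pressure
                return
            except OSError as e:
                if e.errno != errno.ENOMEM or attempt == retries - 1:
                    raise
                time.sleep(delay)  # give memory pressure a chance to ease

    if __name__ == "__main__":
        p = Process(target=print, args=("child ran",))
        start_with_retry(p)
        p.join()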

We are running Airflow (the scheduler, the webserver, and the DAG task
processes) on a single EC2 m4.xlarge instance (we have 4 workers), using
LocalExecutor. In this case, would it be better to use CeleryExecutor?
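In the meantime, we are considering capping how many task processes
LocalExecutor forks at once, since each running task instance is a forked
process on the same box as the scheduler and webserver. Something like
this in airflow.cfg (a sketch, assuming the 1.7-era [core] keys
parallelism and dag_concurrency; not tested):

    [core]
    executor = LocalExecutor
    # Max task instances running at once across the whole installation;
    # with LocalExecutor each one is a forked process on this machine.
    parallelism = 4
    # Max task instances allowed to run concurrently within a single DAG.
    dag_concurrency = 4

With the 16 GB on an m4.xlarge shared by the scheduler, the webserver, and
the tasks, lowering parallelism trades throughput for memory headroom.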

Thanks again!

2016-12-26 18:41 GMT-03:00 Bolke de Bruin <bdbruin@gmail.com>:

> We don't handle this kind of error in Airflow, so it becomes a hard error
> and Airflow bails out.
>
> You are most likely running out of memory because some other process is
> taking up all the remaining memory. Are you running workers on the same
> machine? Their memory usage will go up and down over time depending on
> the jobs you launch.
>
> This is not related to "restarting the scheduler" (which is kind of
> outdated anyway).
>
> Bolke
>
> Sent from my iPhone
>
> > On 26 Dec 2016, at 21:47, Teresa Fontanella De Santis <tfontanella011@gmail.com> wrote:
> >
> > Hi everyone!
> >
> > We had been running the scheduler without problems for a while. We are
> > using an EC2 instance (m4.xlarge). We were running airflow scheduler
> > directly (no supervisord, no monit, etc.).
> > Suddenly, the scheduler stopped, showing this message:
> >
> > [2016-12-22 21:01:15,038] {jobs.py:574} INFO - Prioritizing 1 queued jobs
> > [2016-12-22 21:01:15,041] {jobs.py:603} INFO - Pool None has 128 slots, 1 task instances in queue
> > [2016-12-22 21:01:15,041] {models.py:154} INFO - Filling up the DagBag from /home/ec2-user/analytics/airflow/dags
> > [2016-12-22 21:01:15,155] {jobs.py:726} INFO - Starting 2 scheduler jobs
> > [2016-12-22 21:01:15,157] {jobs.py:761} ERROR - [Errno 12] Cannot allocate memory
> > Traceback (most recent call last):
> >   File "/usr/local/lib/python3.5/site-packages/airflow/jobs.py", line 728, in _execute
> >     j.start()
> >   File "/usr/lib64/python3.5/multiprocessing/process.py", line 105, in start
> >     self._popen = self._Popen(self)
> >   File "/usr/lib64/python3.5/multiprocessing/context.py", line 212, in _Popen
> >     return _default_context.get_context().Process._Popen(process_obj)
> >   File "/usr/lib64/python3.5/multiprocessing/context.py", line 267, in _Popen
> >     return Popen(process_obj)
> >   File "/usr/lib64/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
> >     self._launch(process_obj)
> >   File "/usr/lib64/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
> >     self.pid = os.fork()
> > OSError: [Errno 12] Cannot allocate memory
> >
> > Traceback (most recent call last):
> >   File "/usr/local/bin/airflow", line 15, in <module>
> >
> >
> > The DAGs which failed didn't leave any logs (none were stored on the
> > Airflow instance and there are no remote logs), so we have no idea what
> > happened, only that there was not enough memory to fork.
> > It is well known that restarting the scheduler periodically is
> > recommended (according to this
> > <https://medium.com/handy-tech/airflow-tips-tricks-and-pitfalls-9ba53fba14eb#.80c6g1n1s>),
> > but... do you have any idea why this can happen? Is there something we
> > can do (or some bug we can fix)?
> >
> >
> > Thanks in advance!
>
