airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Airflow scheduler stops scheduling with no notice
Date Thu, 14 Jul 2016 17:44:46 GMT
Hi Tamara,

Please supply version information. 

With regard to your issue: a “connection timed out” normally means that the database server
became unreachable or too busy, so I would look at the DB first. Are your connections being
exhausted? (Older versions of Airflow were not too great at closing connections, and we still
have work to do in that area.) If you run something older than 1.7.1.3, you could consider
using “--num_runs X” (short form “-n X”), so that the scheduler quits after X runs. Supervisord
then needs to restart the scheduler, obviously.
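
As an illustration only, a supervisord entry combining the two could look roughly like this. It assumes the 1.7.x CLI spelling of the flag (-n / --num_runs; check "airflow scheduler --help" on your version), the run count of 10 is an arbitrary example rather than a recommendation, and the log paths are simply copied from your configuration:

```ini
; Sketch: the scheduler exits after a fixed number of scheduling loops
; (-n / --num_runs, assumed 1.7.x spelling) and supervisord starts it again.
[program:airflow-scheduler]
command=airflow scheduler -n 10
autostart=true
; "true" restarts the process on every exit, including a clean exit
; after the requested number of runs
autorestart=true
startretries=3
stderr_logfile=/var/logs/airflow-logs/airflow-scheduler.err.log
stdout_logfile=/var/logs/airflow-logs/airflow-scheduler.out.log
```

The choice of autorestart=true matters here: with autorestart=unexpected, supervisord only restarts a process whose exit code is not listed in exitcodes (default 0), so a scheduler that presumably exits cleanly after its X runs would not be brought back.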

Moreover, why are you using the SequentialExecutor? LocalExecutor will allow you to parallelize
your tasks and scale vertically; CeleryExecutor will do the same but also allow you to scale
horizontally, at the cost of a slightly more complex setup.
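
For reference, switching executors is a change in the [core] section of airflow.cfg. A minimal sketch, assuming you keep the Postgres backend you already have (LocalExecutor does not work with SQLite); the user, password, host and database name below are placeholders, and 32 is, as far as I know, the shipped default for parallelism rather than a tuned value:

```ini
[core]
# Placeholder credentials and host: substitute your own Postgres connection string
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@db-host:5432/airflow
# Run task instances in parallel local processes instead of one at a time
executor = LocalExecutor
# Maximum number of task instances running at once across the installation
parallelism = 32
```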

Regards,
Bolke

> Op 14 jul. 2016, om 11:44 heeft Tamara Mendt <tm@hellofresh.com> het volgende geschreven:
> 
> Hello,
> 
> Sorry for writing this on the dev list, but as there is no user list yet, I
> decided this was the best place. We are currently running Airflow with a
> SequentialExecutor and a Postgres DB in the backend. We run the airflow
> scheduler and webserver using supervisor so that they should be
> automatically restarted if either fails.
> 
> Normally this setting works fine. However, we have noticed that sometimes
> the scheduler stops scheduling jobs and only starts rescheduling them if we
> manually restart it from supervisor. I could see this message in the
> airflow scheduler error logs, so the reason the scheduler stops scheduling
> seems to be related to the connection to the DB:
> 
> <class 'sqlalchemy.exc.DatabaseError'> (psycopg2.DatabaseError) SSL SYSCALL error: Connection timed out
> [SQL: 'UPDATE job SET latest_heartbeat=%(latest_heartbeat)s WHERE job.id = %(job_id)s']
> [parameters: {'latest_heartbeat': datetime.datetime(2016, 7, 11, 10, 26, 7, 44521), 'job_id': 10246}]
> 
> Also, when I look for the job id in the Airflow DB I can see the following:
> 
>   id   | dag_id |  state  |   job_type   |         start_date         | end_date |      latest_heartbeat      |   executor_class
> -------+--------+---------+--------------+----------------------------+----------+----------------------------+--------------------
>  10246 |        | running | SchedulerJob | 2016-07-08 15:38:06.911346 |          | 2016-07-14 05:30:56.407149 | SequentialExecutor
> 
> The latest heartbeat corresponds to the moment when the scheduler stopped
> scheduling jobs. Our supervisor configuration for the scheduler is the
> following:
> 
> [program:airflow-scheduler]
> command= airflow scheduler
> autostart=true
> autorestart=true
> startretries=3
> stderr_logfile=/var/logs/airflow-logs/airflow-scheduler.err.log
> stdout_logfile=/var/logs/airflow-logs/airflow-scheduler.out.log
> 
> I have now added these two lines to the supervisor configuration, in case
> the problem was that supervisor was not detecting that the scheduler had quit:
> 
> stopsignal=QUIT
> stopasgroup=true
> 
> If anyone has had a similar problem, or has ideas on how we could avoid
> having to restart the scheduler manually, or on what could be causing the
> scheduler to stop in the first place, they would be much appreciated.
> 
> Cheers,
> 
> -- 
> Tamara Mendt, Data Engineer, HelloFresh Global

