airflow-dev mailing list archives

From Matus Valo <matusv...@gmail.com>
Subject Re: scheduler running on multiple nodes
Date Sun, 05 Mar 2017 20:40:30 GMT
Hi all,

I have investigated high availability of the scheduler, since it is
crucial for our deployment, and I would like to share the results.

I found that an existing solution is available (see [1], [2]). On closer
inspection, however, it uses SSH to check whether the scheduler is running
on the other node. For our use case this is not an optimal solution, since
we don't want SSH traffic between the nodes. I then found that an HA
cluster can be used to provide failover for the scheduler. Consul [3]
seems to be a very easy-to-use solution: I was able to set up such an HA
cluster (using consul lock) very quickly. I have tested it on a 3-node
cluster and it works great.
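To illustrate why a lock gives failover without any SSH traffic between the nodes, here is a minimal Python sketch. It assumes a lock that only one holder can own at a time and that the holder must keep renewing within a TTL. `ToyLock`, `Node` and `simulate` are hypothetical names; this only models the idea and is not Consul's API. With Consul itself, the rough equivalent would presumably be running something like `consul lock <prefix> airflow scheduler` on every node, so that only the lock holder actually starts the scheduler.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyLock:
    """Hypothetical in-memory stand-in for a Consul-style lock.

    One holder at a time; the holder must renew within `ttl` time units
    or its claim expires. Illustration only, not Consul's API.
    """
    ttl: float
    holder: Optional[str] = None
    last_renewal: float = 0.0

    def try_acquire(self, node: str, now: float) -> bool:
        expired = self.holder is not None and now - self.last_renewal > self.ttl
        if self.holder is None or expired or self.holder == node:
            # Lock is free, expired, or already ours: take or renew it.
            self.holder = node
            self.last_renewal = now
            return True
        return False


@dataclass
class Node:
    name: str
    alive: bool = True
    runs_scheduler: bool = False

    def tick(self, lock: ToyLock, now: float) -> None:
        if not self.alive:
            # A crashed node neither runs the scheduler nor renews the lock.
            self.runs_scheduler = False
            return
        # Only the lock holder runs the scheduler; the others stay on standby.
        self.runs_scheduler = lock.try_acquire(self.name, now)


def simulate(names: List[str], ttl: float, steps: int,
             crash_at: int, victim: str) -> List[List[str]]:
    """Return, for each time step, the names of nodes running a scheduler."""
    lock = ToyLock(ttl=ttl)
    nodes = [Node(n) for n in names]
    history = []
    for t in range(steps):
        for node in nodes:
            if t == crash_at and node.name == victim:
                node.alive = False  # simulate the active node dying
            node.tick(lock, float(t))
        history.append([n.name for n in nodes if n.runs_scheduler])
    return history


if __name__ == "__main__":
    runs = simulate(["a", "b", "c"], ttl=2, steps=7, crash_at=3, victim="a")
    for t, active in enumerate(runs):
        print(t, active)
```

With three nodes, node `a` holds the lock until it crashes at step 3; nobody runs the scheduler while its claim is still within the TTL, and `b` takes over at step 5. At most one scheduler runs at any time, at the cost of a short failover gap whose length is set by the (assumed) TTL.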

I could not find any information on this topic in the Airflow
documentation. For someone like me, with no experience of HA clusters, it
can be difficult to work out how to deploy one. In the future I would like
to write some documentation about it. Do you think that would be a helpful
contribution to the project?

[1] https://github.com/teamclairvoyant/airflow-scheduler-failover-controller
[2] https://www.slideshare.net/RobertSanders49/airflow-clustering-and-high-availability
[3] https://www.consul.io/

Thanks,


Matus

I have done some research on this topic and would like to share some
results with you.
On 02/09/2017 03:47 PM, matus valo wrote:
>
> Hi all,
>
> I am considering deploying Airflow as a pipeline framework. I have
> found multiple articles explaining how to deploy Airflow in a
> distributed environment (e.g. [1]). Unfortunately, I was not able to
> find any use case where the scheduler is distributed across multiple
> nodes. Is it possible to distribute the scheduler across multiple
> nodes to avoid a single point of failure? I haven’t found any mention
> of it in the documentation. According to [2] it is not possible, but
> on the other hand [3] suggests that this may be solved in a newer
> version of Airflow.
>
> Thanks,
>
>
> Matus
>
> [1] http://site.clairvoyantsoft.com/setting-apache-airflow-cluster/
>
> [2] https://groups.google.com/forum/#!topic/airbnb_airflow/-1wKa3OcwME 
>
> [3] https://issues.apache.org/jira/browse/AIRFLOW-678
>
