hbase-user mailing list archives

From Austin Heyne <ahe...@ccri.com>
Subject Re: HA master on EMR
Date Fri, 31 Aug 2018 19:51:43 GMT
I have played around with ReadReplicas a fair bit and that might be a 
good enough stopgap should something go wrong. Ideally we wouldn't lose
the primary cluster but that may not be reasonable with our given 


On 08/31/2018 03:50 PM, Zach York wrote:
> Hey Austin,
> It sounds like you are asking about read availability in the case where a
> primary cluster becomes unhealthy?
> In that case, you should look at the HBase on S3 Read Replica clusters
> feature[1][2]. This allows for high-availability reads if the primary
> cluster becomes unhealthy.
> Let me know if I misinterpreted your ask!
> Thanks,
> Zach
> [1]
> https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-s3.html#emr-hbase-s3-read-replica
> [2]
> https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
>> ---------- Forwarded message ---------
>>> From: Austin Heyne <aheyne@ccri.com>
>>> Date: Thu, Aug 30, 2018 at 8:30 AM
>>> Subject: HA master on EMR
>>> To: <user@hbase.apache.org>
>>> HBase on EMR is fairly reliable but is still subject to hardware
>>> failures (which have happened to me before). Is there a best practice for
>>> adding backup masters to an EMR cluster?
>>> I know this isn't technically a supported feature from AWS, but we're
>>> already heavily invested in HBase on EMR and would like to investigate
>>> options for mitigating the risk of a master failure. In EMR, if the
>>> master dies the entire cluster is terminated, so we need failover for
>>> HBase, Hadoop/HDFS and ZooKeeper. The one idea that I've had is to
>>> create a second (or third) EMR cluster with its HBase, ZooKeeper and
>>> Hadoop/HDFS configuration pointed at the primary cluster. This would in
>>> effect add the RegionServers and DataNodes to the primary cluster. I
>>> know that losing 1/3 to 1/2 of your DataNodes would most likely mean
>>> you would lose some WALs, but re-ingesting the last day's worth of data
>>> is an acceptable trade-off for us in exchange for not having downtime.
>>> I realize this is a slightly crazy idea, and using something like
>>> Kubernetes is the 'correct' solution, but I have to work with what we
>>> have and mitigate possible issues. My question is: are there any big
>>> issues that anyone would foresee with this idea?
>>> Thanks for the feedback,
>>> Austin

Austin L. Heyne
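
For reference, the read replica setup Zach points to in [1] comes down to
two EMR configuration classifications: `hbase-site` (setting `hbase.rootdir`
to an S3 location) and `hbase` (setting `hbase.emr.storageMode` and, on the
replica only, `hbase.emr.readreplica.enabled`). Below is a minimal sketch in
Python that builds those classifications. The helper name and the bucket
path are hypothetical and not taken from this thread; the property names
follow the EMR Release Guide as far as I know, so check them against [1]
before relying on this.

```python
def hbase_s3_configurations(root_dir, read_replica=False):
    """Build EMR configuration classifications for HBase on S3.

    Hypothetical helper for illustration. The primary and every read
    replica cluster must be given the same root_dir.
    """
    if not root_dir.startswith("s3://"):
        raise ValueError("root_dir must be an s3:// URI")

    # Store HFiles in S3 rather than on the cluster's HDFS.
    hbase_props = {"hbase.emr.storageMode": "s3"}
    if read_replica:
        # Only the replica cluster sets this flag.
        hbase_props["hbase.emr.readreplica.enabled"] = "true"

    return [
        {"Classification": "hbase-site",
         "Properties": {"hbase.rootdir": root_dir}},
        {"Classification": "hbase",
         "Properties": hbase_props},
    ]


# Hypothetical bucket and prefix, shared by both clusters.
ROOT_DIR = "s3://example-bucket/hbase-root"

primary = hbase_s3_configurations(ROOT_DIR)
replica = hbase_s3_configurations(ROOT_DIR, read_replica=True)
```

Either list could then be passed as the `Configurations` argument when
creating a cluster, for example through boto3's `run_job_flow`. Note that
this only addresses read availability, as Zach describes. It does not add
a backup master to the primary cluster.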
