hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HADOOP-4539 question
Date Thu, 13 Aug 2009 17:41:03 GMT
On Thu, Aug 13, 2009 at 10:37 AM, Konstantin Shvachko <shv@yahoo-inc.com> wrote:

> Steve,
>
> There are other groups that claim to be working on an HA solution.
> We had discussions about it not so long ago on this list.
> Could your colleagues present their design?
> As you point out, the issue gets fairly complex fast,
> particularly because of the split-brain problem you describe.
>

IMHO the split-brain problem is why failover has to be either triggered
manually or handled by an external system like Linux-HA, where you can run
heartbeats over multiple media connecting the two masters. In the past I've
done this for firewalls and DB servers using a null-modem serial connection,
a crossover cable, and pings over the LAN. With 3 separate heartbeats it's
very tough to get a split brain. If you absolutely must avoid it, you can
also trigger a "STONITH" policy: http://linux-ha.org/STONITH
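
To make the multi-heartbeat idea concrete, here is a rough sketch of the
decision a standby master might make. It is not Linux-HA's actual logic or
API; the names (Channel, Action, decide), the channel labels, and the timeout
value are all made up for illustration. The point is only that the standby
takes over when every independent channel has gone silent, and treats a
partial loss as a degraded link rather than a dead peer.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    STAY_STANDBY = "stay_standby"                  # peer looks healthy on all channels
    WARN_DEGRADED = "warn_degraded"                # some channels silent, peer still alive
    FENCE_THEN_TAKE_OVER = "fence_then_take_over"  # every channel silent


@dataclass
class Channel:
    name: str         # illustrative labels, e.g. "serial", "crossover", "lan-ping"
    last_seen: float  # timestamp (seconds) of the last heartbeat on this channel


def decide(channels, now, timeout):
    """Return (action, silent_channel_names) for the standby master.

    Assumption for this sketch: a channel is "silent" once no heartbeat has
    arrived on it for more than `timeout` seconds.
    """
    silent = [c.name for c in channels if now - c.last_seen > timeout]
    if not silent:
        return Action.STAY_STANDBY, silent
    if len(silent) < len(channels):
        # At least one medium still carries heartbeats, so the peer is alive.
        # Taking over here is exactly how a split brain would start.
        return Action.WARN_DEGRADED, silent
    # All media are silent. A STONITH-style policy would power off the peer
    # first, so that two active masters can never coexist.
    return Action.FENCE_THEN_TAKE_OVER, silent


if __name__ == "__main__":
    links = [
        Channel("serial", last_seen=100.0),
        Channel("crossover", last_seen=100.0),
        Channel("lan-ping", last_seen=90.0),  # LAN heartbeat lost earlier
    ]
    print(decide(links, now=103.0, timeout=5.0))
```

With the sample values above, only the LAN channel has timed out, so the
standby stays put and just flags the degraded link. The fencing step itself
(actually powering off the other node) is left out, since it depends entirely
on the hardware involved.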


>
> There are several jiras dedicated to the problem already.
> You can post your design there or create a new one.
>
> > Looking at the facebook/google "multi-master" solution, I think they
> > don't worry about consistency, just let the masters drift apart.
>
> Not sure I follow this.
> What facebook/google "multi-master" solution?
> Why would they not worry about consistency?
> Consistency of what?
>
> Thanks,
> --Konstantin
>
>
> Steve Loughran wrote:
>
>> Konstantin Shvachko wrote:
>>
>>> And the only remaining step is to implement fail-over mechanism.
>>>
>>
>> :)
>>
>> Colleagues of mine work on HA stuff; I try to steer clear of it as it
>> gets complex fast.  Test case: what happens when a network failure splits
>> the datacentre in two? You now have two clusters, each with half the data and
>> possibly a primary/secondary master in each one. Then leave the partition up for
>> a while, do inconsistent operations on each, then have the network come back
>> up.  Then work out how to merge the state.
>>
>> Looking at the facebook/google "multi-master" solution, I think they don't
>> worry about consistency, just let the masters drift apart.
>>
>> see also Johan's recent talk on HDFS:
>> http://www.slideshare.net/steve_l/hdfs
>>
>>
