hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase and Consistency in CAP
Date Sat, 03 Dec 2011 08:32:07 GMT
> From: Mohit Anchlia <mohitanchlia@gmail.com>
> I am having just a bit of trouble understanding how a random node
> failure is different from a network partition. In both cases there is
> an impact clearly visible to the user (the time it takes to fail over
> and replay logs)?


I think you are conflating things a bit.
Partition tolerance in CAP, shorthanded, is the ability of a system to survive message loss
(due to server failure, a network problem, etc.). BigTable/HBase does this, of course: a
server failure or message loss does not toast the database.

Availability is the dimension of "CAP" that you are pondering here ("the time it takes to
fail over and replay logs"). Recovery from message loss or server failure includes the time
it takes to fail over and replay the logs.

BigTable does trade some availability to achieve a stronger level of consistency than would
be possible otherwise. The Google paper includes some discussion of this design rationale.


Best regards,

    - Andy


>________________________________
> From: Mohit Anchlia <mohitanchlia@gmail.com>
>To: user@hbase.apache.org 
>Sent: Saturday, December 3, 2011 5:48 AM
>Subject: Re: HBase and Consistency in CAP
> 
>Thanks. I am having just a bit of trouble understanding how a random
>node failure is different from a network partition. In both cases there
>is an impact clearly visible to the user (the time it takes to fail over
>and replay logs)?
>
>On Fri, Dec 2, 2011 at 1:42 PM, Ian Varley <ivarley@salesforce.com> wrote:
>> The simple answer is that HBase isn't architected such that 2 region servers can
simultaneously host the same region. In addition to being much simpler from an architectural
point of view, that also allows for user-facing features that would be difficult or impossible
to achieve otherwise: single-row put atomicity, atomic check-and-set operations, atomic increment
operations, etc.--things that are only possible if you know for sure that exactly one machine
is in control of the row.
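To make that single-writer point concrete, here is a minimal sketch of those row-atomic
operations against the Java client API of this era; the table, row, and column names are
made up for illustration:

    // Sketch: row-level atomic operations. These are safe precisely
    // because exactly one region server owns the row at any time.
    // Table/column names below are hypothetical.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RowAtomicitySketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "accounts");
            byte[] row = Bytes.toBytes("user42");
            byte[] cf  = Bytes.toBytes("d");

            // Check-and-set: apply the Put only if d:status is "new".
            Put put = new Put(row);
            put.add(cf, Bytes.toBytes("status"), Bytes.toBytes("active"));
            boolean applied = table.checkAndPut(
                row, cf, Bytes.toBytes("status"), Bytes.toBytes("new"), put);

            // Atomic increment: no read-modify-write race, since the
            // owning server serializes all mutations to the row.
            long visits = table.incrementColumnValue(
                row, cf, Bytes.toBytes("visits"), 1L);

            System.out.println("CAS applied: " + applied + ", visits: " + visits);
            table.close();
        }
    }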
>>
>> Ian
>>
>> On Dec 2, 2011, at 2:54 PM, Mohit Anchlia wrote:
>>
>> Thanks for the overview. It's helpful. Can you also help me understand
>> why 2 region servers for the same row keys can't be running on the
>> nodes where the blocks are being replicated? I am assuming all the
>> logs/HFiles etc. are already being replicated, so if one region server
>> fails the other region server is still taking reads/writes.
>>
>> On Fri, Dec 2, 2011 at 12:15 PM, Ian Varley <ivarley@salesforce.com> wrote:
>> Mohit,
>>
>> Yeah, those are great places to go and learn.
>>
>> To fill in a bit more on this topic: "partition-tolerance" usually refers to the
idea that you could have a complete disconnection between N sets of machines in your data
center, but still be taking writes and serving reads from all the servers. Some "NoSQL" databases
can do this (to a degree), but HBase cannot; the master and ZK quorum must be accessible from
any machine that's up and running the cluster.
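As a small illustration of that constraint: an HBase client bootstraps through the ZooKeeper
quorum, so if a partition cuts a client off from ZooKeeper, that client is simply down rather
than talking to some surviving fragment. A sketch with hypothetical hostnames:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class QuorumConfig {
        // Sketch: clients locate the cluster via ZooKeeper. Hostnames
        // here are hypothetical; an unreachable quorum means no service.
        public static Configuration create() {
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum",
                     "zk1.example.com,zk2.example.com,zk3.example.com");
            conf.set("hbase.zookeeper.property.clientPort", "2181");
            return conf;
        }
    }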
>>
>> Individual machines can go down, as J-D said, and the master will reassign those
regions to another region server. So, imagine you had a network switch fail that disconnected
10 machines in a 20-machine cluster; you wouldn't have 2 baby 10-machine clusters, like you
might with some other software; you'd just have 10 machines "down" (and probably a significant
interruption while the master replays logs on the remaining 10). That would also require that
the underlying HDFS cluster (assuming it's on the same machines) was keeping replicas of the
blocks on different racks (which its placement policy does, once rack awareness is configured);
otherwise there's no hope.
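One footnote on the rack point: HDFS spreads replicas across racks only once it has been told
the topology, typically via a mapping script. A sketch of the Hadoop setting of this era
(core-site.xml; the script path is hypothetical):

    <!-- Maps a datanode host/IP to a rack id like /dc1/rack7, so the
         namenode can keep one replica off-rack. Path is illustrative. -->
    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/rack-topology.sh</value>
    </property>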
>>
>> HBase makes this trade-off intentionally, because in real-world scenarios, there
aren't too many cases where a true network partition would be survived by the rest of your
stack, either (e.g. imagine a case where application servers can't access a relational database
server because of a partition; you're just down). The focus of HBase fault tolerance is recovering
from isolated machine failures, not the collapse of your infrastructure.
>>
>> Ian
>>
>>
>> On Dec 2, 2011, at 2:03 PM, Jean-Daniel Cryans wrote:
>>
>> Get the HBase book:
>> http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100
>>
>> And/Or read the Bigtable paper.
>>
>> J-D
>>
>> On Fri, Dec 2, 2011 at 12:01 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> Where can I read more on this specific subject?
>>
>> Based on your answer I have more questions, but I want to read more
>> specific information about how it works and why it's designed that
>> way.
>>
>> On Fri, Dec 2, 2011 at 11:59 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> No, data is only served by one region server (even if it resides on
>> multiple data nodes). If it dies, clients need to wait for the log
>> replay and region reassignment.
>>
>> J-D
>>
>> On Fri, Dec 2, 2011 at 11:57 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> Why is HBase considered high in consistency, and why is it said to give up
>> partition tolerance? My understanding is that the failure of one data node
>> still doesn't impact clients, as they would re-adjust the list of
>> available data nodes.
>>
>>
>
>
> 
