Date: Thu, 9 Aug 2012 13:38:44 +0800
Subject: Re: consistency, availability and partition pattern of HBase
From: Lin Ma
To: user@hbase.apache.org
Cc: amansk@gmail.com

Amandeep, thanks for your comments, and I will definitely read the paper
you suggested.

For Hadoop itself, what do you think its CAP features are? Which one of
the CAP properties is sacrificed?

regards,
Lin

On Thu, Aug 9, 2012 at 1:34 PM, Amandeep Khurana wrote:

> Firstly, I recommend you read the GFS and Bigtable papers. That'll give you
> a good understanding of the architecture. Ad hoc questions on the mailing
> list won't.
>
> I'll try to answer some of your questions briefly. Think of HBase as a
> database layer over an underlying filesystem (the same way MySQL is over
> ext2/3/4 etc). The filesystem for HBase in this case is HDFS. HDFS
> replicates data for redundancy and fault tolerance. HBase has region
> servers that serve the regions. Regions form tables. Region servers persist
> their data on HDFS. Now, every region is served by one and only one region
> server. So, HBase is not replicating anything. Replication is handled at
> the storage layer. If a region server goes down, all its regions now need
> to be served by some other region server. During this period of region
> assignment, the clients experience degraded availability if they try to
> interact with any of those regions.
>
> Coming back to CAP. HBase chooses to degrade availability in the face of
> partitions. "Partition" is a very general term here and does not
> necessarily mean network partitions. Any node falling off the HBase cluster
> can be considered to be a partition. So, when failures happen, HBase
> degrades availability but does not give up consistency. Consistency in this
> context is sort of the equivalent of atomicity in ACID. In the context of
> HBase, any copy of data that is written to HBase will be visible to all
> clients. There is no concept of multiple different versions that the
> clients need to reconcile between. When you read, you always get the same
> version of the row you are reading. In other words, HBase is strongly
> consistent.
>
> Hope that clears things up a bit.
>
> On Thu, Aug 9, 2012 at 8:02 AM, Lin Ma wrote:
>
> > Thank you Lars.
> >
> > Is the same data stored as duplicate copies across region servers? If so,
> > when the primary server for a region dies, the client just needs to read
> > from the secondary server for the same region. Why is there a period when
> > the data is unavailable?
> >
> > BTW: please feel free to correct me for any wrong knowledge about HBase.
> >
> > regards,
> > Lin
> >
> > On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl wrote:
> >
> > > After a write completes the next read (regardless of the location it is
> > > issued from) will see the latest value.
> > > This is because at any given time exactly one RegionServer is responsible
> > > for a specific key (through assignment of key ranges to regions and
> > > regions to RegionServers).
> > >
> > > As Mohit said, the trade-off is that data is unavailable if a RegionServer
> > > dies until another RegionServer picks up the regions (and by extension the
> > > key range).
> > >
> > > -- Lars
> > >
> > >
> > > ----- Original Message -----
> > > From: Lin Ma
> > > To: user@hbase.apache.org
> > > Cc:
> > > Sent: Wednesday, August 8, 2012 8:47 AM
> > > Subject: Re: consistency, availability and partition pattern of HBase
> > >
> > > And consistency is not sacrificed? I.e., will all distributed clients'
> > > updates be applied sequentially, in real time? Once an update is done by
> > > one client, can all other clients see the result immediately?
> > >
> > > regards,
> > > Lin
> > >
> > > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia wrote:
> > >
> > > > I think availability is sacrificed in the sense that if a region server
> > > > fails, clients will find data inaccessible until the region comes up on
> > > > some other server; not to be confused with data loss.
> > > >
> > > > Sent from my iPad
> > > >
> > > > On Aug 7, 2012, at 11:56 PM, Lin Ma wrote:
> > > >
> > > > > Thank you Wei!
> > > > >
> > > > > Two more comments,
> > > > >
> > > > > 1. What do you think about Hadoop's CAP characteristics?
> > > > > 2. For your comments, if HBase implements "per key sequential
> > > > > consistency", what are the missing characteristics for consistency?
> > > > > Cross-key update sequences? Could you show me an example of what you
> > > > > think is missing? Thanks.
> > > > >
> > > > > regards,
> > > > > Lin
> > > > >
> > > > > On Wed, Aug 8, 2012 at 12:18 PM, Wei Tan wrote:
> > > > >
> > > > >> Hi Lin,
> > > > >>
> > > > >> In the CAP theorem:
> > > > >> Consistency stands for atomic consistency, i.e., each CRUD operation
> > > > >> occurs sequentially in a global, real-time clock.
> > > > >> Availability means each server, if not partitioned, can accept requests.
> > > > >> Partition means network partition.
> > > > >>
> > > > >> As far as I understand (although I do not see any official documentation),
> > > > >> HBase achieved "per key sequential consistency", i.e., for a specific key,
> > > > >> there is an agreed sequence for all operations on it. This is weaker than
> > > > >> strong or sequential consistency, but stronger than "eventual
> > > > >> consistency".
> > > > >>
> > > > >> BTW: CAP was proposed by Prof. Eric Brewer...
> > > > >> http://en.wikipedia.org/wiki/Eric_Brewer_%28scientist%29
> > > > >>
> > > > >> Best Regards,
> > > > >> Wei
> > > > >>
> > > > >> Wei Tan
> > > > >> Research Staff Member
> > > > >> IBM T. J. Watson Research Center
> > > > >> 19 Skyline Dr, Hawthorne, NY 10532
> > > > >> wtan@us.ibm.com; 914-784-6752
> > > > >>
> > > > >>
> > > > >> From: Lin Ma
> > > > >> To: user@hbase.apache.org,
> > > > >> Date: 08/07/2012 09:30 PM
> > > > >> Subject: consistency, availability and partition pattern of HBase
> > > > >>
> > > > >>
> > > > >> Hello guys,
> > > > >>
> > > > >> According to the notes by Werner, "He presented the CAP theorem, which
> > > > >> states that of three properties of shared-data systems -- data consistency,
> > > > >> system availability, and tolerance to network partition -- only two can be
> > > > >> achieved at any given time."
> > > > >> => http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
> > > > >>
> > > > >> But it seems HBase could achieve all 3 of the features at the same time.
> > > > >> Does it mean HBase breaks the rule by Werner? :-)
> > > > >>
> > > > >> If not, which one is sacrificed -- consistency (by using HDFS),
> > > > >> availability (by using Zookeeper) or partition (by using region / column
> > > > >> family)? And why?
> > > > >>
> > > > >> regards,
> > > > >> Lin
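
For readers who prefer code, the trade-off discussed in this thread can be sketched as a toy Python model. This is not HBase code and uses no HBase API: ToyCluster, assign, fail_server and RegionUnavailable are names made up for illustration, and a plain dict stands in for the shared storage layer (HDFS). It assumes only what the replies above state: each key range (region) is served by exactly one region server at a time, data is persisted below the region servers, and a region is unreachable between its server dying and another server picking it up.

```python
class RegionUnavailable(Exception):
    """Raised when no live server currently serves the key's region."""


class ToyCluster:
    """Hypothetical toy model of single-owner regions over shared storage."""

    def __init__(self, split_points, servers):
        # split_points ["m"] yields two regions: keys < "m" and keys >= "m".
        self.split_points = sorted(split_points)
        self.live = set(servers)
        self.owner = {}    # region index -> the single server serving it
        self.storage = {}  # stand-in for the shared storage layer (HDFS)

    def region_of(self, key):
        # Index of the key range that contains this key.
        return sum(1 for s in self.split_points if key >= s)

    def assign(self, region, server):
        if server not in self.live:
            raise ValueError("cannot assign a region to a dead server")
        # Overwrites any previous owner, so a region never has two owners.
        self.owner[region] = server

    def fail_server(self, server):
        # The server's regions stay unserved until assign() is called again.
        self.live.discard(server)

    def _serving(self, key):
        server = self.owner.get(self.region_of(key))
        if server not in self.live:
            raise RegionUnavailable(key)
        return server

    def put(self, key, value):
        self._serving(key)         # every write goes through the one owner
        self.storage[key] = value  # and is persisted in shared storage

    def get(self, key):
        self._serving(key)         # every read goes through the same owner
        return self.storage.get(key)


if __name__ == "__main__":
    cluster = ToyCluster(["m"], ["rs1", "rs2"])
    cluster.assign(0, "rs1")
    cluster.assign(1, "rs2")
    cluster.put("apple", 1)
    cluster.fail_server("rs1")
    try:
        cluster.get("apple")
    except RegionUnavailable:
        print("region 0 is unavailable until it is reassigned")
    cluster.assign(0, "rs2")
    print(cluster.get("apple"))
```

In this sketch a failed server costs availability for its key range only, and the value written before the failure is read back unchanged after reassignment, because there is never a second copy of the region that could have served a stale version.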