hbase-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: consistency, availability and partition pattern of HBase
Date Thu, 09 Aug 2012 06:04:12 GMT
Correct. You are limited to the throughput of a single region server while
interacting with a particular region. This throughput limitation is
typically handled by designing your keys such that your data is distributed
well across the cluster.
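
As a rough illustration of that kind of key design, here is a sketch in plain
Python rather than the HBase client API. The bucket count, the helper name
salted_row_key, and the "event-NNNNNN" key format are all made up for this
example. It shows one common approach (salting): prefix each row key with a
stable hash-derived bucket so that sequential keys do not all land in the
same region.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical value; pick it relative to your cluster size


def salted_row_key(natural_key: str, num_buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with a stable, hash-derived bucket number."""
    digest = hashlib.sha256(natural_key.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    # Two-digit prefix keeps all keys of one bucket contiguous in sort order.
    return f"{bucket:02d}-{natural_key}".encode("utf-8")


# Monotonically increasing keys no longer sort next to each other.
keys = [salted_row_key(f"event-{i:06d}") for i in range(1000)]
buckets_used = {k[:2] for k in keys}
```

The prefix is derived from the key itself, so a point read can recompute it.
The cost is that a range scan over the natural keys has to fan out across
all buckets.
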
Having multiple region servers serve a single region gets you into the land
of maintaining consistency across copies, which is challenging. It might be
doable but that's not the design choice Bigtable (and hence HBase) made
initially.
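
To make the single-owner model from this thread concrete, here is a toy
sketch in plain Python. It is not HBase code: the RegionDirectory class, the
server names, and the split points are invented for illustration. Each
region covers a contiguous key range and has exactly one host, so a key in a
failed server's range has no host until the region is reassigned.

```python
import bisect


class RegionDirectory:
    """Toy model: region i covers [start_keys[i], start_keys[i + 1])."""

    def __init__(self, start_keys, hosts):
        # start_keys must be sorted and begin with b"" (the lowest key).
        self.start_keys = list(start_keys)
        self.hosts = list(hosts)

    def locate(self, row_key):
        """Return the single server hosting row_key, or None if unhosted."""
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.hosts[idx]

    def fail_server(self, server):
        # Regions of a dead server are unhosted until someone reassigns them.
        self.hosts = [None if h == server else h for h in self.hosts]

    def reassign_unhosted(self, new_server):
        self.hosts = [new_server if h is None else h for h in self.hosts]


directory = RegionDirectory([b"", b"g", b"p"], ["rs1", "rs2", "rs3"])
before = directory.locate(b"kiwi")   # hosted by rs2
directory.fail_server("rs2")
during = directory.locate(b"kiwi")   # None: the unavailability window
directory.reassign_unhosted("rs1")
after = directory.locate(b"kiwi")    # hosted again, now by rs1
```

Because every key maps to at most one host at any time, there is never a
second copy to keep consistent; the price is the window where the lookup
returns nothing.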

On Thu, Aug 9, 2012 at 11:04 AM, Lin Ma <linlma@gmail.com> wrote:

> Thanks
>
> "only a single RegionServer ever hosts a region at once" -- I know HDFS
> keeps multiple copies of the same file. Do region servers work in an
> active-passive way, i.e. even if there are multiple copies, only one region
> server can serve the data? If so, will that be a bottleneck if the traffic
> to that region is too high?
>
> regards,
> Lin
>
> On Thu, Aug 9, 2012 at 11:09 AM, Bryan Beaudreault
> <bbeaudreault@hubspot.com> wrote:
>
> > Actual data backing hbase is replicated, but that is handled by HDFS. Yes,
> > if you lose an hdfs datanode, clients (in this case the client is hbase)
> > move to the next node in the pipeline.
> >
> > However, only a single RegionServer ever hosts a region at once.  If the
> > RegionServer dies, there is a period where the master must notice the
> > regions are unhosted and move them to other RegionServers.  During that
> > period, the data in those regions can be neither read nor modified.
> >
> > On Wed, Aug 8, 2012 at 10:32 PM, Lin Ma <linlma@gmail.com> wrote:
> >
> > > Thank you Lars.
> > >
> > > Is the same data stored as duplicate copies across region servers? If so,
> > > when the primary server for a region dies, the client just needs to read
> > > from the secondary server for the same region. Why is there a period when
> > > data is unavailable?
> > >
> > > BTW: please feel free to correct any wrong knowledge I have about HBase.
> > >
> > > regards,
> > > Lin
> > >
> > > On Thu, Aug 9, 2012 at 9:31 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> > >
> > > > After a write completes the next read (regardless of the location it is
> > > > issued from) will see the latest value.
> > > > This is because at any given time exactly one RegionServer is responsible
> > > > for a specific key
> > > > (through assignment of key ranges to regions and regions to RegionServers).
> > > >
> > > >
> > > > As Mohit said, the trade-off is that data is unavailable if a RegionServer
> > > > dies until another RegionServer picks up the regions (and by extension the
> > > > key range).
> > > >
> > > > -- Lars
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: Lin Ma <linlma@gmail.com>
> > > > To: user@hbase.apache.org
> > > > Cc:
> > > > Sent: Wednesday, August 8, 2012 8:47 AM
> > > > Subject: Re: consistency, availability and partition pattern of HBase
> > > >
> > > > And consistency is not sacrificed? I.e., will updates from all distributed
> > > > clients be applied sequentially and in real time? Once an update is done by
> > > > one client, can all other clients see the result immediately?
> > > >
> > > > regards,
> > > > Lin
> > > >
> > > > On Wed, Aug 8, 2012 at 11:17 PM, Mohit Anchlia
> > > > <mohitanchlia@gmail.com> wrote:
> > > >
> > > > > I think availability is sacrificed in the sense that if a region server
> > > > > fails, clients will find its data inaccessible until the region comes up
> > > > > on some other server. This is not to be confused with data loss.
> > > > >
> > > > > Sent from my iPad
> > > > >
> > > > > On Aug 7, 2012, at 11:56 PM, Lin Ma <linlma@gmail.com> wrote:
> > > > >
> > > > > > Thank you Wei!
> > > > > >
> > > > > > Two more comments,
> > > > > >
> > > > > > 1. What do you think about Hadoop's CAP characteristics?
> > > > > > 2. Regarding your comments, if HBase implements "per key sequential
> > > > > > consistency", which consistency characteristics are missing? Cross-key
> > > > > > update sequences? Could you show me an example of what you think is
> > > > > > missing?
> > > > > > Thanks.
> > > > > >
> > > > > > regards,
> > > > > > Lin
> > > > > >
> > > > > > On Wed, Aug 8, 2012 at 12:18 PM, Wei Tan <wtan@us.ibm.com> wrote:
> > > > > >
> > > > > >> Hi Lin,
> > > > > >>
> > > > > >> In the CAP theorem:
> > > > > >> Consistency stands for atomic consistency, i.e., each CRUD operation
> > > > > >> occurs sequentially in a global, real-time clock.
> > > > > >> Availability means each server, if not partitioned, can accept requests.
> > > > > >>
> > > > > >> Partition means network partition
> > > > > >>
> > > > > >> As far as I understand (although I do not see any official documentation),
> > > > > >> HBase achieves "per key sequential consistency", i.e., for a specific key,
> > > > > >> there is an agreed sequence for all operations on it. This is weaker than
> > > > > >> strong or sequential consistency, but stronger than "eventual
> > > > > >> consistency".
> > > > > >>
> > > > > >> BTW: CAP was proposed by Prof. Eric Brewer...
> > > > > >> http://en.wikipedia.org/wiki/Eric_Brewer_%28scientist%29
> > > > > >>
> > > > > >> Best Regards,
> > > > > >> Wei
> > > > > >>
> > > > > >> Wei Tan
> > > > > >> Research Staff Member
> > > > > >> IBM T. J. Watson Research Center
> > > > > >> 19 Skyline Dr, Hawthorne, NY  10532
> > > > > >> wtan@us.ibm.com; 914-784-6752
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> From:   Lin Ma <linlma@gmail.com>
> > > > > >> To:    user@hbase.apache.org,
> > > > > >> Date:   08/07/2012 09:30 PM
> > > > > >> Subject:        consistency, availability and partition pattern of HBase
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> Hello guys,
> > > > > >>
> > > > > >> According to the notes by Werner, "He presented the CAP theorem, which
> > > > > >> states that of three properties of shared-data systems—data consistency,
> > > > > >> system availability, and tolerance to network partition—only two can be
> > > > > >> achieved at any given time." =>
> > > > > >> http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
> > > > > >>
> > > > > >> But it seems HBase could achieve all 3 of these properties at the same
> > > > > >> time. Does that mean HBase breaks the rule described by Werner? :-)
> > > > > >>
> > > > > >> If not, which one is sacrificed -- consistency (by using HDFS),
> > > > > >> availability (by using Zookeeper) or partition tolerance (by using
> > > > > >> region / column family)? And why?
> > > > > >>
> > > > > >> regards,
> > > > > >> Lin
> > > > > >>
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > > >
> > >
> >
>
