hbase-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: how client location a region/tablet?
Date Fri, 24 Aug 2012 01:19:41 GMT
Me too, Abhishek -- you are not alone. But it is good to learn and discuss
here to know various design choices.

regards,
Lin

On Fri, Aug 24, 2012 at 1:06 AM, Pamecha, Abhishek <apamecha@x.com> wrote:

> I too thought there are multiple META regions, whereas just one ROOT. Maybe
> I am mixing up Big Table and HBase.
>
> Thanks,
> Abhishek
>
>
> -----Original Message-----
> From: Lin Ma [mailto:linlma@gmail.com]
> Sent: Thursday, August 23, 2012 9:41 AM
> To: user@hbase.apache.org; harsh@cloudera.com
> Cc: doug.meil@explorysmedical.com
> Subject: Re: how client location a region/tablet?
>
> Thanks, Harsh!
>
> - "HBase currently keeps a single META region (Doesn't split it). " --
> does it mean there is only one row in ROOT table, which points the only one
> META region?
> - In Big Table, it seems they have multiple META regions (tablets), is it
> an advantage over HBase? :-)
>
> regards,
> Lin
> On Thu, Aug 23, 2012 at 11:48 PM, Harsh J <harsh@cloudera.com> wrote:
>
> > HBase currently keeps a single META region (Doesn't split it). ROOT
> > holds META region location, and META has a few rows in it, a few of
> > them for each table. See also the class MetaScanner.
> >
> > On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma <linlma@gmail.com> wrote:
> > > Dong,
> > >
> > > Some more thoughts, after reading the data structure for HRegionInfo =>
> > > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html ,
> > > the start key and end key look informative, which we could leverage,
> > >
> > > - I am not sure if we could leverage this information (stored as
> > > part of the value in table ROOT) to find which META region may contain
> > > region server information for row-key 123 of data table ABC;
> > > - But I think unfortunately the information is stored in the value of
> > > table ROOT, rather than in the key field of table ROOT, so that we have
> > > to iterate each row in ROOT table one by one to figure out which META
> > > region server to access.
> > >
> > > Not sure if I get the points. Please feel free to correct me.
> > >
> > > regards,
> > > Lin
> > >
> > > On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma <linlma@gmail.com> wrote:
> > >
> > >> Doug, very informative document. Thanks a lot!
> > >>
> > >> I read through it and have some thoughts,
> > >>
> > >> - Supposing at the beginning, client side cache for region information is
> > >> empty, and the client wants to GET row-key 123 from table ABC;
> > >> - The client will read from ROOT table at first. But unfortunately,
> > >> ROOT table only contains region information for META table (please
> > >> correct me if I am wrong), but not region information for real data
> > >> table (e.g. table ABC);
> > >> - Does the client have to call each META region server one by one,
> > >> in order to find which META region contains information for region
> > >> owner of row-key 123 of data table ABC?
> > >>
> > >> BTW: I think if there is a way to expose information about what
> > >> range of table/region each META region contains from .META. region
> > >> key, it will be better to save time to iterate META region server one
> > >> by one. Please feel free to correct me if I am wrong.
> > >>
> > >> regards,
> > >> Lin
> > >>
> > >>
> > >> On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:
> > >>
> > >>>
> > >>> For further information about the catalog tables and region-regionserver
> > >>> assignment, see this...
> > >>>
> > >>> http://hbase.apache.org/book.html#arch.catalog
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On 8/19/12 7:36 AM, "Lin Ma" <linlma@gmail.com> wrote:
> > >>>
> > >>> >Thank you Stack, especially for the smart 6 round trip guess for
> > >>> >the puzzle. :-)
> > >>> >
> > >>> >1. "Yeah, we client cache's locations, not the data." -- does it mean for
> > >>> >each client, it will cache all location information of a HBase cluster,
> > >>> >i.e. which physical server owns which region? Supposing each region has
> > >>> >128M bytes, for a big cluster (P-bytes level), total data size / 128M is
> > >>> >not a trivial number, not sure if any overhead to client?
> > >>> >2. A bit confused by what do you mean "not the data"? For the client
> > >>> >cached location information, it should be the data in table METADATA,
> > >>> >which is region / physical server mapping data. Why you say not data (do
> > >>> >you mean real content in each region)?
> > >>> >
> > >>> >regards,
> > >>> >Lin
> > >>> >
> > >>> >On Sun, Aug 19, 2012 at 12:40 PM, Stack <stack@duboce.net> wrote:
> > >>> >
> > >>> >> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma <linlma@gmail.com> wrote:
> > >>> >> > Hello guys,
> > >>> >> >
> > >>> >> > I am referencing the Big Table paper about how a client locates a
> > >>> >> > tablet. In section 5.1 Tablet location, it is mentioned that client
> > >>> >> > will cache all tablet locations, I think it means client will cache
> > >>> >> > root tablet in METADATA table, and all other tablets in METADATA table
> > >>> >> > (which means client cache the whole METADATA table?). My question is,
> > >>> >> > whether HBase implements in the same or similar way? My concern or
> > >>> >> > confusion is, supposing each tablet or region file is 128M bytes, it
> > >>> >> > will be very huge space (i.e. memory footprint) for each client to
> > >>> >> > cache all tablets or region files of METADATA table. Is it doable or
> > >>> >> > feasible in real HBase clusters? Thanks.
> > >>> >> >
> > >>> >>
> > >>> >> Yeah, we client cache's locations, not the data.
> > >>> >>
> > >>> >>
> > >>> >> > BTW: another confusion from me is in the paper of Big Table section 5.1
> > >>> >> > Tablet location, it is mentioned that "If the client's cache is stale,
> > >>> >> > the location algorithm could take up to six round-trips, because stale
> > >>> >> > cache entries are only discovered upon misses (assuming that METADATA
> > >>> >> > tablets do not move very frequently).", I do not know how the 6 times
> > >>> >> > round trip time is calculated, if anyone could answer this puzzle, it
> > >>> >> > will be great. :-)
> > >>> >> >
> > >>> >>
> > >>> >> I'm not sure what the 6 is about either.  Here is a guesstimate:
> > >>> >>
> > >>> >> 1. Go to cached location for a server for a particular user region,
> > >>> >> but server says that it does not have a region, the client location is
> > >>> >> stale
> > >>> >> 2. Go back to client cached meta region that holds user region w/ row
> > >>> >> we want, but its location is stale.
> > >>> >> 3. Go to root location, to find new location of meta, but the root
> > >>> >> location has moved.... what the client has is stale
> > >>> >> 4. Find new root location and do lookup of meta region location
> > >>> >> 5. Go to meta region location to find new user region
> > >>> >> 6. Go to server w/ user region
> > >>> >>
> > >>> >> St.Ack
> > >>> >>
> > >>>
> > >>>
> > >>>
> > >>
> >
> >
> >
> > --
> > Harsh J
> >
>
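
To make the lookup path discussed in this thread concrete, here is a minimal Python sketch of a client with an empty cache locating row-key 123 of table ABC. It is a toy model, not HBase code: the class names (ToyCatalog, ToyClient), the server names (meta-server, rs1, rs2, rs3) and the region boundaries are all made up for illustration. It assumes a single META region, as Harsh describes, and assumes META entries are ordered by table and start key, so the client finds the covering region with one sorted lookup rather than iterating. Only catalog reads are counted as round trips; the final call to the user region server is not modeled.

```python
import bisect


class ToyCatalog:
    """Hypothetical stand-in for ROOT and a single META region."""

    def __init__(self, meta_rows):
        # meta_rows: {table: [(start_key, server), ...]}, sorted by start_key.
        # The first region of each table starts at the empty key "".
        self.meta_rows = meta_rows
        self.round_trips = 0

    def read_root(self):
        # ROOT only says where the META region lives.
        self.round_trips += 1
        return "meta-server"

    def read_meta(self, table, row_key):
        # Find the last region whose start key is <= row_key.
        self.round_trips += 1
        rows = self.meta_rows[table]
        starts = [start for start, _ in rows]
        i = bisect.bisect_right(starts, row_key) - 1
        start, server = rows[i]
        end = rows[i + 1][0] if i + 1 < len(rows) else None  # None = open end
        return (start, end, server)


class ToyClient:
    """Caches region locations (not region data), filled lazily on misses."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.meta_location = None
        self.cache = {}  # table -> list of (start_key, end_key, server)

    def locate(self, table, row_key):
        # 1. Try the local cache first.
        for start, end, server in self.cache.get(table, []):
            if start <= row_key and (end is None or row_key < end):
                return server
        # 2. Cache miss: ask ROOT for META's location if not yet known.
        if self.meta_location is None:
            self.meta_location = self.catalog.read_root()
        # 3. Ask META for the region covering this row, then cache it.
        region = self.catalog.read_meta(table, row_key)
        self.cache.setdefault(table, []).append(region)
        return region[2]


catalog = ToyCatalog({"ABC": [("", "rs1"), ("100", "rs2"), ("200", "rs3")]})
client = ToyClient(catalog)
print(client.locate("ABC", "123"), catalog.round_trips)
print(client.locate("ABC", "150"), catalog.round_trips)
```

In this sketch the first lookup costs two catalog reads (ROOT, then META) and the second is served from the cache, because only the locations of regions the client has actually touched are stored, never the whole catalog.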
