hbase-user mailing list archives

From "Pamecha, Abhishek" <apame...@x.com>
Subject RE: how client location a region/tablet?
Date Thu, 23 Aug 2012 17:06:19 GMT
I too thought there were multiple META regions but just one ROOT. Maybe I am mixing up
Bigtable and HBase.

Thanks,
Abhishek


-----Original Message-----
From: Lin Ma [mailto:linlma@gmail.com] 
Sent: Thursday, August 23, 2012 9:41 AM
To: user@hbase.apache.org; harsh@cloudera.com
Cc: doug.meil@explorysmedical.com
Subject: Re: how client location a region/tablet?

Thanks, Harsh!

- "HBase currently keeps a single META region (Doesn't split it). " -- does it mean there
is only one row in the ROOT table, which points to the only META region?
- In Bigtable, it seems there are multiple META regions (tablets). Is that an advantage over
HBase? :-)

regards,
Lin
On Thu, Aug 23, 2012 at 11:48 PM, Harsh J <harsh@cloudera.com> wrote:

> HBase currently keeps a single META region (Doesn't split it). ROOT 
> holds META region location, and META has a few rows in it, a few of 
> them for each table. See also the class MetaScanner.
>
> On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma <linlma@gmail.com> wrote:
> > Dong,
> >
> > Some more thoughts, after reading the data structure for HRegionInfo =>
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html,
> > start key and end key look informative which we could leverage,
> >
> > - I am not sure if we could leverage this information (stored as
> > part of the value in table ROOT) to find which META region may contain
> > region server information for row-key 123 of data table ABC;
> > - But I think unfortunately the information is stored in the value of
> > table ROOT, rather than in the key field of table ROOT, so that we have
> > to iterate over each row in the ROOT table one by one to figure out
> > which META region server to access.
> >
> > Not sure if I get the points. Please feel free to correct me.
> >
> > regards,
> > Lin
> >
> > On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma <linlma@gmail.com> wrote:
> >
> >> Doug, very informative document. Thanks a lot!
> >>
> >> I read through it and have some thoughts,
> >>
> >> - Supposing at the beginning, client side cache for region
> >> information is empty, and the client wants to GET row-key 123 from
> >> table ABC;
> >> - The client will read from ROOT table at first. But unfortunately,
> >> ROOT table only contains region information for META table (please
> >> correct me if I am wrong), but not region information for real data
> >> table (e.g. table ABC);
> >> - Does the client have to call each META region server one by one,
> >> in order to find which META region contains information for region
> >> owner of row-key 123 of data table ABC?
> >>
> >> BTW: I think if there is a way to expose information about what
> >> range of table/region each META region contains from .META. region
> >> key, it will be better to save time to iterate META region server one
> >> by one. Please feel free to correct me if I am wrong.
> >>
> >> regards,
> >> Lin
> >>
> >>
> >> On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:
> >>
> >>>
> >>> For further information about the catalog tables and
> >>> region-regionserver assignment, see this...
> >>>
> >>> http://hbase.apache.org/book.html#arch.catalog
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 8/19/12 7:36 AM, "Lin Ma" <linlma@gmail.com> wrote:
> >>>
> >>> >Thank you Stack, especially for the smart 6 round trip guess for 
> >>> >the puzzle. :-)
> >>> >
> >>> >1. "Yeah, we client cache's locations, not the data." -- does it
> >>> >mean for each client, it will cache all location information of a
> >>> >HBase cluster, i.e. which physical server owns which region?
> >>> >Supposing each region has 128M bytes, for a big cluster (P-bytes
> >>> >level), total data size / 128M is not a trivial number, not sure if
> >>> >any overhead to client?
> >>> >2. A bit confused by what do you mean "not the data"? For the
> >>> >client cached location information, it should be the data in
> >>> >table METADATA, which is region / physical server mapping data. Why
> >>> >you say not data (do you mean real content in each region)?
> >>> >
> >>> >regards,
> >>> >Lin
> >>> >
> >>> >On Sun, Aug 19, 2012 at 12:40 PM, Stack <stack@duboce.net> wrote:
> >>> >
> >>> >> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma <linlma@gmail.com> wrote:
> >>> >> > Hello guys,
> >>> >> >
> >>> >> > I am referencing the Big Table paper about how a client locates
> >>> >> > a tablet. In section 5.1 Tablet location, it is mentioned that
> >>> >> > client will cache all tablet locations, I think it means client
> >>> >> > will cache root tablet in METADATA table, and all other tablets
> >>> >> > in METADATA table (which means client cache the whole METADATA
> >>> >> > table?). My question is, whether HBase implements in the same or
> >>> >> > similar way? My concern or confusion is, supposing each tablet or
> >>> >> > region file is 128M bytes, it will be very huge space (i.e.
> >>> >> > memory footprint) for each client to cache all tablets or region
> >>> >> > files of METADATA table. Is it doable or feasible in real HBase
> >>> >> > clusters? Thanks.
> >>> >> >
> >>> >>
> >>> >> Yeah, we client cache's locations, not the data.
> >>> >>
> >>> >>
> >>> >> > BTW: another confusion from me is in the paper of Big Table
> >>> >> > section 5.1 Tablet location, it is mentioned that "If the
> >>> >> > client's cache is stale, the location algorithm could take up to
> >>> >> > six round-trips, because stale cache entries are only discovered
> >>> >> > upon misses (assuming that METADATA tablets do not move very
> >>> >> > frequently).", I do not know how the 6 times round trip time is
> >>> >> > calculated, if anyone could answer this puzzle, it will be
> >>> >> > great. :-)
> >>> >>
> >>> >> I'm not sure what the 6 is about either.  Here is a guesstimate:
> >>> >>
> >>> >> 1. Go to cached location for a server for a particular user
> >>> >> region, but server says that it does not have a region, the
> >>> >> client location is stale
> >>> >> 2. Go back to client cached meta region that holds user region w/
> >>> >> row we want, but its location is stale.
> >>> >> 3. Go to root location, to find new location of meta, but the
> >>> >> root location has moved.... what the client has is stale
> >>> >> 4. Find new root location and do lookup of meta region location
> >>> >> 5. Go to meta region location to find new user region
> >>> >> 6. Go to server w/ user region
> >>> >>
> >>> >> St.Ack
> >>> >>
> >>>
> >>>
> >>>
> >>
>
>
>
> --
> Harsh J
>
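
The lookup discussed in this thread can be sketched roughly as follows. This
is a minimal illustration, not the HBase client code: the table contents,
server names, and the names META_ROWS, locate_region, and LocationCache are
all hypothetical. It assumes catalog rows are sorted by (table name, region
start key), which is how the HBase book describes .META. row keys, so one
"closest row at or before" search finds the region without iterating over
every row. Real HBase keys are byte arrays; plain strings are used here for
brevity. The cache holds only region locations, not row data.

```python
import bisect

# Hypothetical catalog contents: (table, start_key) -> (end_key, server).
# An empty end key means the region is the last one of its table.
META_ROWS = [
    (("ABC", ""), ("100", "rs1.example.com")),
    (("ABC", "100"), ("200", "rs2.example.com")),
    (("ABC", "200"), ("", "rs3.example.com")),
    (("XYZ", ""), ("", "rs1.example.com")),
]
META_KEYS = [key for key, _ in META_ROWS]  # already sorted


def locate_region(table, row_key):
    """Return (start_key, end_key, server) for the region holding row_key."""
    # Last catalog row whose key is <= (table, row_key).
    idx = bisect.bisect_right(META_KEYS, (table, row_key)) - 1
    if idx < 0 or META_KEYS[idx][0] != table:
        raise KeyError(f"no region found for table {table!r}")
    (_, start_key), (end_key, server) = META_ROWS[idx]
    return start_key, end_key, server


class LocationCache:
    """Client-side cache of region locations, filled lazily on misses."""

    def __init__(self):
        self._regions = {}  # table -> sorted list of (start, end, server)
        self.catalog_lookups = 0

    def locate(self, table, row_key):
        regions = self._regions.setdefault(table, [])
        starts = [start for start, _, _ in regions]
        idx = bisect.bisect_right(starts, row_key) - 1
        if idx >= 0:
            _, end, server = regions[idx]
            if end == "" or row_key < end:
                return server  # cache hit, no catalog round trip
        # Cache miss: consult the catalog once and remember the answer.
        self.catalog_lookups += 1
        region = locate_region(table, row_key)
        bisect.insort(regions, region)
        return region[2]
```

In this sketch, a client that only ever reads rows 100 to 199 of table ABC
caches a single entry, so the cache grows with the regions a client touches
rather than with the total size of the cluster's data.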