hbase-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: how client location a region/tablet?
Date Thu, 23 Aug 2012 15:48:38 GMT
HBase currently keeps a single META region (it does not split META). ROOT
holds the META region's location, and META has one row per region, i.e. a
few rows for each table. See also the class MetaScanner.
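A simplified sketch of why no row-by-row iteration is needed: a region's name, which is also its row key in the catalog table, begins with the table name and the region's start key, so catalog rows are sorted by (table, start key). A client can then find the region holding a row with one "closest row at or before" lookup. The sketch assumes tuple keys and Python string ordering in place of HBase's byte ordering; the table contents and server names are hypothetical.

```python
import bisect

# Hypothetical catalog rows, kept sorted by (table, start_key).
# An empty start key marks a table's first region.
META_ROWS = [
    (("ABC", ""), "rs1.example.com"),
    (("ABC", "100"), "rs2.example.com"),
    (("ABC", "500"), "rs3.example.com"),
    (("XYZ", ""), "rs1.example.com"),
]
META_KEYS = [key for key, _ in META_ROWS]


def locate(table: str, row: str) -> str:
    """Return the server of the region whose start key is the closest one <= row."""
    # bisect_right gives the index of the first key strictly greater than
    # (table, row); the entry just before it is the "closest row before".
    i = bisect.bisect_right(META_KEYS, (table, row)) - 1
    if i < 0 or META_KEYS[i][0] != table:
        raise KeyError(f"no region of {table!r} holds row {row!r}")
    return META_ROWS[i][1]


print(locate("ABC", "123"))  # rs2.example.com: region ["100", "500") holds "123"
```

The lookup is a binary search on the key alone, so nothing stored in the row values has to be scanned.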

On Thu, Aug 23, 2012 at 9:00 PM, Lin Ma <linlma@gmail.com> wrote:
> Doug,
>
> Some more thoughts after reading about the HRegionInfo data structure
> (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HRegionInfo.html):
> the start key and end key look informative, and we could leverage them.
>
> - I am not sure whether we could use this information (stored as part of
> the value in the ROOT table) to find which META region may contain the
> region server information for row key 123 of data table ABC;
> - Unfortunately, I think the information is stored in the value of the
> ROOT table rather than in its key field, so we would have to iterate over
> the rows of the ROOT table one by one to figure out which META region
> server to access.
>
> I'm not sure I have understood this correctly. Please feel free to correct me.
>
> regards,
> Lin
>
> On Thu, Aug 23, 2012 at 11:15 PM, Lin Ma <linlma@gmail.com> wrote:
>
>> Doug, very informative document. Thanks a lot!
>>
>> I read through it and have some thoughts:
>>
>> - Suppose that initially the client-side cache of region locations is
>> empty, and the client wants to GET row key 123 from table ABC;
>> - The client will read the ROOT table first. Unfortunately, the ROOT
>> table only contains region information for the META table (please correct
>> me if I am wrong), not region information for the real data table (e.g.
>> table ABC);
>> - Does the client have to call each META region server one by one in
>> order to find which META region holds the information about the region
>> that owns row key 123 of data table ABC?
>>
>> BTW: if the .META. region key exposed which range of tables/regions each
>> META region covers, the client could avoid iterating over META region
>> servers one by one. Please feel free to correct me if I am wrong.
>>
>> regards,
>> Lin
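Lin's cold-cache scenario can be sketched as a toy three-level lookup (ROOT, then META, then the user region). This is an illustration under stated assumptions, not the HBase client's code: the ROOT location is assumed already known, there is a single META region, the catalog contents are hypothetical, and each catalog query and the final GET count as one round trip. The client caches only region locations (server and key range), never row data.

```python
import bisect


def closest_before(rows, key):
    """Return the (key, value) entry of sorted `rows` with the largest key <= key."""
    i = bisect.bisect_right([k for k, _ in rows], key) - 1
    if i < 0:
        raise KeyError(key)
    return rows[i]


# Hypothetical catalog contents, keyed by (table, start_key).
# ROOT: one row per META region (a single META region covering everything).
ROOT = [(("", ""), "meta-server")]
# META: one row per user region -> (server, end_key); "" means no end key.
META = [
    (("ABC", ""), ("rs1", "100")),
    (("ABC", "100"), ("rs2", "500")),
    (("ABC", "500"), ("rs3", "")),
]


class ToyClient:
    def __init__(self):
        self.round_trips = 0
        self.cache = []  # sorted (key, (server, end_key)); locations only, no row data

    def _cached_server(self, table, row):
        try:
            (t, _), (server, end) = closest_before(self.cache, (table, row))
        except KeyError:
            return None
        in_range = t == table and (end == "" or row < end)
        return server if in_range else None

    def get(self, table, row):
        server = self._cached_server(table, row)
        if server is None:
            self.round_trips += 1  # ask ROOT which META region covers the row
            _, meta_server = closest_before(ROOT, (table, row))
            self.round_trips += 1  # ask that META region (on meta_server) for the user region
            key, location = closest_before(META, (table, row))
            bisect.insort(self.cache, (key, location))
            server = location[0]
        self.round_trips += 1  # the GET itself, sent to the user region's server
        return server


client = ToyClient()
print(client.get("ABC", "123"), client.round_trips)  # rs2 3 (cold cache)
print(client.get("ABC", "200"), client.round_trips)  # rs2 4 (same region, cache hit)
```

In this model the cold lookup touches exactly one ROOT row and one META row, and a later GET to the same region costs a single round trip.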
>>
>>
>> On Thu, Aug 23, 2012 at 8:21 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>>
>>>
>>> For further information about the catalog tables and region-regionserver
>>> assignment, see this:
>>>
>>> http://hbase.apache.org/book.html#arch.catalog
>>>
>>>
>>> On 8/19/12 7:36 AM, "Lin Ma" <linlma@gmail.com> wrote:
>>>
>>> >Thank you, Stack, especially for the clever six-round-trip guess for the
>>> >puzzle. :-)
>>> >
>>> >1. "Yeah, we client cache's locations, not the data." -- does this mean
>>> >each client caches all location information of an HBase cluster, i.e.
>>> >which physical server owns which region? Supposing each region is 128 MB,
>>> >for a big (petabyte-scale) cluster the total data size / 128 MB is not a
>>> >trivial number; is there significant overhead for the client?
>>> >2. I am a bit confused by what you mean by "not the data". The cached
>>> >location information should be the data in the METADATA table, i.e. the
>>> >region / physical server mapping. Why do you say it is not data (do you
>>> >mean the real content of each region)?
>>> >
>>> >regards,
>>> >Lin
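Lin's sizing question can be made concrete with a back-of-the-envelope calculation. The number of cached entries equals the number of regions (total data size divided by region size), and each entry is a location, not the region's 128 MB of data. The 1 PiB total and the ~200 bytes per cached entry are assumptions for illustration only, and the sketch assumes every region location is cached.

```python
# Back-of-the-envelope size of a client cache holding every region location.
# All three inputs are assumptions for illustration.
TOTAL_BYTES = 1 << 50       # 1 PiB of table data
REGION_BYTES = 128 << 20    # 128 MiB per region, as in the question
BYTES_PER_ENTRY = 200       # hypothetical memory cost of one cached location

regions = TOTAL_BYTES // REGION_BYTES
cache_bytes = regions * BYTES_PER_ENTRY
cache_gib = cache_bytes / (1 << 30)

print(f"{regions:,} regions -> about {cache_gib:.2f} GiB of cached locations")
```

Under these assumptions that is about 8.4 million regions and roughly 1.6 GiB of locations, i.e. gigabytes rather than petabytes; the per-entry figure is the assumption that most affects the result.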
>>> >
>>> >On Sun, Aug 19, 2012 at 12:40 PM, Stack <stack@duboce.net> wrote:
>>> >
>>> >> On Sat, Aug 18, 2012 at 2:13 AM, Lin Ma <linlma@gmail.com> wrote:
>>> >> > Hello guys,
>>> >> >
>>> >> > I am referencing the Bigtable paper on how a client locates a tablet.
>>> >> > Section 5.1, "Tablet Location", says that the client caches all tablet
>>> >> > locations. I think this means the client caches the root tablet of the
>>> >> > METADATA table and all other METADATA tablets (so the client caches the
>>> >> > whole METADATA table?). My question is whether HBase implements this in
>>> >> > the same or a similar way. My concern is that, supposing each tablet or
>>> >> > region file is 128 MB, it would take a huge amount of space (i.e. memory
>>> >> > footprint) for each client to cache all tablets or region files of the
>>> >> > METADATA table. Is that feasible in real HBase clusters? Thanks.
>>> >> >
>>> >>
>>> >> Yeah, we client cache's locations, not the data.
>>> >>
>>> >>
>>> >> > BTW: another point of confusion for me: section 5.1 of the Bigtable
>>> >> > paper, "Tablet Location", says "If the client's cache is stale, the
>>> >> > location algorithm could take up to six round-trips, because stale
>>> >> > cache entries are only discovered upon misses (assuming that METADATA
>>> >> > tablets do not move very frequently)." I do not know how the six round
>>> >> > trips are calculated; if anyone could answer this puzzle, it would be
>>> >> > great. :-)
>>> >> >
>>> >>
>>> >> I'm not sure what the 6 is about either.  Here is a guesstimate:
>>> >>
>>> >> 1. Go to the cached location of the server for a particular user region,
>>> >> but the server says it does not have the region; the client's location
>>> >> is stale.
>>> >> 2. Go back to the client's cached location of the meta region that holds
>>> >> the user region with the row we want, but that location is stale too.
>>> >> 3. Go to the root location to find the new location of meta, but root
>>> >> has moved too... what the client has is stale.
>>> >> 4. Find the new root location and look up the meta region location.
>>> >> 5. Go to the meta region location to find the new user region location.
>>> >> 6. Go to the server with the user region.
>>> >>
>>> >> St.Ack
>>> >>
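Stack's guess can be restated as a small counting function. This is a toy model of the worst case he describes, not how the Bigtable or HBase client is actually written: each of the three cached levels (user region, META, ROOT) may be stale, a stale entry is discovered only when a request to it fails, and, following his step 4, finding the new ROOT and reading the META location from it counts as one round trip.

```python
def lookup_round_trips(user_stale: bool, meta_stale: bool, root_stale: bool) -> int:
    """Count round trips to reach a user region, following Stack's guess.

    root_stale only matters when meta_stale is True, because ROOT is consulted
    only after the cached META location has failed.
    """
    trips = 1                 # step 1: try the cached user-region server
    if not user_stale:
        return trips
    trips += 1                # step 2: ask the cached META location
    if meta_stale:
        trips += 1            # step 3: ask the cached ROOT location
        if root_stale:
            trips += 1        # step 4: find the new ROOT, look up META there
        trips += 1            # step 5: ask META for the user region's new location
    trips += 1                # step 6: go to the server with the user region
    return trips


print(lookup_round_trips(True, True, True))  # 6: every cached level is stale
```

With only the user-region entry stale the same model gives three round trips, which is why stale META or ROOT entries are what push the count up to six.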
>>>
>>>
>>>
>>



-- 
Harsh J
