hbase-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: client cache for all region server information?
Date Thu, 23 Aug 2012 15:37:28 GMT
Hi Lin,

On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma <linlma@gmail.com> wrote:
> Harsh, thanks for the detailed information.
> Two more comments,
> 1. I want to confirm my understanding is correct. At the beginning the client
> cache has nothing. When it issues a request for a table and the region server
> location is not known, it requests it from the root META region to get the region
> server information step by step, then caches the region server information.
> If the cache already contains the requested region information, it uses it
> directly from the cache. In this way, the cache grows only on a cache miss for
> the requested region information;

You have it correct now. A region location is looked up and cached only
when it is not already in the cache. Locations are cached on a need
basis, not all at once.
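
To make the lookup-on-miss behaviour concrete, here is a minimal sketch in
plain Java. It is a toy model, not the HBase client code: the class, its
methods, the region boundaries and the server names (rs1, rs2, rs3) are all
made up for illustration, and the META table is stood in for by an in-memory
map.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names come from the HBase client.
class RegionLocationCacheSketch {

    // A region covers row keys in [startKey, endKey); a null endKey means no upper bound.
    static final class Region {
        final String startKey;
        final String endKey;
        final String server;

        Region(String startKey, String endKey, String server) {
            this.startKey = startKey;
            this.endKey = endKey;
            this.server = server;
        }

        boolean contains(String rowKey) {
            return rowKey.compareTo(startKey) >= 0
                && (endKey == null || rowKey.compareTo(endKey) < 0);
        }
    }

    // Stand-in for the META table, keyed by region start key.
    private final TreeMap<String, Region> meta = new TreeMap<>();
    // Client-side cache: empty at first, filled one region at a time.
    private final TreeMap<String, Region> cache = new TreeMap<>();
    private int metaLookups = 0;

    void addRegionToMeta(String startKey, String endKey, String server) {
        meta.put(startKey, new Region(startKey, endKey, server));
    }

    String locate(String rowKey) {
        Map.Entry<String, Region> cached = cache.floorEntry(rowKey);
        if (cached != null && cached.getValue().contains(rowKey)) {
            return cached.getValue().server; // cache hit, no META lookup
        }
        // Cache miss: look up the one region holding this row and remember it.
        metaLookups++;
        Map.Entry<String, Region> found = meta.floorEntry(rowKey);
        if (found == null || !found.getValue().contains(rowKey)) {
            throw new IllegalStateException("no region for row " + rowKey);
        }
        cache.put(found.getKey(), found.getValue());
        return found.getValue().server;
    }

    int metaLookups() { return metaLookups; }

    int cachedRegions() { return cache.size(); }

    // Hypothetical three-region table used for the demo below.
    static RegionLocationCacheSketch example() {
        RegionLocationCacheSketch c = new RegionLocationCacheSketch();
        c.addRegionToMeta("", "g", "rs1");
        c.addRegionToMeta("g", "p", "rs2");
        c.addRegionToMeta("p", null, "rs3");
        return c;
    }

    public static void main(String[] args) {
        RegionLocationCacheSketch c = example();
        System.out.println(c.locate("apple") + " lookups=" + c.metaLookups());
        System.out.println(c.locate("banana") + " lookups=" + c.metaLookups());
        System.out.println(c.locate("zebra") + " lookups=" + c.metaLookups());
    }
}
```

In this model the second lookup is served from the cache because "banana"
falls in the region already fetched for "apple", and the region served by
rs2 is never cached because no request touched it.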

> 2. "far outweighs the other items it caches (scan results, etc.)", do you mean
> that the GET API of HBase caches results? Sorry, I was not aware of this feature before.
> How are the results cached, and can we control it (supposing a
> client has a random read pattern, we do not want to cache anything,
> since each read may access a unique row key)? I would appreciate it if you could point
> me to some more detailed information.

I am speaking of Scanner value caching, not Gets exactly. See more about
Scanner (client) caching at
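
As a rough illustration of what scanner caching controls: in the real client
it is set per scan with `Scan.setCaching(int)`, or globally with the
`hbase.client.scanner.caching` property, and it decides how many rows each
round trip to the region server brings back. The sketch below is a simplified
model in plain Java, not HBase code. The class and method names are
hypothetical, and it ignores details such as region boundaries and the extra
calls a real scanner makes to open and close.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy model only: a scanner that fetches rows from a "server" list in batches.
class ScannerCachingSketch {
    private final List<String> serverRows;                      // rows held by the pretend server
    private final int caching;                                  // rows fetched per round trip
    private final ArrayDeque<String> buffer = new ArrayDeque<>(); // client-side row buffer
    private int nextIndex = 0;
    private int rpcCount = 0;

    ScannerCachingSketch(List<String> serverRows, int caching) {
        if (caching < 1) {
            throw new IllegalArgumentException("caching must be >= 1");
        }
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // Returns the next row, or null once the scan is exhausted.
    String next() {
        if (buffer.isEmpty() && nextIndex < serverRows.size()) {
            rpcCount++; // one simulated round trip fills the buffer
            int end = Math.min(nextIndex + caching, serverRows.size());
            buffer.addAll(serverRows.subList(nextIndex, end));
            nextIndex = end;
        }
        return buffer.poll();
    }

    int rpcCount() { return rpcCount; }

    // Scans rowCount generated rows to the end and reports the simulated round trips.
    static int rpcsToScan(int rowCount, int caching) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            rows.add("row-" + i);
        }
        ScannerCachingSketch scanner = new ScannerCachingSketch(rows, caching);
        while (scanner.next() != null) {
            // consume every row
        }
        return scanner.rpcCount();
    }

    public static void main(String[] args) {
        System.out.println("caching=1: " + rpcsToScan(10, 1) + " round trips");
        System.out.println("caching=4: " + rpcsToScan(10, 4) + " round trips");
    }
}
```

A larger caching value trades client memory (the buffered rows) for fewer
round trips. A single-row Get does not go through this buffer, so a purely
random read pattern is not affected by it.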

> regards,
> Lin
> On Thu, Aug 23, 2012 at 9:35 PM, Harsh J <harsh@cloudera.com> wrote:
>> Hi Lin,
>> On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma <linlma@gmail.com> wrote:
>> > Thank you Abhishek,
>> >
>> > Two more comments,
>> >
>> > -- "Client only caches information as needed for its queries and not
>> > necessarily for 'all' region servers." -- how does the client know which
>> > region server information needs to be cached in the current HBase
>> > implementation?
>> What Abhishek meant here is that it caches only the needed table's
>> rows from META. It also only caches the specific region required for
>> the row you're looking up/operating on, AFAICT.
>> > -- When does the client load region server information for the first time?
>> > Does the client persist the cached region server information on the
>> > client side?
>> The client loads up regionserver information for a table, when it is
>> requested to perform an operation on that table (on a specific row or
>> the whole). It does not immediately, upon initialization, cache the
>> whole of META's contents.
>> Your question makes sense though, that it does seem to be such that a
>> client *may* use quite a bit of memory space in trying to cache the
>> META entries locally, but practically we've not had this cause issues
>> for users yet. The amount of memory cached for META far outweighs the
>> other items it caches (scan results, etc.). At least I have not seen
>> any reports of excessive client memory usage just due to region
>> locations of tables being cached.
>> I think there's more benefits storing/caching it than not doing so,
>> and so far we've not needed the extra complexity of persisting the
>> cache to a local or non-RAM storage than keeping it in memory.
>> --
>> Harsh J

Harsh J
