hbase-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: client cache for all region server information?
Date Tue, 28 Aug 2012 03:39:38 GMT
Thanks Harsh,

A few more comments / thoughts:

1. For the mapper: a mapper normally runs on the same region server that
owns the row-key range of its input, for locality reasons (I am not 100%
sure that a mapper always runs on the same region server; please correct me
if I am wrong). So the I/O is already local. Is there a big benefit to
returning 500 rows at a time? Could you show me an example where the
benefit is large?
2. For the reducer: could we also use a Scan object, and does it work the
same way as in the mapper? I am confused because a reducer normally writes
to HBase. Could you show me an example where a reducer needs to read from
HBase using a Scan?
3. What does RS mean in your reply?
4. For a non-MapReduce job (e.g. when using the HBase GET API directly), is
there a similar batch function provided by HBase or by a third party?

regards,
Lin

On Mon, Aug 27, 2012 at 11:55 PM, Harsh J <harsh@cloudera.com> wrote:

> Not necessarily consecutive, unless the request itself is so. It only
> returns 500 rows that match the user's request.
>
> User's request of a specific row-range and filters are usually
> embedded into the Scan object, sent to the RS. Whatever is accumulated
> as the result of the Scan operation (server-side) is accumulated in
> sizes of 500 rows and returned in one Scanner.next() call from the
> client.
>
> Does this clear it up Lin?
>
> On Mon, Aug 27, 2012 at 8:40 PM, Lin Ma <linlma@gmail.com> wrote:
> > Hi Harsh,
> >
> > I read through the document you referred, for the below comment, I am
> > confused. Major confusion is, does it mean HBase will transfer
> > consecutive 500 rows to client (supposing client mapper want row with
> > row-key 100, Hbase will return row-key from 100 to 600 at one time to
> > client, similar to batch read?), how to ensure such 500 rows are all
> > desired input for client mapper job (e.g. how do HBase know client
> > mapper job wants row-key from 101 to 600)?
> >
> > "Using the default value means that the map-task will make call back to
> > the region-server for every record processed. Setting this value to 500,
> > for example, will transfer 500 rows at a time to the client to be
> > processed."
> >
> > regards,
> > Lin
> >
> >
> > On Thu, Aug 23, 2012 at 11:37 PM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> Hi Lin,
> >>
> >> On Thu, Aug 23, 2012 at 7:56 PM, Lin Ma <linlma@gmail.com> wrote:
> >> > Harsh, thanks for the detailed information.
> >> >
> >> > Two more comments,
> >> >
> >> > 1. I want to confirm my understanding is correct. At the beginning
> >> > client cache has nothing, when it issue request for a table, if the
> >> > region server location is not known, it will request from root META
> >> > region to get region server information step by step, then cache the
> >> > region server information. If cache already contain the requested
> >> > region information, it will use directly from cache. In this way,
> >> > cache grows when cache miss for requested region information;
> >>
> >> You have it correct now. Region locations are cached only if they are
> >> not available. And they are cached on need-basis, not all at once.
> >>
> >> > 2. "far outweighs the other items it caches (scan results, etc.)", you
> >> > mean GET API of HBase cache results? Sorry I am not aware of this
> >> > feature before. How the results are cached, and whether we can control
> >> > it (supposing a client is doing random read pattern, we do not want to
> >> > cache information since each read may be unique row-key access)?
> >> > Appreciate if you could point me to some more detailed information.
> >>
> >> Am speaking of Scanner value caching, not Gets exactly. See more about
> >> Scanner (client) caching at
> >> http://hbase.apache.org/book.html#perf.hbase.client.caching
> >>
> >> > regards,
> >> > Lin
> >> >
> >> >
> >> > On Thu, Aug 23, 2012 at 9:35 PM, Harsh J <harsh@cloudera.com> wrote:
> >> >>
> >> >> Hi Lin,
> >> >>
> >> >> On Thu, Aug 23, 2012 at 4:31 PM, Lin Ma <linlma@gmail.com> wrote:
> >> >> > Thank you Abhishek,
> >> >> >
> >> >> > Two more comments,
> >> >> >
> >> >> > -- "Client only caches information as needed for its queries and
> >> >> > not necessarily for 'all' region servers." -- how did client know
> >> >> > which region server information is necessary to be cached in
> >> >> > current HBase implementation?
> >> >>
> >> >> What Abhishek meant here is that it caches only the needed table's
> >> >> rows from META. It also only caches the specific region required for
> >> >> the row you're looking up/operating on, AFAICT.
> >> >>
> >> >> > -- When the client loads region server information for the first
> >> >> > time? Did client persistent cache information at client side about
> >> >> > region server information?
> >> >>
> >> >> The client loads up regionserver information for a table, when it is
> >> >> requested to perform an operation on that table (on a specific row or
> >> >> the whole). It does not immediately, upon initialization, cache the
> >> >> whole of META's contents.
> >> >>
> >> >> Your question makes sense though, that it does seem to be such that a
> >> >> client *may* use quite a bit of memory space in trying to cache the
> >> >> META entries locally, but practically we've not had this cause issues
> >> >> for users yet. The amount of memory cached for META far outweighs the
> >> >> other items it caches (scan results, etc.). At least I have not seen
> >> >> any reports of excessive client memory usage just due to region
> >> >> locations of tables being cached.
> >> >>
> >> >> I think there's more benefits storing/caching it than not doing so,
> >> >> and so far we've not needed the extra complexity of persisting the
> >> >> cache to a local or non-RAM storage than keeping it in memory.
> >> >>
> >> >> --
> >> >> Harsh J
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>
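
Archive note: the scanner caching behaviour Harsh describes in this thread
(matching rows are accumulated server-side and returned a batch at a time)
can be illustrated with a toy model. This is only a sketch and not HBase
code: ToyRegionServer, next_batch, scan and count_round_trips are
hypothetical names invented here, and the model ignores regions, filters
implemented server-side, timeouts and everything else a real cluster does.
In the real Java client the batch size is set on the Scan object (for
example via Scan.setCaching).

```python
class ToyRegionServer:
    """Hypothetical stand-in for a region server holding sorted row keys."""

    def __init__(self, rows):
        self.rows = sorted(rows)
        self.rpc_count = 0  # number of simulated client/server round trips

    def next_batch(self, position, start_row, stop_row, row_filter, caching):
        """One round trip: return up to `caching` matching rows and the new position."""
        self.rpc_count += 1
        batch = []
        i = position
        while i < len(self.rows) and len(batch) < caching:
            key = self.rows[i]
            i += 1
            if key < start_row:
                continue              # before the requested range
            if key >= stop_row:
                i = len(self.rows)    # past the range: nothing more to return
                break
            if row_filter(key):
                batch.append(key)     # only rows matching the request are returned
        return batch, i


def scan(server, start_row, stop_row, row_filter, caching):
    """Client-side iterator: one round trip per batch, not one per row."""
    position = 0
    while True:
        batch, position = server.next_batch(
            position, start_row, stop_row, row_filter, caching)
        for key in batch:
            yield key
        if len(batch) < caching:      # a short batch means the scan is exhausted
            return


def count_round_trips(caching):
    """Scan the even row keys in [100, 2100) and report (rows seen, round trips)."""
    server = ToyRegionServer(range(10000))
    rows = list(scan(server, 100, 2100, lambda k: k % 2 == 0, caching))
    return len(rows), server.rpc_count


if __name__ == "__main__":
    for caching in (1, 500):
        print(caching, count_round_trips(caching))
```

In this toy model the same 1000 matching rows cost about a thousand round
trips with caching=1 and only a handful with caching=500. Note that the
returned rows are not consecutive keys; they are whatever matched the
request, which is the point made in the reply above.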

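Archive note: the on-demand region location caching discussed at the start
of this thread (locations are looked up in META only on a cache miss, and
only for the region that holds the requested row) can be sketched in the
same toy style. Again this is an illustration under stated assumptions, not
the HBase client: ToyMeta and ToyClient are hypothetical names, the ROOT and
META lookup steps are collapsed into a single call, and cache invalidation
when a region moves is left out.

```python
import bisect


class ToyMeta:
    """Hypothetical META: per table, region start keys mapped to servers."""

    def __init__(self, regions):
        # regions: {table: [(start_key, server), ...]}; first start key is "".
        self.regions = {table: sorted(entries) for table, entries in regions.items()}
        self.lookups = 0  # number of simulated META lookups

    def locate(self, table, row):
        """Return (start_key, end_key, server) for the region containing `row`."""
        self.lookups += 1
        entries = self.regions[table]
        starts = [start for start, _ in entries]
        idx = bisect.bisect_right(starts, row) - 1
        start, server = entries[idx]
        end = entries[idx + 1][0] if idx + 1 < len(entries) else None
        return start, end, server


class ToyClient:
    """Caches region locations lazily, one region per cache miss."""

    def __init__(self, meta):
        self.meta = meta
        self.cache = {}  # table -> list of (start_key, end_key, server)

    def locate(self, table, row):
        for start, end, server in self.cache.get(table, []):
            if start <= row and (end is None or row < end):
                return server                      # cache hit: no META lookup
        region = self.meta.locate(table, row)      # cache miss: ask META
        self.cache.setdefault(table, []).append(region)
        return region[2]


if __name__ == "__main__":
    meta = ToyMeta({"t1": [("", "rs1"), ("g", "rs2"), ("p", "rs3")]})
    client = ToyClient(meta)
    for row in ("apple", "banana", "kiwi"):
        print(row, client.locate("t1", row), "META lookups so far:", meta.lookups)
```

The cache starts empty and grows only with the regions actually touched, so
a client that reads from two of a table's three regions holds two entries
and never loads the whole of META up front.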