hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Improving hbase read performance
Date Wed, 18 Feb 2009 16:09:26 GMT
On Wed, Feb 18, 2009 at 2:23 AM, shourabh rawat <mirage1987@gmail.com> wrote:

> hey,
> > What do you mean by the above when you say read sequentially? Are you
> > scanning? (Getting a scanner and then nexting through your hbase
> > table?)
> well, let's say i have 10 keys that are stored in hbase and
> i want to retrieve them.
> If I do the reads one by one, the time would be the sum of the 'get'
> times of each key.
> Could i do the same thing in parallel, so that all the gets occur
> concurrently? Then the total time would be the max of the time taken by
> any of these keys rather than the sum of the individual times.

Yes.  Use multiple instances of HTable, one per thread.  You won't do the
ten requests in the time it would take to do one.  It'll be more like the
time to do 2 or 3 (at least in my primitive testing).  If you had more
regionservers, it would complete in a shorter time (it's the single
Connection issue you mentioned in an earlier mail).
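
A minimal sketch of that fan-out pattern in plain Java, not tested against
HBase.  Everything here is an assumption for illustration: `fetchAll` and
`getOne` are hypothetical names, and `getOne` stands in for a lookup through
a per-thread HTable; in `main` it is simulated with a sleep so the example
runs without a cluster.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelGets {

    // Submits one task per key, then collects the results in key order.
    // getOne is a hypothetical stand-in for a single-row lookup.
    static <K, V> Map<K, V> fetchAll(List<K> keys, Function<K, V> getOne, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<K, Future<V>> pending = new LinkedHashMap<>();
            for (K key : keys) {
                Callable<V> task = () -> getOne.apply(key);
                pending.put(key, pool.submit(task));
            }
            Map<K, V> results = new LinkedHashMap<>();
            for (Map.Entry<K, Future<V>> e : pending.entrySet()) {
                // Blocks until this particular lookup has finished.
                results.put(e.getKey(), e.getValue().get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> keys = Arrays.asList("k0", "k1", "k2", "k3", "k4",
                                          "k5", "k6", "k7", "k8", "k9");
        // Simulated lookup: a fixed 50 ms delay instead of a real read.
        Function<String, String> simulatedGet = key -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "value-of-" + key;
        };

        long start = System.nanoTime();
        Map<String, String> values = fetchAll(keys, simulatedGet, keys.size());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(values.size() + " values in " + elapsedMs + " ms");
    }
}
```

With a real table the elapsed time would not drop all the way to a single
get, for the reasons given above; the sketch only shows how the requests are
issued concurrently and gathered afterwards.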

> > You will have to wait for hbase 0.20.0 or do as Erik suggests and put a
> > cache in front of hbase.  What are you trying to do with hbase?  Serve a
> > website?
> ya, sort of, but i want to check performance without the use of a cache
> (random reads) .... can i get such performance in the range of 10 ms
> with hbase?

Depends on hardware, data, etc (See the wiki for the numbers I get with our
hardware and loading).

If this is important to you, you might wait on hbase 0.20.0.  Improving this
performance dimension is its focus.

> so by a single connection you mean all the gets would be treated
> sequentially (one by one) by hbase even when the requests come in
> parallel (even when different htable instances for the same table are
> employed)...

It does not do a request, wait for the response and then return the
response.  It interleaves the sending of requests and responses so you'll
see something like this:


This is how the hadoop RPC works.  It's what we currently use.
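
As a rough illustration of interleaving on one shared connection (a toy
model, not the hadoop RPC code): each request is tagged with an id, sent
without waiting for earlier replies, and each reply is matched back to its
caller by that id.  The class `MultiplexedConnection`, its methods, and the
per-call `serverDelayMs` parameter are all hypothetical; the "server" is
just a thread pool that sleeps before answering.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class MultiplexedConnection {
    private final AtomicInteger nextId = new AtomicInteger();
    // Calls that have been sent but not yet answered, keyed by call id.
    private final ConcurrentMap<Integer, CompletableFuture<String>> pending =
            new ConcurrentHashMap<>();
    // Stand-in for the remote side; a real connection would use a socket.
    private final ExecutorService server = Executors.newFixedThreadPool(4);

    // Sends a request and returns immediately; the reply arrives later.
    public CompletableFuture<String> call(String request, long serverDelayMs) {
        int id = nextId.incrementAndGet();
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(id, reply);
        server.execute(() -> {
            try {
                Thread.sleep(serverDelayMs); // simulated server-side work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            onResponse(id, "reply to " + request);
        });
        return reply;
    }

    // Routes a reply to whichever caller is waiting on this call id.
    void onResponse(int id, String body) {
        CompletableFuture<String> reply = pending.remove(id);
        if (reply != null) {
            reply.complete(body);
        }
    }

    public void close() {
        server.shutdown();
    }

    public static void main(String[] args) throws Exception {
        MultiplexedConnection conn = new MultiplexedConnection();
        // Both requests are in flight at once on the same connection object.
        CompletableFuture<String> slow = conn.call("get row1", 200);
        CompletableFuture<String> fast = conn.call("get row2", 10);
        System.out.println(fast.get());
        System.out.println(slow.get());
        conn.close();
    }
}
```

In this toy model the second request does not wait behind the first: its
reply can be delivered while the first call is still outstanding.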

You could also run multiple clients, each in its own process, so that each
process gets its own Connection instance.

