hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: Improving hbase read performance
Date Wed, 18 Feb 2009 18:31:11 GMT
HBase manages which regionserver a query goes to. The client figures out
where the row you are querying is hosted (caching its knowledge of cluster
geography) and sends the request to the hosting regionserver.
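The routing described above can be pictured as a sorted map from region start rows to regionservers: a row belongs to the region with the greatest start row that is not greater than the row. The sketch below is a simplified, hypothetical model of that client-side cache. RegionLocationCache, cacheLocation and locate are illustrative names, not the HBase client API. It uses String keys where HBase uses byte[] row keys, and it ignores region end keys and the refreshing of stale entries after a region moves.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a client-side region location cache (not HBase code).
 * A row is served by the region whose start row is the greatest one <= the row.
 */
public class RegionLocationCache {
    // Region start row -> regionserver address, kept sorted by start row.
    private final TreeMap<String, String> startRowToServer = new TreeMap<>();

    /** Remembers that the region beginning at startRow is hosted by server. */
    public void cacheLocation(String startRow, String server) {
        startRowToServer.put(startRow, server);
    }

    /** Returns the cached server for row, or null on a cache miss. */
    public String locate(String row) {
        Map.Entry<String, String> entry = startRowToServer.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        RegionLocationCache cache = new RegionLocationCache();
        cache.cacheLocation("", "rs1.example.com");  // region ["", "m")
        cache.cacheLocation("m", "rs2.example.com"); // region ["m", end)
        System.out.println(cache.locate("apple")); // rs1.example.com
        System.out.println(cache.locate("zebra")); // rs2.example.com
    }
}
```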

With a small cluster like yours, a threaded client where each thread does
lots of gets will give you better performance. There is a relatively large
setup cost per task in MR, so it would probably run slower (MR would be good
for farming the requests out over the cluster and for ensuring they
complete). For examples, see src/example/mapred and study the contents of
the org.apache.hadoop.hbase.mapred package.
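A minimal sketch of such a threaded client, assuming a hypothetical RowReader interface in place of the real per-row get call (it is not an HBase type). A fixed pool of threads each receives one contiguous batch of row keys, so every task performs many gets rather than one. Each task builds its own reader from a factory, on the assumption that the real reader (for example an HTable) should not be shared between threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Sketch of a threaded read client; RowReader is a hypothetical stand-in. */
public class ThreadedGets {

    /** Hypothetical single-row reader; wrap your real get call in one of these. */
    public interface RowReader {
        String get(String row) throws Exception;
    }

    /** Fetches all rows with nThreads workers and returns values in input order. */
    public static List<String> fetchAll(List<String> rows, int nThreads,
                                        Supplier<RowReader> readerFactory)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            // One contiguous batch per thread, so per-task setup is paid once
            // per batch instead of once per get.
            int batchSize = (rows.size() + nThreads - 1) / nThreads;
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int start = 0; start < rows.size(); start += batchSize) {
                List<String> batch =
                        rows.subList(start, Math.min(start + batchSize, rows.size()));
                futures.add(pool.submit(() -> {
                    RowReader reader = readerFactory.get(); // one reader per task
                    List<String> out = new ArrayList<>(batch.size());
                    for (String row : batch) {
                        out.add(reader.get(row));
                    }
                    return out;
                }));
            }
            // Futures were created in batch order, so results keep input order.
            List<String> results = new ArrayList<>(rows.size());
            for (Future<List<String>> f : futures) {
                results.addAll(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With 10,000 keys and nThreads = 10, this yields ten tasks of 1,000 gets each; the factory lambda is where a real get call would be wrapped.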

No, HBase does not use MR as part of normal operation.

St.Ack




On Wed, Feb 18, 2009 at 10:19 AM, shourabh rawat <mirage1987@gmail.com> wrote:

> Hey, a few questions come to my mind:
>
> Can I send individual requests to each region server? If yes, how does
> HBase handle my requests? Does the HBase master distribute the requests
> among regionservers, and do they process them in parallel?
> Can I use map/reduce to improve my read performance, and how? (Each map
> would be a get to HBase; will each map run on a different HBase server?)
> Does HBase internally use map/reduce for handling get requests?
>
>
>
> On Wed, Feb 18, 2009 at 6:23 PM, stack <stack@duboce.net> wrote:
> > On Wed, Feb 18, 2009 at 8:39 AM, shourabh rawat <mirage1987@gmail.com> wrote:
> >
> >> Sorry to bug you again
> >
> >
> > It's no trouble. Let's figure it out.
> >
> >
> >
> >> Well, I pasted my code a few posts back... Is it the same as what you
> >> are saying?
> >>
> >
> > Pardon, I only just saw it.
> >
> > Looks like you are setting up a thread pool of 50 threads, and then each
> > time a thread runs, it gets only one value? Each thread makes its own
> > HTable instance?
> >
> > Set up a pool of 10 threads, have each of them get 1000 values, and see
> > what your numbers are like. Or run ten processes, each fetching 1000
> > values.
> >
> > I say 10 because with 50, the single Connection is probably a bottleneck.
> > I also say 1000 so the cost of thread setup is amortized.
> >
> > 0.20.0 will hopefully be out in a month or two. There is still a bunch
> > of work to be done.
> >
> >
> >> > "You could also run multiple clients, each in its own process, so each
> >> > process gets its own Connection instance."
> >>
> >> I didn't get what you mean by this...
> >> Well, is it possible to get multiple connection instances? Isn't that
> >> a property of the HTables, so that HTables with the same name always
> >> share the same connection instance?
> >> Could you give some sample code which could help me with this "multiple
> >> connection instances"?
> >
> >
> > I was suggesting that you invoke your client program ten times,
> > concurrently: e.g. for i in {1..10}; do java YOURPROGRAM & done
> > (something like that). You'd need to let it run longer so the cost of
> > JVM setup would wash out.
> >
> > St.Ack
> >
>
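The multi-process approach can also be driven from Java. This is a sketch only: ParallelLauncher is a hypothetical helper, and YOURPROGRAM remains a placeholder for your own client class. It starts n copies of a command as separate OS processes, so each runs in its own JVM with its own connection, then waits for all of them and collects their exit codes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: run n copies of a client command concurrently. */
public class ParallelLauncher {

    /** Starts n processes running command, waits for all, returns exit codes. */
    public static List<Integer> runConcurrently(int n, List<String> command)
            throws IOException, InterruptedException {
        List<Process> procs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ProcessBuilder pb = new ProcessBuilder(command);
            pb.inheritIO(); // each client writes to this console
            procs.add(pb.start()); // start all before waiting on any
        }
        List<Integer> exitCodes = new ArrayList<>();
        for (Process p : procs) {
            exitCodes.add(p.waitFor());
        }
        return exitCodes;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ParallelLauncher <n> <command> [args...]");
            return;
        }
        int n = Integer.parseInt(args[0]);
        List<String> cmd = Arrays.asList(args).subList(1, args.length);
        System.out.println("exit codes: " + runConcurrently(n, cmd));
    }
}
```

For example: java ParallelLauncher 10 java YOURPROGRAM. As noted in the thread, each JVM pays its own startup cost, so each client should run long enough for that to wash out.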
