hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: try to run PerformanceEvaluation and encounter RetriesExhaustedException
Date Fri, 03 Apr 2009 15:10:21 GMT
Thanks for the detailed description of your experiences.

On Thu, Apr 2, 2009 at 10:03 AM, Jun Li <jltz922181@gmail.com> wrote:

> ...
> (1)   I first changed HBASE_HEAPSIZE defined in hbase-env.sh from 1 GB to 2
> GB, and run: bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation
> sequentialWrite 4.  It fails at the map phase of M/R, due to the
> RetriesExhaustedException, same as what I reported before.

Previously you ran into OOME.  Now you see RetriesExhaustedException?  Does
the exception show as an error or just in DEBUG-level logging?  Probably
the former.  You might try upping the regionserver lease from 60 seconds to
120 or 180 seconds.
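
For illustration, a sketch of what the lease change might look like in
hbase-site.xml.  This assumes the property is hbase.regionserver.lease.period
and that its value is in milliseconds; check the hbase-default.xml that ships
with your version before relying on either.

```xml
<!-- hbase-site.xml: sketch only; property name and units are assumptions,
     verify against hbase-default.xml for your release -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <!-- 120 seconds, up from the 60-second default mentioned above -->
  <value>120000</value>
</property>
```

The regionservers would need a restart to pick up the new value.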

> So based on my experiments, it seems that by changing the heap size from 1
> GB to 2GB and modifying io.map.index.skip to 128, there is not much
> observable help to resolve the “RetriesExhaustedException”. By having the low
> number of rows (from 1 M rows to 10240 rows), the exception disappears,
> implies that the number of concurrent clients, and thus the number of the
> connections to the servers, is not the root cause of the problem that I
> have. The root cause should likely be related to the size of the rows, in
> this particular example. And thus, I did not try to change the setting of
> “mapred.map.tasks”, which has the default setting of 2 already.

You might check the regionserver that exhausted the retries.  Check its
logs.  Look particularly at compactions -- are they keeping up with the
upload or is the number of files being compacted continually rising?  Maybe
the host was struggling with compaction load.  Are region splits delayed?  For
example, after your upload, did the number of regions rise substantially?
If you run the shell, can you get the row that we exhausted retries on?

> My objective is to see how well HBase can support for concurrent clients,
> with modest row number at this time, say 1 M rows. Could you provide other
> suggestions, or it is in the road map for future release to fix the related
> problem? If you like to see detailed logs to infer the root cause, I would
> be happy to do that.

The Retries Exhausted does seem to be rearing its ugly head of late during
bulk uploads in 0.19.1.  Something's up, some kind of temporary lock-up I'm
guessing, since we retry ten times IIRC with a backoff between each try.
One of us needs to dig in and figure out what's going on.  Sorry for the
trouble it's given you.  As I've said in earlier mail, I seemed to have
better luck than you (At the end of this page, I've started to record
numbers for 8 concurrent clients doing reads, writes and scans:

As to concurrent clients, we gate how many can come in by the number of RPC
listeners we put up.  You might experiment with upping (or lowering) the number.
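
A sketch of that experiment in hbase-site.xml, assuming the listener count is
controlled by hbase.regionserver.handler.count (my recollection is that the
default is 10, but confirm in hbase-default.xml).  The value 20 below is just
an illustrative starting point, not a recommendation.

```xml
<!-- hbase-site.xml: sketch only; property name is an assumption to verify,
     and 20 is an arbitrary example value -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <!-- number of RPC handler threads each regionserver puts up -->
  <value>20</value>
</property>
```

Change one setting at a time and rerun the same PerformanceEvaluation command
so any difference can be attributed to the setting.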

HBase 0.20.0 will be very different to 0.19.0 in character.  It should be
more 'live' able -- there are fewer synchronizations -- and it should be more


> On Tue, Mar 31, 2009 at 3:35 PM, stack <stack@duboce.net> wrote:
> > On Wed, Apr 1, 2009 at 12:29 AM, stack <stack@duboce.net> wrote:
> >
> > > On Tue, Mar 31, 2009 at 8:08 AM, Jun Li <jltz922181@gmail.com> wrote:
> > >
> > > I was using defaults.  Maybe my hardware is better than yours.  Tell us
> > > about yours (RAM)?  I suggested io.map.index.skip because you were OOME'ing
> > > and that's the thing that most directly affects memory use.
> > >
> >
> > You could try upping your regionserver heap if you have enough RAM.  Try
> > setting $HBASE_HOME/conf/hbase-env.sh HBASE_HEAPSIZE to 2G.
> > St.Ack
> >
