hbase-dev mailing list archives

From Jerry He <jerry...@gmail.com>
Subject Re: hbase client RetriesExhaustedWithDetailsException with RegionTooBusyException
Date Wed, 05 Feb 2014 04:53:58 GMT
Thanks for the input.

One thing I am trying to understand is the retry behavior of the HBase
client.  I would think the retries would help overcome a 'busy region
server' to some degree.
In the stack trace above, the client made attempt #14/35 and then logged
"NOT resubmitting".  Can anyone familiar with that part of the code share
some insight?


On Tue, Feb 4, 2014 at 6:38 PM, Vladimir Rodionov
<vrodionov@carrieriq.com> wrote:

> Forgot to add ..
>
> * Throttle insertion rate.
>
> If you have 100 simultaneous tasks, start with 100-200 inserts per second
> per task (depending on cluster capacity, of course).
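>
> For example, a rough per-task throttling sketch (assuming Guava's
> RateLimiter is on the classpath; the rate and method name here are
> illustrative, not from Jerry's job):
>
>     import java.io.IOException;
>     import org.apache.hadoop.hbase.client.HTableInterface;
>     import org.apache.hadoop.hbase.client.Put;
>     import com.google.common.util.concurrent.RateLimiter;
>
>     // Allow roughly 150 puts per second from this task.
>     private final RateLimiter limiter = RateLimiter.create(150.0);
>
>     void throttledPut(HTableInterface table, Put put) throws IOException {
>       limiter.acquire(); // blocks until a permit is available
>       table.put(put);
>     }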
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Vladimir Rodionov [vladrodionov@gmail.com]
> Sent: Tuesday, February 04, 2014 6:36 PM
> To: dev@hbase.apache.org
> Subject: Re: hbase client RetriesExhaustedWithDetailsException with
> RegionTooBusyException
>
> Busy means busy.
> The doc says: "Thrown by a region server if it will block and wait to
> serve a request." For example, the client wants to insert something into
> a region while the region is compacting.
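>
> The blocking itself is governed by server-side settings; if memory serves,
> the relevant hbase-site.xml knobs look roughly like this (names from
> hbase-default.xml, values are the 0.96-era defaults):
>
>     <property>
>       <name>hbase.hstore.blockingStoreFiles</name>
>       <!-- block updates once a store has more than this many files -->
>       <value>7</value>
>     </property>
>     <property>
>       <name>hbase.hregion.memstore.block.multiplier</name>
>       <!-- block updates once the memstore hits multiplier * flush size -->
>       <value>2</value>
>     </property>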
>
> >> I have an MR job that loads data into HBase using the HBase client API.
> You should avoid this. If you can't:
>
> * Reduce the number of M/R tasks.
> * Disable major compaction.
> * Disable region splitting.
> * Increase the max region size.
> * Add retry logic to the client (M/R task).
>
> Also check bulk loading; a rough sketch follows.
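>
> Outline of the bulk load path (0.96-era mapreduce API; the job name and
> output path here are illustrative):
>
>     import org.apache.hadoop.fs.Path;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>     import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>     import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
>     import org.apache.hadoop.mapreduce.Job;
>     import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
>     Job job = new Job(conf, "lineitem-bulkload");
>     job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>     job.setMapOutputValueClass(Put.class);
>     HTable table = new HTable(conf, "tpch_hb_1000.lineitem");
>     // Wires up the reducer and partitioner so the HFiles come out
>     // aligned with the table's region boundaries.
>     HFileOutputFormat.configureIncrementalLoad(job, table);
>     FileOutputFormat.setOutputPath(job, new Path("/tmp/lineitem-hfiles"));
>     if (job.waitForCompletion(true)) {
>       // Moves the finished HFiles into the regions, bypassing the write
>       // path entirely (no memstore, no RegionTooBusyException).
>       new LoadIncrementalHFiles(conf).doBulkLoad(
>           new Path("/tmp/lineitem-hfiles"), table);
>     }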
>
>
>
> On Tue, Feb 4, 2014 at 5:44 PM, Jerry He <jerryjch@gmail.com> wrote:
>
> > Hi, hbase experts:
> >
> > You have probably seen this in the past.  Can someone share some quick
> > ideas?
> > I have an MR job that loads data into HBase using the HBase client API.
> > The job failed after a while.  Below are the error info and stack trace.
> >
> > 2014-02-04 14:15:32,587 WARN org.apache.hadoop.hbase.client.AsyncProcess: Attempt #6/35 failed for 182 ops on hdtest203.svl.ibm.com,60020,1391489734651 NOT resubmitting.
> > region=tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab., hostname=hdtest203.svl.ibm.com,60020,1391489734651, seqNum=682710
> > 2014-02-04 14:19:12,987 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=302, tasksDone=301, currentTasksDone=301, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:19:32,452 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=306, tasksDone=305, currentTasksDone=305, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:19:42,544 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=307, tasksDone=306, currentTasksDone=306, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:19:52,624 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=308, tasksDone=307, currentTasksDone=307, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:03,044 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=309, tasksDone=308, currentTasksDone=308, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:13,133 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=310, tasksDone=309, currentTasksDone=309, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:23,240 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=311, tasksDone=310, currentTasksDone=310, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:33,328 INFO org.apache.hadoop.hbase.client.AsyncProcess: : Waiting for the global number of running tasks to be equals or less than 0, tasksSent=312, tasksDone=311, currentTasksDone=311, tableName=tpch_hb_1000.lineitem
> > 2014-02-04 14:20:38,444 WARN org.apache.hadoop.hbase.client.AsyncProcess: Attempt #14/35 failed for 837 ops on hdtest203.svl.ibm.com,60020,1391489734651 NOT resubmitting.
> > region=tpch_hb_1000.lineitem,\x01\x8Ao\x83\xF0\x01\x80'`\x04\x01\x80\x00\x00\x00\x02ufc\x01\x80\x00\x00\x01,1391544629530.b4a41acd34723629c417571b524b80ab., hostname=hdtest203.svl.ibm.com,60020,1391489734651, seqNum=682710
> > 2014-02-04 14:20:38,452 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> > 2014-02-04 14:20:38,472 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hive (auth:SIMPLE) cause:org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 837 actions: RegionTooBusyException: 837 times,
> > 2014-02-04 14:20:38,473 WARN org.apache.hadoop.mapred.Child: Error running child
> > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 837 actions: RegionTooBusyException: 837 times,
> >         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:185)
> >         at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:169)
> >         at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:782)
> >         at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:934)
> >         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1193)
> >         at com.ibm.jaql.module.hbase.HBaseRecordWriter.close(HBaseRecordWriter.java:42)
> >         at com.ibm.jaql.io.hadoop.CompositeOutputAdapter$1.close(CompositeOutputAdapter.java:383)
> >         at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:806)
> >         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:439)
> >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >         at java.security.AccessController.doPrivileged(AccessController.java:310)
> >         at javax.security.auth.Subject.doAs(Subject.java:573)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > 2014-02-04 14:20:38,477 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
> >
>
