hbase-user mailing list archives

From Brian Jeltema <brian.jelt...@digitalenvoy.net>
Subject Re: what can cause RegionTooBusyException?
Date Tue, 11 Nov 2014 17:21:54 GMT
Thanks. I appear to have resolved this problem by restarting the HBase Master and the RegionServers
that were reporting the failure.

Brian

On Nov 11, 2014, at 12:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> For your first question, region server web UI,
> rs-status#regionRequestStats, shows Write Request Count.
> 
> You can monitor the value for the underlying region to see if it receives
> above-normal writes.
> 
> Cheers
> 
> On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema <bdjeltema@gmail.com> wrote:
> 
>>> Was the region containing this row hot around the time of failure ?
>> 
>> How do I measure that?
>> 
>>> 
>>> Can you check region server log (along with monitoring tool) what
>>> memstore pressure was ?
>> 
>> I didn't see anything in the region server logs to indicate a problem. And
>> given the reproducibility of the behavior, it's hard to see how dynamic
>> parameters such as memory pressure could be at the root of the problem.
>> 
>> Brian
>> 
>> On Nov 10, 2014, at 3:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> 
>>> Was the region containing this row hot around the time of failure ?
>>> 
>>> Can you check region server log (along with monitoring tool) what
>>> memstore pressure was ?
>>> 
>>> Thanks
>>> 
>>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <brian.jeltema@digitalenvoy.net> wrote:
>>> 
>>>>> How many tasks may write to this row concurrently ?
>>>> 
>>>> only 1 mapper should be writing to this row. Is there a way to check which
>>>> locks are being held?
>>>> 
>>>>> Which 0.98 release are you using ?
>>>> 
>>>> 0.98.0.2.1.2.1-471-hadoop2
>>>> 
>>>> Thanks
>>>> Brian
>>>> 
>>>> On Nov 10, 2014, at 2:21 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> 
>>>>> There can be more than one reason why RegionTooBusyException is thrown.
>>>>> Below are two (from HRegion):
>>>>> 
>>>>> * We throw RegionTooBusyException if above memstore limit
>>>>> * and expect client to retry using some kind of backoff
>>>>> */
>>>>> private void checkResources()
>>>>> 
>>>>> * Try to acquire a lock.  Throw RegionTooBusyException
>>>>> * if failed to get the lock in time. Throw InterruptedIOException
>>>>> * if interrupted while waiting for the lock.
>>>>> */
>>>>> private void lock(final Lock lock, final int multiplier)
>>>>> 
>>>>> How many tasks may write to this row concurrently ?
>>>>> 
>>>>> Which 0.98 release are you using ?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
>>>>> brian.jeltema@digitalenvoy.net> wrote:
>>>>> 
>>>>>> I'm running a map/reduce job against a table that is performing a large
>>>>>> number of writes (probably updating every row).
>>>>>> The job is failing with the exception below. This is a solid failure; it
>>>>>> dies at the same point in the application,
>>>>>> and at the same row in the table. So I doubt it's a conflict with
>>>>>> compaction (and the UI shows no compaction in progress),
>>>>>> or that there is a load-related cause.
>>>>>> 
>>>>>> 'hbase hbck' does not report any inconsistencies. The
>>>>>> 'waitForAllPreviousOpsAndReset' leads me to suspect that
>>>>>> there is an operation in progress that is hung and blocking the update. I
>>>>>> don't see anything suspicious in the HBase logs.
>>>>>> The data at the point of failure is not unusual, and is identical to many
>>>>>> preceding rows.
>>>>>> Does anybody have any ideas of what I should look for to find the cause of
>>>>>> this RegionTooBusyException?
>>>>>> 
>>>>>> This is Hadoop 2.4 and HBase 0.98.
>>>>>> 
>>>>>> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
>>>>>> attempt_1415210751318_0010_m_000314_1, Status : FAILED
>>>>>> Error:
>>>>>> org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed
>>>>>> 1744 actions: RegionTooBusyException: 1744 times,
>>>>>>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>>>>>>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>>>>>>     at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
>>>>>>     at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
>>>>>>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
>>>>>>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
>>>>>> 
>>>>>> Brian
>>>> 
>>> 
>> 
>> 
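
Note on the suggestion in this thread to watch Write Request Count for a region: the counter only grows, so what matters is how fast it grows between samples. The following is a minimal sketch, in plain Java with no HBase dependency, of one way to flag above-normal writes from counter values you have sampled at regular intervals. The class name WriteRateCheck, its methods, and the factor threshold are all hypothetical and are not part of HBase.

```java
/** Hypothetical helper, not part of HBase: flags an unusual jump in a sampled counter. */
final class WriteRateCheck {
    private WriteRateCheck() {}

    /** Per-interval increments of a monotonically increasing counter. */
    static long[] deltas(long[] samples) {
        if (samples.length < 2) {
            return new long[0];
        }
        long[] d = new long[samples.length - 1];
        for (int i = 1; i < samples.length; i++) {
            d[i - 1] = samples[i] - samples[i - 1];
        }
        return d;
    }

    /**
     * True if the latest increment exceeds factor times the mean of all
     * earlier increments. Needs at least three samples to have a baseline.
     */
    static boolean isAboveNormal(long[] samples, double factor) {
        long[] d = deltas(samples);
        if (d.length < 2) {
            return false; // not enough history to define "normal"
        }
        double sum = 0;
        for (int i = 0; i < d.length - 1; i++) {
            sum += d[i];
        }
        double baseline = sum / (d.length - 1);
        return d[d.length - 1] > factor * baseline;
    }
}
```

With samples taken at a fixed interval, a call such as WriteRateCheck.isAboveNormal(samples, 3.0) would report whether the most recent interval saw more than three times the average of the earlier ones. The threshold is arbitrary and would need tuning for a real workload.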

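Note on the checkResources() comment quoted in this thread, which expects the client to retry using some kind of backoff: the sketch below illustrates that general pattern in plain Java. It is an illustration only, not how the HBase client implements its own retries. RegionBusyException is a hypothetical stand-in for org.apache.hadoop.hbase.RegionTooBusyException so that the code compiles without HBase on the classpath, and BackoffRetry, delayMs, and call are made-up names.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for HBase's RegionTooBusyException (assumption, see note above). */
class RegionBusyException extends IOException {
    RegionBusyException(String msg) {
        super(msg);
    }
}

/** Hypothetical helper: retries an operation with capped exponential backoff. */
final class BackoffRetry {
    private BackoffRetry() {}

    /** Delay before retry number attempt (0-based): baseMs * 2^attempt, capped at maxMs. */
    static long delayMs(int attempt, long baseMs, long maxMs) {
        int shift = Math.min(attempt, 30); // bound the shift so small bases cannot overflow
        return Math.min(maxMs, baseMs << shift);
    }

    /**
     * Calls op, retrying only when it throws RegionBusyException.
     * Makes at most maxAttempts calls; the last failure is rethrown.
     */
    static <T> T call(Callable<T> op, int maxAttempts, long baseMs, long maxMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (RegionBusyException e) {
                if (attempt + 1 >= maxAttempts) {
                    throw e; // retries exhausted, surface the error to the caller
                }
                Thread.sleep(delayMs(attempt, baseMs, maxMs));
            }
        }
    }
}
```

For example, BackoffRetry.call(() -> doWrite(), 5, 100, 5000) would attempt a hypothetical doWrite() up to five times, sleeping 100 ms, 200 ms, 400 ms, and 800 ms between attempts. Backoff of this kind only helps when the busy condition is transient; the failure reported in this thread was repeatable at the same row and was resolved by restarting the Master and the affected RegionServers.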
