hbase-user mailing list archives

From Qiang Tian <tian...@gmail.com>
Subject Re: what can cause RegionTooBusyException?
Date Wed, 12 Nov 2014 02:26:44 GMT
The checkResources method Ted mentioned is a good suspect; see the online HBase
book, "9.7.7.7.1.1. Being Stuck".
Did you see the message below in your RS log?
        LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
          "ms on a compaction to clean up 'too many store files'; waited " +
          "long enough... proceeding with flush of " +
          region.getRegionNameAsString());


I did a quick test setting "hbase.hregion.memstore.block.multiplier" to 0;
issuing a put in the hbase shell then triggered a flush and threw
RegionTooBusyException back to the client, and the client's retry mechanism
completed the put on the next multi RPC call.
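To make the memstore-limit path concrete, here is a minimal standalone sketch
(not the actual HBase source; the class and field names below are illustrative)
of how checkResources() can reject a write: when a region's unflushed memstore
grows past flushSize * hbase.hregion.memstore.block.multiplier, the put is
refused with RegionTooBusyException and the client is expected to back off and
retry.

```java
// Simplified sketch of the "above memstore limit" check in HRegion.
// All names here are illustrative, not the real HBase implementation.
public class MemstoreCheckSketch {
    static class RegionTooBusyException extends RuntimeException {
        RegionTooBusyException(String msg) { super(msg); }
    }

    private final long blockingMemStoreSize;
    private long memstoreSize;

    MemstoreCheckSketch(long flushSize, int blockMultiplier) {
        // With multiplier = 0 the limit is 0 bytes, so any pending write is
        // "over the limit" -- which is why the hbase shell test above fails
        // immediately.
        this.blockingMemStoreSize = flushSize * blockMultiplier;
    }

    void add(long bytes) { memstoreSize += bytes; }

    // Analogue of HRegion.checkResources(): reject the write when the
    // memstore is over the blocking size, expecting the client to retry.
    void checkResources() {
        if (memstoreSize > blockingMemStoreSize) {
            throw new RegionTooBusyException("above memstore limit: memstoreSize="
                + memstoreSize + ", blockingMemStoreSize=" + blockingMemStoreSize);
        }
    }

    public static void main(String[] args) {
        // 128 MB flush size, multiplier 0 -> limit is 0 bytes
        MemstoreCheckSketch region = new MemstoreCheckSketch(128L * 1024 * 1024, 0);
        region.add(1); // a single unflushed byte is now over the limit
        try {
            region.checkResources();
            System.out.println("accepted");
        } catch (RegionTooBusyException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a normal multiplier (the default in 0.98 is 2), the write would only be
rejected after the memstore doubled its flush size without a flush completing,
e.g. when flushes are blocked on "too many store files".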



On Wed, Nov 12, 2014 at 1:21 AM, Brian Jeltema <
brian.jeltema@digitalenvoy.net> wrote:

> Thanks. I appear to have resolved this problem by restarting the HBase
> Master and the RegionServers
> that were reporting the failure.
>
> Brian
>
> On Nov 11, 2014, at 12:13 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > For your first question, region server web UI,
> > rs-status#regionRequestStats, shows Write Request Count.
> >
> > You can monitor the value for the underlying region to see if it receives
> > above-normal writes.
> >
> > Cheers
> >
> > On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema <bdjeltema@gmail.com>
> wrote:
> >
> >>> Was the region containing this row hot around the time of failure ?
> >>
> >> How do I measure that?
> >>
> >>>
> >>> Can you check region server log (along with monitoring tool) what
> >> memstore pressure was ?
> >>
> >> I didn't see anything in the region server logs to indicate a problem.
> And
> >> given the
> >> reproducibility of the behavior, it's hard to see how dynamic parameters
> >> such as
> >> memory pressure could be at the root of the problem.
> >>
> >> Brian
> >>
> >> On Nov 10, 2014, at 3:22 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>
> >>> Was the region containing this row hot around the time of failure ?
> >>>
> >>> Can you check region server log (along with monitoring tool) what
> >> memstore pressure was ?
> >>>
> >>> Thanks
> >>>
> >>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
> >> brian.jeltema@digitalenvoy.net> wrote:
> >>>
> >>>>> How many tasks may write to this row concurrently ?
> >>>>
> >>>> only 1 mapper should be writing to this row. Is there a way to check
> >> which
> >>>> locks are being held?
> >>>>
> >>>>> Which 0.98 release are you using ?
> >>>>
> >>>> 0.98.0.2.1.2.1-471-hadoop2
> >>>>
> >>>> Thanks
> >>>> Brian
> >>>>
> >>>> On Nov 10, 2014, at 2:21 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>>>
> >>>>> There could be more than one reason why RegionTooBusyException is
> >>>>> thrown. Below are two (from HRegion):
> >>>>>
> >>>>> /**
> >>>>>  * We throw RegionTooBusyException if above memstore limit
> >>>>>  * and expect client to retry using some kind of backoff
> >>>>>  */
> >>>>> private void checkResources()
> >>>>>
> >>>>> /**
> >>>>>  * Try to acquire a lock.  Throw RegionTooBusyException
> >>>>>  * if failed to get the lock in time. Throw InterruptedIOException
> >>>>>  * if interrupted while waiting for the lock.
> >>>>>  */
> >>>>> private void lock(final Lock lock, final int multiplier)
> >>>>>
> >>>>> How many tasks may write to this row concurrently ?
> >>>>>
> >>>>> Which 0.98 release are you using ?
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
> >>>>> brian.jeltema@digitalenvoy.net> wrote:
> >>>>>
> >>>>>> I’m running a map/reduce job against a table that is performing a
> >>>>>> large number of writes (probably updating every row).
> >>>>>> The job is failing with the exception below. This is a solid failure;
> >>>>>> it dies at the same point in the application, and at the same row in
> >>>>>> the table. So I doubt it’s a conflict with compaction (and the UI
> >>>>>> shows no compaction in progress), or that there is a load-related
> >>>>>> cause.
> >>>>>>
> >>>>>> ‘hbase hbck’ does not report any inconsistencies. The
> >>>>>> ‘waitForAllPreviousOpsAndReset’ leads me to suspect that there is an
> >>>>>> operation in progress that is hung and blocking the update. I don’t
> >>>>>> see anything suspicious in the HBase logs.
> >>>>>> The data at the point of failure is not unusual, and is identical to
> >>>>>> many preceding rows.
> >>>>>> Does anybody have any ideas of what I should look for to find the
> >>>>>> cause of this RegionTooBusyException?
> >>>>>>
> >>>>>> This is Hadoop 2.4 and HBase 0.98.
> >>>>>>
> >>>>>> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
> >>>>>> attempt_1415210751318_0010_m_000314_1, Status : FAILED
> >>>>>> Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
> >>>>>> Failed 1744 actions: RegionTooBusyException: 1744 times,
> >>>>>>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
> >>>>>>     at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
> >>>>>>     at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
> >>>>>>     at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
> >>>>>>     at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
> >>>>>>     at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
> >>>>>>
> >>>>>> Brian
> >>>>
> >>>
> >>
> >>
>
>
