Subject: Re: what can cause RegionTooBusyException?
From: Qiang Tian <tianq01@gmail.com>
To: user@hbase.apache.org
Date: Wed, 12 Nov 2014 10:35:51 +0800

or:

LOG.warn("Region " + region.getRegionNameAsString() + " has too many " +
    "store files; delaying flush up to " + this.blockingWaitTime + "ms");

something like:

WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region occurrence,\x17\xF1o\x9C,1340981109494.ecb85155563c6614e5448c7d700b909e. has too many store files; delaying flush up to 90000ms

On Wed, Nov 12, 2014 at 10:26 AM, Qiang Tian wrote:

> the checkResources Ted mentioned is a good suspect. see the online hbase book,
> "9.7.7.7.1.1. Being Stuck".
> Did you see the message below in your RS log?
>
> LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
>     "ms on a compaction to clean up 'too many store files'; waited " +
>     "long enough... proceeding with flush of " +
>     region.getRegionNameAsString());
>
> I did a quick test setting "hbase.hregion.memstore.block.multiplier" = 0;
> issuing a put in the hbase shell triggered a flush and threw
> RegionTooBusyException to the client, and the retry mechanism completed
> the put in the next multi RPC call.
>
> On Wed, Nov 12, 2014 at 1:21 AM, Brian Jeltema <
> brian.jeltema@digitalenvoy.net> wrote:
>
>> Thanks.
>> I appear to have resolved this problem by restarting the HBase
>> Master and the RegionServers that were reporting the failure.
>>
>> Brian
>>
>> On Nov 11, 2014, at 12:13 PM, Ted Yu wrote:
>>
>> > For your first question, the region server web UI,
>> > rs-status#regionRequestStats, shows Write Request Count.
>> >
>> > You can monitor the value for the underlying region to see if it
>> > receives above-normal writes.
>> >
>> > Cheers
>> >
>> > On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema wrote:
>> >
>> >>> Was the region containing this row hot around the time of failure?
>> >>
>> >> How do I measure that?
>> >>
>> >>> Can you check the region server log (along with a monitoring tool) for
>> >>> what the memstore pressure was?
>> >>
>> >> I didn't see anything in the region server logs to indicate a problem.
>> >> And given the reproducibility of the behavior, it's hard to see how
>> >> dynamic parameters such as memory pressure could be at the root of
>> >> the problem.
>> >>
>> >> Brian
>> >>
>> >> On Nov 10, 2014, at 3:22 PM, Ted Yu wrote:
>> >>
>> >>> Was the region containing this row hot around the time of failure?
>> >>>
>> >>> Can you check the region server log (along with a monitoring tool) for
>> >>> what the memstore pressure was?
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
>> >>> brian.jeltema@digitalenvoy.net> wrote:
>> >>>
>> >>>>> How many tasks may write to this row concurrently?
>> >>>>
>> >>>> Only 1 mapper should be writing to this row. Is there a way to check
>> >>>> which locks are being held?
>> >>>>
>> >>>>> Which 0.98 release are you using?
>> >>>>
>> >>>> 0.98.0.2.1.2.1-471-hadoop2
>> >>>>
>> >>>> Thanks
>> >>>> Brian
>> >>>>
>> >>>> On Nov 10, 2014, at 2:21 PM, Ted Yu wrote:
>> >>>>
>> >>>>> There could be more than one reason why RegionTooBusyException is
>> >>>>> thrown.
>> >>>>> Below are two (from HRegion):
>> >>>>>
>> >>>>> /**
>> >>>>>  * We throw RegionTooBusyException if above memstore limit
>> >>>>>  * and expect client to retry using some kind of backoff
>> >>>>>  */
>> >>>>> private void checkResources()
>> >>>>>
>> >>>>> /**
>> >>>>>  * Try to acquire a lock. Throw RegionTooBusyException
>> >>>>>  * if failed to get the lock in time. Throw InterruptedIOException
>> >>>>>  * if interrupted while waiting for the lock.
>> >>>>>  */
>> >>>>> private void lock(final Lock lock, final int multiplier)
>> >>>>>
>> >>>>> How many tasks may write to this row concurrently?
>> >>>>>
>> >>>>> Which 0.98 release are you using?
>> >>>>>
>> >>>>> Cheers
>> >>>>>
>> >>>>> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
>> >>>>> brian.jeltema@digitalenvoy.net> wrote:
>> >>>>>
>> >>>>>> I'm running a map/reduce job against a table that is performing a
>> >>>>>> large number of writes (probably updating every row).
>> >>>>>> The job is failing with the exception below. This is a solid
>> >>>>>> failure; it dies at the same point in the application, and at the
>> >>>>>> same row in the table. So I doubt it's a conflict with compaction
>> >>>>>> (and the UI shows no compaction in progress), or that there is a
>> >>>>>> load-related cause.
>> >>>>>>
>> >>>>>> 'hbase hbck' does not report any inconsistencies. The
>> >>>>>> 'waitForAllPreviousOpsAndReset' leads me to suspect that there is
>> >>>>>> an operation in progress that is hung and blocking the update. I
>> >>>>>> don't see anything suspicious in the HBase logs.
>> >>>>>> The data at the point of failure is not unusual, and is identical
>> >>>>>> to many preceding rows.
>> >>>>>> Does anybody have any ideas of what I should look for to find the
>> >>>>>> cause of this RegionTooBusyException?
>> >>>>>>
>> >>>>>> This is Hadoop 2.4 and HBase 0.98.
>> >>>>>>
>> >>>>>> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
>> >>>>>> attempt_1415210751318_0010_m_000314_1, Status : FAILED
>> >>>>>> Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> >>>>>> Failed 1744 actions: RegionTooBusyException: 1744 times,
>> >>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>> >>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>> >>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
>> >>>>>>   at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
>> >>>>>>   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
>> >>>>>>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
>> >>>>>>
>> >>>>>> Brian
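[Editor's note] The HRegion comments quoted in this thread say the server throws RegionTooBusyException and "expect[s] client to retry using some kind of backoff"; the stack trace shows what happens when those retries are exhausted. As a conceptual sketch only (this is not the actual HBase AsyncProcess code, and the class and method names below are made up for illustration), the retry-with-exponential-backoff pattern looks roughly like this:

```java
import java.util.concurrent.Callable;

// Conceptual sketch of the "retry using some kind of backoff" contract
// described in the HRegion comments. Not the real client implementation;
// all names here are illustrative.
public class BackoffRetry {

    // Run the task; on failure (e.g. a RegionTooBusyException coming back
    // over the wire), sleep basePauseMs * 2^attempt and try again, giving
    // up after maxRetries retries.
    public static <T> T retryWithBackoff(Callable<T> task,
                                         int maxRetries,
                                         long basePauseMs) throws Exception {
        Exception last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(basePauseMs << attempt);  // exponential backoff
            }
        }
        throw last;  // retries exhausted: surface the final failure
    }
}
```

In the real 0.98-era client, the attempt count and base pause are controlled by the `hbase.client.retries.number` and `hbase.client.pause` settings; raising them gives a blocked region more time to flush or compact before the job fails with RetriesExhaustedWithDetailsException, though that only papers over whatever is keeping the region busy.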