Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of sunww@asiainfo-linkage.com
 designates 125.35.16.198 as permitted sender)
From: "sunweiwei" <sunww@asiainfo-linkage.com>
To: <user@hbase.apache.org>
References: <001d01cf7eff$959a4770$c0ced650$@com>
	<5A9A57920BD5514FA7E38922FBC875D95220FF9C@ex-mbox3>
	<002701cf7f0e$f41cc6b0$dc565410$@com>
	<CADcMMgE1ixP5Hoea1MdcLG=p6C9E51yO3-FKkXJwSY6ahgqvEw@mail.gmail.com>
	<000f01cf8066$3bbdfcc0$b339f640$@com>
	<CADcMMgF4799vROrurbZYQgAc=g0sBptUgEz+52AkEak3EfgPfQ@mail.gmail.com>
 <CAAT7Mkr-16QEw-ckpGe-nP=vVjrROq3Jco6OmVLTwcBQk5r4Tg@mail.gmail.com>
 <002101cf808a$7e0ee020$7a2ca060$@com>
In-Reply-To: <002101cf808a$7e0ee020$7a2ca060$@com>
Subject: 
 =?utf-8?B?562U5aSNOiDnrZTlpI06IOetlOWkjTogZm9yY2luZyBmbHVzaCBub3Qgd28=?=
	=?utf-8?B?cmtz?=
Date: Sat, 7 Jun 2014 09:33:11 +0800
Message-ID: <004f01cf81f0$71d0b0b0$55721210$@com>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Thread-Index: Ac+AgRZI6jYe7Nw3R8ydGYKmrjbRbQAB3r6AAFn1LkA=
Content-Language: zh-cn

Hi

The symptom reproduced again.
I paste the  log  in =
http://paste2.org/D2N6ZDvk,http://paste2.org/a64LXD0X
One is the regionserver jstack log.
The other is regionserver log, which was grep and only include the =
unflush region.

Thanks

-----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6-----
=E5=8F=91=E4=BB=B6=E4=BA=BA: sunweiwei =
[mailto:sunww@asiainfo-linkage.com]=20
=E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: 2014=E5=B9=B46=E6=9C=885=E6=97=A5 =
14:51
=E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org
=E4=B8=BB=E9=A2=98: =E7=AD=94=E5=A4=8D: =E7=AD=94=E5=A4=8D: =
=E7=AD=94=E5=A4=8D: forcing flush not works

I'm sorry but the regionserver log have been deleted...

to Stack:
Yes, Always the same two regions of Table BT_D_BF001_201406 always can't =
be flushed.


Previously  I  have only saved a little log, When Table =
BT_D_BF001_201405  had lots of regions.
2014-05-27 22:40:52,025 DEBUG [regionserver60020.logRoller] =
regionserver.LogRoller: HLog roll requested
2014-05-27 22:40:52,039 DEBUG [regionserver60020.logRoller] wal.FSHLog: =
cleanupCurrentWriter  waiting for transactions to get synced  total =
450500823 synced till here 450500779
2014-05-27 22:40:52,049 INFO  [regionserver60020.logRoller] wal.FSHLog: =
Rolled WAL =
/apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C140=
1173211108.1401201646659 with entries=3D94581, filesize=3D122.2 M; new =
WAL =
/apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C140=
1173211108.1401201652025
2014-05-27 22:40:52,049 INFO  [regionserver60020.logRoller] wal.FSHLog: =
Too many hlogs: logs=3D156, maxlogs=3D32; forcing flush of 2 regions(s): =
a5b94272f0fdd477bf320e428059fe87, f1a60d3ea5820cb672832c59531de89d
2014-05-27 22:40:52,073 DEBUG [Thread-17] regionserver.MemStoreFlusher: =
Flush thread woke up because memory above low water=3D6.1 G
2014-05-27 22:40:52,074 DEBUG [Thread-17] regionserver.MemStoreFlusher: =
Under global heap pressure: Region =
BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050=
8151602201405081516021740000000004620000010940000800000000000000000005480=
90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. has too many =
store files, but is 27.6 M vs best flushable region's 0. Choosing the =
bigger.
2014-05-27 22:40:52,075 INFO  [Thread-17] regionserver.MemStoreFlusher: =
Flush of region =
BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050=
8151602201405081516021740000000004620000010940000800000000000000000005480=
90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. due to global =
heap pressure
2014-05-27 22:40:52,075 DEBUG [Thread-17] regionserver.HRegion: Started =
memstore flush for =
BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050=
8151602201405081516021740000000004620000010940000800000000000000000005480=
90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39., current region =
memstore size 27.6 M
2014-05-27 22:40:52,599 INFO  [Thread-17] =
regionserver.DefaultStoreFlusher: Flushed, sequenceid=3D10069900941, =
memsize=3D27.6 M, hasBloomFilter=3Dtrue, into tmp file =
hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80=
bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88
2014-05-27 22:40:52,608 DEBUG [Thread-17] =
regionserver.HRegionFileSystem: Committing store file =
hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80=
bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88 as =
hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80=
bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88
2014-05-27 22:40:52,617 INFO  [Thread-17] regionserver.HStore: Added =
hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80=
bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88, =
entries=3D44962, sequenceid=3D10069900941, filesize=3D5.5 M
2014-05-27 22:40:52,618 INFO  [Thread-17] regionserver.HRegion: Finished =
memstore flush of ~27.6 M/28933240, currentsize=3D43.6 K/44664 for =
region =
BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050=
8151602201405081516021740000000004620000010940000800000000000000000005480=
90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. in 542ms, =
sequenceid=3D10069900941, compaction requested=3Dtrue
2014-05-27 22:40:52,618 DEBUG [Thread-17] =
regionserver.CompactSplitThread: Small Compaction requested: system; =
Because: Thread-17; compaction_queue=3D(4896:19152), split_queue=3D0, =
merge_queue=3D0


Thanks

-----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6-----
=E5=8F=91=E4=BB=B6=E4=BA=BA: ramkrishna vasudevan =
[mailto:ramkrishna.s.vasudevan@gmail.com]=20
=E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: 2014=E5=B9=B46=E6=9C=885=E6=97=A5 =
13:43
=E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org
=E4=B8=BB=E9=A2=98: Re: =E7=AD=94=E5=A4=8D: =E7=AD=94=E5=A4=8D: forcing =
flush not works

>>I still (highly)suspect that there is something wrong with the flush
queue(some entry pushed into it can't be poll out).
Ya I too have that suspect.  May be any new logs may help to uncover the
issue.


On Thu, Jun 5, 2014 at 11:06 AM, Stack <stack@duboce.net> wrote:

> Always the same two regions that get stuck or does it vary?  Another =
set of
> example logs may help uncover the sequence of trouble-causing events.
>
> Thanks,
> St.Ack
>
>
> On Wed, Jun 4, 2014 at 7:31 PM, sunweiwei <sunww@asiainfo-linkage.com>
> wrote:
>
> > my log is similar as HBASE-10499.
> >
> > Thanks
> >
> > -----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6-----
> > =E5=8F=91=E4=BB=B6=E4=BA=BA: saint.ack@gmail.com =
[mailto:saint.ack@gmail.com] =E4=BB=A3=E8=A1=A8 Stack
> > =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: =
2014=E5=B9=B46=E6=9C=883=E6=97=A5 23:10
> > =E6=94=B6=E4=BB=B6=E4=BA=BA: Hbase-User
> > =E4=B8=BB=E9=A2=98: Re: =E7=AD=94=E5=A4=8D: forcing flush not works
> >
> > Mind posting link to your log?  Sounds like HBASE-10499 as Honghua =
says.
> > St.Ack
> >
> >
> > On Tue, Jun 3, 2014 at 2:34 AM, sunweiwei =
<sunww@asiainfo-linkage.com>
> > wrote:
> >
> > > Thanks. Maybe the same as HBase-10499.
> > > I stop the regionserver then start it. Then hbase back to normal.
> > > This is jstack log when  2 regions  can not  flush.
> > >
> > > "Thread-17" prio=3D10 tid=3D0x00007f6210383800 nid=3D0x6540 =
waiting on
> > condition
> > > [0x00007f61e0a26000]
> > >    java.lang.Thread.State: TIMED_WAITING (parking)
> > >         at sun.misc.Unsafe.park(Native Method)
> > >         - parking to wait for  <0x000000041ae0e6b8> (a
> > > java.util.concurrent.
> > > locks.AbstractQueuedSynchronizer$ConditionObject)
> > >         at
> > > =
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
> > >         at
> > >
> > >
> >
> =
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awa=
itN
> > > anos(AbstractQueuedSynchronizer.java:2025)
> > >         at =
java.util.concurrent.DelayQueue.poll(DelayQueue.java:201)
> > >         at =
java.util.concurrent.DelayQueue.poll(DelayQueue.java:39)
> > >         at
> > >
> > >
> >
> =
org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(Mem=
Sto
> > > reFlusher.java:228)
> > >         at java.lang.Thread.run(Thread.java:662)
> > >
> > > -----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6-----
> > > =E5=8F=91=E4=BB=B6=E4=BA=BA: =E5=86=AF=E5=AE=8F=E5=8D=8E =
[mailto:fenghonghua@xiaomi.com]
> > > =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: =
2014=E5=B9=B46=E6=9C=883=E6=97=A5 16:34
> > > =E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org
> > > =E4=B8=BB=E9=A2=98: =E7=AD=94=E5=A4=8D: forcing flush not works
> > >
> > > The same symptom as HBase-10499?
> > >
> > > I still (highly)suspect that there is something wrong with the =
flush
> > > queue(some entry pushed into it can't be poll out).
> > > ________________________________________
> > > =E5=8F=91=E4=BB=B6=E4=BA=BA: sunweiwei =
[sunww@asiainfo-linkage.com]
> > > =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: =
2014=E5=B9=B46=E6=9C=883=E6=97=A5 15:43
> > > =E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org
> > > =E4=B8=BB=E9=A2=98: forcing flush not works
> > >
> > > Hi
> > >
> > >
> > >
> > > I'm using a heavy-write hbase0.96 . I find this in regionserver =
log:
> > >
> > > 2014-06-03 15:13:19,445 INFO  [regionserver60020.logRoller] =
wal.FSHLog:
> > Too
> > > many hlogs: logs=3D33, maxlogs=3D32; forcing flush of 3 =
regions(s):
> > > 1a7dda3c3815c19970ace39fd99abfe8, =
aff81bc46aa7d3ed51a01f11f23c8320,
> > > d5666e003f598147b4dda509f173a779
> > >
> > > 2014-06-03 15:13:23,869 INFO  [regionserver60020.logRoller] =
wal.FSHLog:
> > Too
> > > many hlogs: logs=3D34, maxlogs=3D32; forcing flush of 2 =
regions(s):
> > > aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > >
> > > =E2=94=87
> > >
> > > =E2=94=87
> > >
> > > 2014-06-03 15:18:14,778 INFO  [regionserver60020.logRoller] =
wal.FSHLog:
> > Too
> > > many hlogs: logs=3D93, maxlogs=3D32; forcing flush of 2 =
regions(s):
> > > aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779
> > >
> > >
> > >
> > >
> > >
> > > It seems like 2 regions can=E2=80=99t be flushed and WALs Dir =
continue to
> > increase
> > > and Then I find this in client log:
> > >
> > > INFO | AsyncProcess-waitForMaximumCurrentTasks [2014-06-03 =
15:30:53] -
> :
> > > Waiting for the global number of running tasks to be equals or =
less
> than
> > 0,
> > > tasksSent=3D15819, tasksDone=3D15818, currentTasksDone=3D15818,
> > > tableName=3DBT_D_BF001_201406
> > >
> > >
> > >
> > > Then write speed will become very slow.
> > >
> > > After I flush 2 regions  manually , write speed can back to normal
>  only
> > > a
> > > short while.
> > >
> > >
> > >
> > > Any suggestion will be appreciated. Thanks.
> > >
> > >
> > >
> >
> >
>