Return-Path: X-Original-To: apmail-hbase-user-archive@www.apache.org Delivered-To: apmail-hbase-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1588E11073 for ; Sat, 7 Jun 2014 01:33:45 +0000 (UTC) Received: (qmail 67086 invoked by uid 500); 7 Jun 2014 01:33:38 -0000 Delivered-To: apmail-hbase-user-archive@hbase.apache.org Received: (qmail 67013 invoked by uid 500); 7 Jun 2014 01:33:38 -0000 Mailing-List: contact user-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hbase.apache.org Delivered-To: mailing list user@hbase.apache.org Received: (qmail 67003 invoked by uid 99); 7 Jun 2014 01:33:38 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 07 Jun 2014 01:33:38 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=5.0 tests=SPF_HELO_PASS,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of sunww@asiainfo-linkage.com designates 125.35.16.198 as permitted sender) Received: from [125.35.16.198] (HELO asiainfo-linkage.com) (125.35.16.198) by apache.org (qpsmtpd/0.29) with SMTP; Sat, 07 Jun 2014 01:33:34 +0000 Received: from sunPC([122.96.108.89]) by asiainfo-linkage.com(AIMCPRO 1.0.0.0) with SMTP id jmf5392da5a; Sat, 07 Jun 2014 09:33:09 +0800 From: "sunweiwei" To: References: <001d01cf7eff$959a4770$c0ced650$@com> <5A9A57920BD5514FA7E38922FBC875D95220FF9C@ex-mbox3> <002701cf7f0e$f41cc6b0$dc565410$@com> <000f01cf8066$3bbdfcc0$b339f640$@com> <002101cf808a$7e0ee020$7a2ca060$@com> In-Reply-To: <002101cf808a$7e0ee020$7a2ca060$@com> Subject: =?utf-8?B?562U5aSNOiDnrZTlpI06IOetlOWkjTogZm9yY2luZyBmbHVzaCBub3Qgd28=?= =?utf-8?B?cmtz?= Date: Sat, 7 Jun 2014 09:33:11 +0800 Message-ID: <004f01cf81f0$71d0b0b0$55721210$@com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Office Outlook 12.0 Thread-Index: Ac+AgRZI6jYe7Nw3R8ydGYKmrjbRbQAB3r6AAFn1LkA= Content-Language: zh-cn x-cr-hashedpuzzle: AZ32 B9xx CkRC Cm5c C71u DZVF D1cd EUx0 GRP/ Go+4 IU75 IpY7 J9id J+3Q KKma LHKy;1;dQBzAGUAcgBAAGgAYgBhAHMAZQAuAGEAcABhAGMAaABlAC4AbwByAGcA;Sosha1_v1;7;{A3B3C420-1177-4C3B-A597-23D145AF3E4D};cwB1AG4AdwB3AEAAYQBzAGkAYQBpAG4AZgBvAC0AbABpAG4AawBhAGcAZQAuAGMAbwBtAA==;Sat, 07 Jun 2014 01:33:09 GMT;VHsNWToAIABUew1ZOgAgAFR7DVk6ACAAZgBvAHIAYwBpAG4AZwAgAGYAbAB1AHMAaAAgAG4AbwB0ACAAdwBvAHIAawBzAA== x-cr-puzzleid: {A3B3C420-1177-4C3B-A597-23D145AF3E4D} X-AIMC-AUTH: sunww X-AIMC-MAILFROM: sunww@asiainfo-linkage.com X-AIMC-Msg-ID: VzX5Ft6B X-Virus-Checked: Checked by ClamAV on apache.org Hi The symptom reproduced again. I paste the log in = http://paste2.org/D2N6ZDvk,http://paste2.org/a64LXD0X One is the regionserver jstack log. The other is regionserver log, which was grep and only include the = unflush region. Thanks -----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6----- =E5=8F=91=E4=BB=B6=E4=BA=BA: sunweiwei = [mailto:sunww@asiainfo-linkage.com]=20 =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: 2014=E5=B9=B46=E6=9C=885=E6=97=A5 = 14:51 =E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org =E4=B8=BB=E9=A2=98: =E7=AD=94=E5=A4=8D: =E7=AD=94=E5=A4=8D: = =E7=AD=94=E5=A4=8D: forcing flush not works I'm sorry but the regionserver log have been deleted... to Stack: Yes, Always the same two regions of Table BT_D_BF001_201406 always can't = be flushed. Previously I have only saved a little log, When Table = BT_D_BF001_201405 had lots of regions. 2014-05-27 22:40:52,025 DEBUG [regionserver60020.logRoller] = regionserver.LogRoller: HLog roll requested 2014-05-27 22:40:52,039 DEBUG [regionserver60020.logRoller] wal.FSHLog: = cleanupCurrentWriter waiting for transactions to get synced total = 450500823 synced till here 450500779 2014-05-27 22:40:52,049 INFO [regionserver60020.logRoller] wal.FSHLog: = Rolled WAL = /apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C140= 1173211108.1401201646659 with entries=3D94581, filesize=3D122.2 M; new = WAL = /apps/hbase/data/WALs/hadoop03,60020,1401173211108/hadoop03%2C60020%2C140= 1173211108.1401201652025 2014-05-27 22:40:52,049 INFO [regionserver60020.logRoller] wal.FSHLog: = Too many hlogs: logs=3D156, maxlogs=3D32; forcing flush of 2 regions(s): = a5b94272f0fdd477bf320e428059fe87, f1a60d3ea5820cb672832c59531de89d 2014-05-27 22:40:52,073 DEBUG [Thread-17] regionserver.MemStoreFlusher: = Flush thread woke up because memory above low water=3D6.1 G 2014-05-27 22:40:52,074 DEBUG [Thread-17] regionserver.MemStoreFlusher: = Under global heap pressure: Region = BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050= 8151602201405081516021740000000004620000010940000800000000000000000005480= 90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. has too many = store files, but is 27.6 M vs best flushable region's 0. Choosing the = bigger. 2014-05-27 22:40:52,075 INFO [Thread-17] regionserver.MemStoreFlusher: = Flush of region = BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050= 8151602201405081516021740000000004620000010940000800000000000000000005480= 90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. due to global = heap pressure 2014-05-27 22:40:52,075 DEBUG [Thread-17] regionserver.HRegion: Started = memstore flush for = BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050= 8151602201405081516021740000000004620000010940000800000000000000000005480= 90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39., current region = memstore size 27.6 M 2014-05-27 22:40:52,599 INFO [Thread-17] = regionserver.DefaultStoreFlusher: Flushed, sequenceid=3D10069900941, = memsize=3D27.6 M, hasBloomFilter=3Dtrue, into tmp file = hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80= bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88 2014-05-27 22:40:52,608 DEBUG [Thread-17] = regionserver.HRegionFileSystem: Committing store file = hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80= bd6fede708c05c9fcc9e2b39/.tmp/a89428808e1a4be4a1bf7bd9ec8ece88 as = hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80= bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88 2014-05-27 22:40:52,617 INFO [Thread-17] regionserver.HStore: Added = hdfs://hdpcluster/apps/hbase/data/data/default/BT_D_BF001_201405/47633a80= bd6fede708c05c9fcc9e2b39/cf/a89428808e1a4be4a1bf7bd9ec8ece88, = entries=3D44962, sequenceid=3D10069900941, filesize=3D5.5 M 2014-05-27 22:40:52,618 INFO [Thread-17] regionserver.HRegion: Finished = memstore flush of ~27.6 M/28933240, currentsize=3D43.6 K/44664 for = region = BT_D_BF001_201405,8618989870918460036571102456550000000012320000002014050= 8151602201405081516021740000000004620000010940000800000000000000000005480= 90000000,1401009042160.47633a80bd6fede708c05c9fcc9e2b39. in 542ms, = sequenceid=3D10069900941, compaction requested=3Dtrue 2014-05-27 22:40:52,618 DEBUG [Thread-17] = regionserver.CompactSplitThread: Small Compaction requested: system; = Because: Thread-17; compaction_queue=3D(4896:19152), split_queue=3D0, = merge_queue=3D0 Thanks -----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6----- =E5=8F=91=E4=BB=B6=E4=BA=BA: ramkrishna vasudevan = [mailto:ramkrishna.s.vasudevan@gmail.com]=20 =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: 2014=E5=B9=B46=E6=9C=885=E6=97=A5 = 13:43 =E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org =E4=B8=BB=E9=A2=98: Re: =E7=AD=94=E5=A4=8D: =E7=AD=94=E5=A4=8D: forcing = flush not works >>I still (highly)suspect that there is something wrong with the flush queue(some entry pushed into it can't be poll out). Ya I too have that suspect. May be any new logs may help to uncover the issue. On Thu, Jun 5, 2014 at 11:06 AM, Stack wrote: > Always the same two regions that get stuck or does it vary? Another = set of > example logs may help uncover the sequence of trouble-causing events. > > Thanks, > St.Ack > > > On Wed, Jun 4, 2014 at 7:31 PM, sunweiwei > wrote: > > > my log is similar as HBASE-10499. > > > > Thanks > > > > -----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6----- > > =E5=8F=91=E4=BB=B6=E4=BA=BA: saint.ack@gmail.com = [mailto:saint.ack@gmail.com] =E4=BB=A3=E8=A1=A8 Stack > > =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: = 2014=E5=B9=B46=E6=9C=883=E6=97=A5 23:10 > > =E6=94=B6=E4=BB=B6=E4=BA=BA: Hbase-User > > =E4=B8=BB=E9=A2=98: Re: =E7=AD=94=E5=A4=8D: forcing flush not works > > > > Mind posting link to your log? Sounds like HBASE-10499 as Honghua = says. > > St.Ack > > > > > > On Tue, Jun 3, 2014 at 2:34 AM, sunweiwei = > > wrote: > > > > > Thanks. Maybe the same as HBase-10499. > > > I stop the regionserver then start it. Then hbase back to normal. > > > This is jstack log when 2 regions can not flush. > > > > > > "Thread-17" prio=3D10 tid=3D0x00007f6210383800 nid=3D0x6540 = waiting on > > condition > > > [0x00007f61e0a26000] > > > java.lang.Thread.State: TIMED_WAITING (parking) > > > at sun.misc.Unsafe.park(Native Method) > > > - parking to wait for <0x000000041ae0e6b8> (a > > > java.util.concurrent. > > > locks.AbstractQueuedSynchronizer$ConditionObject) > > > at > > > = java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196) > > > at > > > > > > > > > = java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awa= itN > > > anos(AbstractQueuedSynchronizer.java:2025) > > > at = java.util.concurrent.DelayQueue.poll(DelayQueue.java:201) > > > at = java.util.concurrent.DelayQueue.poll(DelayQueue.java:39) > > > at > > > > > > > > > = org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(Mem= Sto > > > reFlusher.java:228) > > > at java.lang.Thread.run(Thread.java:662) > > > > > > -----=E9=82=AE=E4=BB=B6=E5=8E=9F=E4=BB=B6----- > > > =E5=8F=91=E4=BB=B6=E4=BA=BA: =E5=86=AF=E5=AE=8F=E5=8D=8E = [mailto:fenghonghua@xiaomi.com] > > > =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: = 2014=E5=B9=B46=E6=9C=883=E6=97=A5 16:34 > > > =E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org > > > =E4=B8=BB=E9=A2=98: =E7=AD=94=E5=A4=8D: forcing flush not works > > > > > > The same symptom as HBase-10499? > > > > > > I still (highly)suspect that there is something wrong with the = flush > > > queue(some entry pushed into it can't be poll out). > > > ________________________________________ > > > =E5=8F=91=E4=BB=B6=E4=BA=BA: sunweiwei = [sunww@asiainfo-linkage.com] > > > =E5=8F=91=E9=80=81=E6=97=B6=E9=97=B4: = 2014=E5=B9=B46=E6=9C=883=E6=97=A5 15:43 > > > =E6=94=B6=E4=BB=B6=E4=BA=BA: user@hbase.apache.org > > > =E4=B8=BB=E9=A2=98: forcing flush not works > > > > > > Hi > > > > > > > > > > > > I'm using a heavy-write hbase0.96 . I find this in regionserver = log: > > > > > > 2014-06-03 15:13:19,445 INFO [regionserver60020.logRoller] = wal.FSHLog: > > Too > > > many hlogs: logs=3D33, maxlogs=3D32; forcing flush of 3 = regions(s): > > > 1a7dda3c3815c19970ace39fd99abfe8, = aff81bc46aa7d3ed51a01f11f23c8320, > > > d5666e003f598147b4dda509f173a779 > > > > > > 2014-06-03 15:13:23,869 INFO [regionserver60020.logRoller] = wal.FSHLog: > > Too > > > many hlogs: logs=3D34, maxlogs=3D32; forcing flush of 2 = regions(s): > > > aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779 > > > > > > =E2=94=87 > > > > > > =E2=94=87 > > > > > > 2014-06-03 15:18:14,778 INFO [regionserver60020.logRoller] = wal.FSHLog: > > Too > > > many hlogs: logs=3D93, maxlogs=3D32; forcing flush of 2 = regions(s): > > > aff81bc46aa7d3ed51a01f11f23c8320, d5666e003f598147b4dda509f173a779 > > > > > > > > > > > > > > > > > > It seems like 2 regions can=E2=80=99t be flushed and WALs Dir = continue to > > increase > > > and Then I find this in client log: > > > > > > INFO | AsyncProcess-waitForMaximumCurrentTasks [2014-06-03 = 15:30:53] - > : > > > Waiting for the global number of running tasks to be equals or = less > than > > 0, > > > tasksSent=3D15819, tasksDone=3D15818, currentTasksDone=3D15818, > > > tableName=3DBT_D_BF001_201406 > > > > > > > > > > > > Then write speed will become very slow. > > > > > > After I flush 2 regions manually , write speed can back to normal > only > > > a > > > short while. > > > > > > > > > > > > Any suggestion will be appreciated. Thanks. > > > > > > > > > > > > > >