From: mail list
To: user@hbase.apache.org
Subject: Re: Slow waitForAckedSeqno took too long time
Date: Fri, 5 Dec 2014 17:31:00 +0800

I also got the RegionServer thread stacks on the region server, as below:

"RS_OPEN_META-l-hbase3:60020-0-WAL.AsyncNotifier" prio=10 tid=0x00007f7e7c259000 nid=0x3d1 in Object.wait() [0x00007f7e5eb90000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d59e4808> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncNotifier.run(FSHLog.java:1338)
    - locked <0x00000000d59e4808> (a java.lang.Object)
    at java.lang.Thread.run(Thread.java:744)

"RS_OPEN_META-l-hbase3:60020-0-WAL.AsyncSyncer4" prio=10 tid=0x00007f7e7c257000 nid=0x3d0 in Object.wait() [0x00007f7e5ec91000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d59e4498> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1209)
    - locked <0x00000000d59e4498> (a java.lang.Object)
    at java.lang.Thread.run(Thread.java:744)

"RS_OPEN_META-l-hbase3:60020-0-WAL.AsyncSyncer3" prio=10 tid=0x00007f7e7c255000 nid=0x3cf in Object.wait() [0x00007f7e5ed92000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000d5990570> (a java.lang.Object)
    at java.lang.Object.wait(Object.java:503)
    at org.apache.hadoop.hbase.regionserver.wal.FSHLog$AsyncSyncer.run(FSHLog.java:1209)
    - locked <0x00000000d5990570> (a java.lang.Object)
    at java.lang.Thread.run(Thread.java:744)

On Dec 5, 2014, at 13:01, mail list wrote:

> Hi all,
>
> I deployed HBase 0.98.6-cdh5.2.0 on 3 machines:
>
> l-hbase1.dev.dba.cn0(hadoop namenode active, HMaster active)
> l-hbase2.dev.dba.cn0(hadoop namenode standby, HMaster standby, hadoop datanode)
> l-hbase3.dev.dba.cn0(regionserver, hadoop datanode)
>
> Then I shut down l-hbase1.dev.dba.cn0, but HBase could not work until about 15 minutes later.
> I checked the logs and found the following in the region server's log:
>
> 2014-12-05 12:03:19,169 WARN [regionserver60020-WAL.AsyncSyncer0] hdfs.DFSClient: Slow waitForAckedSeqno took 927762ms (threshold=30000ms)
> 2014-12-05 12:03:19,186 INFO [regionserver60020-WAL.AsyncSyncer0] wal.FSHLog: Slow sync cost: 927779 ms, current pipeline: [10.86.36.219:50010]
> 2014-12-05 12:03:19,186 DEBUG [regionserver60020.logRoller] regionserver.LogRoller: HLog roll requested
> 2014-12-05 12:03:19,236 WARN [regionserver60020-WAL.AsyncSyncer1] hdfs.DFSClient: Slow waitForAckedSeqno took 867706ms (threshold=30000ms)
>
> It seems the WAL async sync took too long for region server recovery? I don't know whether these log entries matter.
> Can anybody explain the reason, and how to reduce the recovery time?