From: Himanshu Vashishtha
To: "dev@hbase.apache.org"
Date: Mon, 2 Sep 2013 07:40:31 -0700
Subject: Re: Region server blocked at waitForAckedSeqno

Hey Mickey,

I have a few follow-up questions:

For how long were these threads blocked? What happens afterwards: does the regionserver resume, or does it abort?
Could you also pastebin the logs that follow the above exception? A sync failure causes a log roll, which is retried based on the value of hbase.regionserver.logroll.errors.tolerated.

Which 0.94 version are you using?

Thanks,
Himanshu

On Mon, Sep 2, 2013 at 5:16 AM, Mickey wrote:
> Hi, all
>
> I was testing HBase with HDFS QJM HA recently. The Hadoop version is CDH 4.3.0
> and HBase is based on 0.94 with some patches (including HBASE-8211).
> In one test I hit a blocking issue in HBase. I killed a node that was running the
> active namenode as well as a datanode and a regionserver.
>
> HDFS failed over successfully. The master tried to re-assign the regions
> after detecting that the regionserver was down, but no region could come online.
>
> From the log I found that all operations on .META. failed. Printing the jstack
> of the region server that hosts .META., I found the following:
>
> "regionserver60020.logSyncer" daemon prio=10 tid=0x00007f317007e800 nid=0x27ee5 in Object.wait() [0x00007f318add9000]
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1708)
>         - locked <0x00007f34ae7b3638> (a java.util.LinkedList)
>         at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1609)
>         at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1525)
>         at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1510)
>         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
>         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:1208)
>         at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:303)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1290)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1247)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1400)
>         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1199)
>         at java.lang.Thread.run(Thread.java:662)
>
> The logSyncer is always waiting in waitForAckedSeqno. All the HLog
> operations seem to be blocked. Is this a bug? Or did I miss some important
> patches?
>
> Hope to get your suggestions soon.
>
> Best regards,
> Mickey
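
For readers following the trace: the logSyncer thread is in TIMED_WAITING inside Object.wait(), holding a java.util.LinkedList as the monitor. Below is a minimal Java sketch of that general "wait until a sequence number is acked" pattern. It is a simplified stand-in and not the actual DFSOutputStream code; the class AckWaitSketch, its methods ack and waitForAck, and the overall deadline are all hypothetical names and choices made for illustration.

```java
import java.util.LinkedList;

/** Illustrative only: a hypothetical stand-in, not the HDFS implementation. */
public class AckWaitSketch {
    // Used here only as the monitor object, mirroring the
    // "locked <...> (a java.util.LinkedList)" line in the jstack output.
    private final LinkedList<Long> queue = new LinkedList<>();
    private long lastAckedSeqno = -1;

    /** Called by a (hypothetical) responder thread when an ack arrives. */
    public void ack(long seqno) {
        synchronized (queue) {
            lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
            queue.notifyAll(); // wake any thread waiting for this ack
        }
    }

    /**
     * Waits until seqno has been acked or timeoutMs has elapsed.
     * Returns true if acked, false on timeout or interruption.
     * The overall deadline is an assumption of this sketch.
     */
    public boolean waitForAck(long seqno, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (queue) {
            while (lastAckedSeqno < seqno) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false; // give up instead of waiting forever
                }
                try {
                    // Timed wait: a thread parked here shows up in jstack
                    // as TIMED_WAITING (on object monitor).
                    queue.wait(Math.min(remaining, 1000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }
}
```

If no ack ever arrives, a loop of this shape keeps re-entering the timed wait, which would look like the trace above on every jstack sample. Whether the real code path has an overall deadline, as this sketch does, is not something the thread establishes.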