hbase-dev mailing list archives

From Mickey <huanfeng...@gmail.com>
Subject Re: Region server blocked at waitForAckedSeqno
Date Tue, 03 Sep 2013 03:30:53 GMT
Hi Himanshu,
It lasted for more than one hour. Eventually I tried to stop the region
server, but that failed. The jstack showed it was still blocked in the
HLog syncer. So I killed the process with "kill -9", and after that HBase
recovered.

hbase.regionserver.logroll.errors.tolerated is set to the default value, 0.

My HBase cluster is mainly based on 0.94.1.

Attached are the log from the region server that hosts .META. and the
jstack taken while it was blocked.

Thanks,
Mickey



2013/9/2 Himanshu Vashishtha <hv.csuoa@gmail.com>

> Hey Mickey,
>
> I have a few follow-up questions:
>
> For how long were these threads blocked? What happened afterwards: did the
> regionserver resume, or did it abort?
> Also, could you pastebin the logs after the above exception?
> A sync failure causes a log roll, which is retried based on the value of
> hbase.regionserver.logroll.errors.tolerated.
> Which 0.94 version are you using?
>
> Thanks,
> Himanshu
>
>
>
> On Mon, Sep 2, 2013 at 5:16 AM, Mickey <huanfeng388@gmail.com> wrote:
>
> > Hi, all
> >
> > I was testing HBase with HDFS QJM HA recently. The Hadoop version is
> > CDH 4.3.0 and HBase is based on 0.94 with some patches (including
> > HBASE-8211).
> > In one test I hit a blocking issue in HBase. I killed the node that was
> > running the active namenode, and also a datanode and a regionserver.
> >
> > HDFS failed over successfully. The master tried to reassign the regions
> > after detecting that the regionserver was down, but no region could come
> > online.
> >
> > From the log I found that all operations on .META. failed. Taking a jstack
> > of the region server that hosts .META., I found the following:
> > "regionserver60020.logSyncer" daemon prio=10 tid=0x00007f317007e800 nid=0x27ee5 in Object.wait() [0x00007f318add9000]
> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:1708)
> >         - locked <0x00007f34ae7b3638> (a java.util.LinkedList)
> >         at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1609)
> >         at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1525)
> >         at org.apache.hadoop.hdfs.DFSOutputStream.sync(DFSOutputStream.java:1510)
> >         at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:116)
> >         at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:1208)
> >         at sun.reflect.GeneratedMethodAccessor26.invoke(Unknown Source)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:303)
> >         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1290)
> >         at org.apache.hadoop.hbase.regionserver.wal.HLog.syncer(HLog.java:1247)
> >         at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1400)
> >         at org.apache.hadoop.hbase.regionserver.wal.HLog$LogSyncer.run(HLog.java:1199)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> > The logSyncer is always waiting in waitForAckedSeqno. All the HLog
> > operations seem to be blocked. Is this a bug? Or did I miss some
> > important patches?
> >
> > Hope to get your suggestions soon.
> >
> > Best regards,
> > Mickey
> >
>
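
A note on reading the trace quoted above: the logSyncer thread is reported as TIMED_WAITING rather than WAITING, which is consistent with a loop of short timed waits on the locked java.util.LinkedList that only exits once the expected ack arrives. The Java sketch below is a simplified illustration of that pattern, not the actual DFSOutputStream source. The class AckWaitSketch, its fields, and the maxWaitMillis deadline are all hypothetical names introduced here. It may help show why such a thread can look alive in every jstack and still never make progress, and how a bounded wait would let the caller notice.

```java
import java.util.LinkedList;

// Simplified illustration only; NOT the real HDFS client code.
// All names here (AckWaitSketch, ack, maxWaitMillis) are hypothetical.
class AckWaitSketch {
    // Used purely as the monitor object, mirroring the LinkedList locked in the trace.
    private final LinkedList<Long> dataQueue = new LinkedList<>();
    private long lastAckedSeqno = -1;

    // Would be called by a response-processing thread when an ack arrives.
    void ack(long seqno) {
        synchronized (dataQueue) {
            lastAckedSeqno = Math.max(lastAckedSeqno, seqno);
            dataQueue.notifyAll();
        }
    }

    // Polls in short timed waits until seqno is acked or the deadline passes.
    // pollMillis must be positive. Returns true if acked, false on timeout
    // or interruption. Without the deadline check this loop never exits
    // when no ack ever arrives, yet the thread always shows TIMED_WAITING.
    boolean waitForAckedSeqno(long seqno, long pollMillis, long maxWaitMillis) {
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        synchronized (dataQueue) {
            while (lastAckedSeqno < seqno) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false; // gave up: the ack never came in time
                }
                try {
                    dataQueue.wait(Math.min(pollMillis, remainingMillis));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        AckWaitSketch s = new AckWaitSketch();
        System.out.println("acked before deadline: " + s.waitForAckedSeqno(5, 10, 50));
        s.ack(5);
        System.out.println("acked before deadline: " + s.waitForAckedSeqno(5, 10, 50));
    }
}
```

In the sketch, the first call times out because nothing ever acks sequence number 5, and the second returns immediately once ack(5) has been recorded. Whether the real client offers a comparable timeout in this code path is not something the thread establishes.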
