hbase-dev mailing list archives

From Gary Helmling <ghelml...@gmail.com>
Subject Re: TestHLog hanging in trunk
Date Wed, 05 Oct 2011 22:26:26 GMT
On Wed, Oct 5, 2011 at 3:05 PM, Stack <stack@duboce.net> wrote:

> Easy to repro Gary?  I just add in your above patch and then loop?
> The first run completely goes down?  The regionserver is waiting on
> all regions to close before it can go down (and then the master go
> down).
> St.Ack
>


This patch isn't sufficient, unfortunately.  It seemed to work against the
HBASE-4209 commit, but cluster shutdown still hangs on current trunk:
it is still waiting for the regionservers to go down.

Without the patch you get the shutdown hook thread waiting to join the RS's:

"main" prio=10 tid=0x00000000403f1800 nid=0x2701 in Object.wait()
[0x00007f34067c8000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000abf5b438> (a org.apache.hadoop.hbase.regionserver.ShutdownHook$ShutdownHookThread)
    at java.lang.Thread.join(Thread.java:1186)
    - locked <0x00000000abf5b438> (a org.apache.hadoop.hbase.regionserver.ShutdownHook$ShutdownHookThread)
    at java.lang.Thread.join(Thread.java:1239)
    at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)


With the patch it just shifts this to JVMClusterUtil.shutdown() waiting to
join the RS threads:

"main" prio=10 tid=0x00000000401ec800 nid=0x153 in Object.wait()
[0x00007fb76a918000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000acc60aa8> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
    at java.lang.Thread.join(Thread.java:1186)
    - locked <0x00000000acc60aa8> (a org.apache.hadoop.hbase.util.JVMClusterUtil$RegionServerThread)
    at java.lang.Thread.join(Thread.java:1239)
    at org.apache.hadoop.hbase.util.JVMClusterUtil.shutdown(JVMClusterUtil.java:230)
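
To make the failure mode concrete: Thread.join() with no timeout blocks
forever if the target thread never exits, which is what both dumps show.
Below is a minimal sketch of a bounded join that reports which threads are
still alive.  BoundedJoin and joinAll are hypothetical names for
illustration only; this is not the JVMClusterUtil code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BoundedJoin {
  /**
   * Joins each thread for at most timeoutMs milliseconds and returns the
   * threads that are still alive afterwards. timeoutMs must be positive,
   * because Thread.join(0) means "wait forever".
   */
  static List<Thread> joinAll(List<Thread> threads, long timeoutMs)
      throws InterruptedException {
    if (timeoutMs <= 0) {
      throw new IllegalArgumentException("timeoutMs must be positive");
    }
    List<Thread> stuck = new ArrayList<Thread>();
    for (Thread t : threads) {
      t.join(timeoutMs);  // returns when t exits or the timeout elapses
      if (t.isAlive()) {
        stuck.add(t);
      }
    }
    return stuck;
  }

  public static void main(String[] args) throws InterruptedException {
    Thread quick = new Thread(new Runnable() {
      public void run() { /* exits immediately */ }
    }, "quick");
    Thread hung = new Thread(new Runnable() {
      public void run() {
        try {
          Thread.sleep(60000L);  // stands in for a server thread that never stops
        } catch (InterruptedException e) {
          // interrupted: fall through and exit
        }
      }
    }, "hung");
    hung.setDaemon(true);  // lets this demo JVM exit despite the hung thread
    quick.start();
    hung.start();
    for (Thread t : joinAll(Arrays.asList(quick, hung), 500L)) {
      System.err.println("still alive after timeout: " + t.getName());
    }
  }
}
```

A bounded join does not fix the underlying hang, but it turns a silent
wait into a named culprit that can be logged.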


Any idea what's causing the region closing to hang?  The surefire plugin
doesn't seem to write the TestHLog-output.txt file if the test is hanging,
which makes the hang hard to debug.  Is there any way to force it to do so?
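
One possible workaround, independent of surefire, is to have the test dump
all thread stacks itself if it runs too long.  This is a rough sketch only:
HangDump, dumpAllThreads and startWatchdog are hypothetical names, not part
of HBase or surefire.  It relies only on the standard
Thread.getAllStackTraces() call.

```java
import java.util.Map;

class HangDump {
  /** Formats the stack of every live thread, similar in spirit to a jstack dump. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e
        : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append('"')
        .append(" state=").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }

  /**
   * Starts a daemon thread that prints a dump to stderr after timeoutMs,
   * unless it is interrupted first (which the caller does when the test
   * finishes in time).
   */
  static Thread startWatchdog(final long timeoutMs) {
    Thread watchdog = new Thread(new Runnable() {
      public void run() {
        try {
          Thread.sleep(timeoutMs);
          System.err.println(dumpAllThreads());
        } catch (InterruptedException ignored) {
          // cancelled: the test completed before the timeout
        }
      }
    }, "hang-watchdog");
    watchdog.setDaemon(true);  // never keeps the JVM alive on its own
    watchdog.start();
    return watchdog;
  }
}
```

A test could call startWatchdog(...) in a @BeforeClass method and
interrupt() the returned thread in @AfterClass.  Whether stderr from a hung
fork ends up anywhere useful still depends on how surefire is configured.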



>
> On Wed, Oct 5, 2011 at 12:13 PM, Gary Helmling <ghelmling@gmail.com>
> wrote:
> > Something else seems to be going on.  With the call to shutdownMiniCluster()
> > the first run of TestHLog passes.  But when I try running in a loop, the
> > second run always seems to hang.
> >
> > Thread dump here: http://pastebin.com/f18Wfa3T
> >
> >
> > On Wed, Oct 5, 2011 at 12:00 PM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> >> +CC Roman who worked on the patch identified by the bisect.
> >>
> >> Roman, does Gary's analysis make sense to you?
> >>
> >> -Todd
> >>
> >> On Wed, Oct 5, 2011 at 11:55 AM, Gary Helmling <ghelmling@gmail.com>
> >> wrote:
> >> > Somehow TestHLog was never actually shutting down the mini-cluster?
> >> >
> >> > The following change lets the test exit successfully:
> >> >
> >> > diff --git a/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java b/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
> >> > index 663b318..13f821c 100644
> >> > --- a/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
> >> > +++ b/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
> >> > @@ -54,6 +54,7 @@ import org.apache.hadoop.hdfs.server.namenode.LeaseManager;
> >> >  import org.apache.hadoop.io.SequenceFile;
> >> >  import org.apache.log4j.Level;
> >> >  import org.junit.After;
> >> > +import org.junit.AfterClass;
> >> >  import org.junit.Before;
> >> >  import org.junit.BeforeClass;
> >> >  import org.junit.Test;
> >> > @@ -120,6 +121,11 @@ public class TestHLog  {
> >> >     oldLogDir = new Path(hbaseDir, ".oldlogs");
> >> >     dir = new Path(hbaseDir, getName());
> >> >   }
> >> > +  @AfterClass
> >> > +  public static void tearDownAfterClass() throws Exception {
> >> > +    TEST_UTIL.shutdownMiniCluster();
> >> > +  }
> >> > +
> >> >   private static String getName() {
> >> >     // TODO Auto-generated method stub
> >> >     return "TestHLog";
> >> >
> >> >
> >> > On Wed, Oct 5, 2011 at 11:23 AM, Gary Helmling <ghelmling@gmail.com>
> >> wrote:
> >> >
> >> >> I've noticed that TestHLog is currently hanging in trunk (haven't checked
> >> >> other branches).  Oddly the tests actually complete, but then the test hangs
> >> >> in teardown.
> >> >>
> >> >> Seems to be something in the server shutdown hooks.  git bisect tracks down
> >> >> the hang to this commit:
> >> >>
> >> >> commit 9c195c7ef350a932a9901a2069b96694d202c675
> >> >> Author: Michael Stack <stack@apache.org>
> >> >> Date:   Fri Sep 30 21:45:20 2011 +0000
> >> >>
> >> >>     HBASE-4209 The HBase hbase-daemon.sh SIGKILLs master when stopping it
> >> >>
> >> >>     git-svn-id: https://svn.apache.org/repos/asf/hbase/trunk@1177849 13f79535-47bb-0310-9956-ffa450edef68
> >> >>
> >> >>
> >> >> Anyone else noticed this on TestHLog or other tests?  I think it may be
> >> >> behind some of our odd test cleanup issues up in Jenkins.
> >> >>
> >> >> --gh
> >> >>
> >> >>
> >> >
> >>
> >>
> >>
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>
