hbase-issues mailing list archives

From "Ted Yu (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4832) TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
Date Tue, 22 Nov 2011 00:25:40 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154766#comment-13154766
] 

Ted Yu commented on HBASE-4832:
-------------------------------

The test failures were due to 'Too many open files' errors.
                
> TestRegionServerCoprocessorExceptionWithAbort fails if the region server stops too fast
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-4832
>                 URL: https://issues.apache.org/jira/browse/HBASE-4832
>             Project: HBase
>          Issue Type: Bug
>          Components: coprocessors, test
>    Affects Versions: 0.94.0
>            Reporter: nkeywal
>            Assignee: Eugene Koontz
>            Priority: Minor
>         Attachments: 4832-timeout.txt, 4832_trunk_hregionserver.patch, HBASE-4832.patch, HBASE-4832.patch
>
>
> The current implementation of HRegionServer#stop is
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     synchronized (this) {
>       // Wakes run() if it is sleeping
>       notifyAll(); // FindBugs NN_NAKED_NOTIFY
>     }
>   }
> {noformat}
> The notification is sent on the wrong object and does nothing. As a consequence, the region server continues to sleep instead of waking up and stopping immediately. A correct implementation is:
> {noformat}
>   public void stop(final String msg) {
>     this.stopped = true;
>     LOG.info("STOPPED: " + msg);
>     // Wakes run() if it is sleeping
>     sleeper.skipSleepCycle();
>   }
> {noformat}
> Then the region server stops immediately. This makes the region server stop 0.5 s faster on average, which is quite useful for unit tests.
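The difference between the two stop() variants can be illustrated with a minimal, self-contained sketch. It assumes the run loop waits on a lock owned by the sleeper rather than on the region server itself, which is what the description implies. SimpleSleeper, sleepLock and measureSleep are hypothetical names made up for this illustration; this is not HBase's actual Sleeper class.

```java
import java.util.concurrent.TimeUnit;

public class SleeperSketch {

    /** Hypothetical stand-in for a sleeper helper; not HBase's actual class. */
    static final class SimpleSleeper {
        private final Object sleepLock = new Object();
        private final long periodMs;
        private boolean skip = false;

        SimpleSleeper(long periodMs) {
            this.periodMs = periodMs;
        }

        /** Waits up to periodMs, or less if skipSleepCycle() is called. Returns elapsed ms. */
        long sleep() throws InterruptedException {
            long start = System.nanoTime();
            long deadline = start + TimeUnit.MILLISECONDS.toNanos(periodMs);
            synchronized (sleepLock) {
                while (!skip) {
                    long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                    if (remainingMs <= 0) {
                        break; // full period elapsed
                    }
                    sleepLock.wait(remainingMs);
                }
                skip = false; // reset for the next cycle
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }

        /** Wakes the sleeping thread by notifying on the monitor it actually waits on. */
        void skipSleepCycle() {
            synchronized (sleepLock) {
                skip = true;
                sleepLock.notifyAll();
            }
        }
    }

    /** Starts a sleeping thread, tries to wake it, and returns how long it slept in ms. */
    static long measureSleep(boolean useSkipSleepCycle) throws InterruptedException {
        final SimpleSleeper sleeper = new SimpleSleeper(1000);
        final Object server = new Object(); // plays the role of `this` in stop()
        final long[] sleptMs = new long[1];
        Thread runLoop = new Thread(() -> {
            try {
                sleptMs[0] = sleeper.sleep();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        runLoop.start();
        Thread.sleep(100); // give the run loop time to start sleeping
        if (useSkipSleepCycle) {
            sleeper.skipSleepCycle();
        } else {
            synchronized (server) {
                server.notifyAll(); // nobody waits on this monitor, so nothing wakes up
            }
        }
        runLoop.join();
        return sleptMs[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("notifyAll on unrelated monitor: slept " + measureSleep(false) + " ms");
        System.out.println("skipSleepCycle: slept " + measureSleep(true) + " ms");
    }
}
```

In this sketch the first call should sleep for roughly the full period and the second should return shortly after the wake-up call. The skip flag also covers the case where the wake-up arrives before the thread has started waiting.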
> However, with this fix, TestRegionServerCoprocessorExceptionWithAbort fails.
> This is likely because the code does not expect the region server to stop that fast.
> The exception is:
> {noformat}
> testExceptionFromCoprocessorDuringPut(org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort)  Time elapsed: 30.06 sec  <<< ERROR!
> java.lang.Exception: test timed out after 30000 milliseconds
> 	at java.lang.Throwable.fillInStackTrace(Native Method)
> 	at java.lang.Throwable.<init>(Throwable.java:196)
> 	at java.lang.Exception.<init>(Exception.java:41)
> 	at java.lang.InterruptedException.<init>(InterruptedException.java:48)
> 	at java.lang.Thread.sleep(Native Method)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1019)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:804)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:778)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:697)
> 	at org.apache.hadoop.hbase.client.ServerCallable.connect(ServerCallable.java:75)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1280)
> 	at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:585)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:154)
> 	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
> 	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:130)
> 	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:127)
> 	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:357)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:127)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:866)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:920)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:808)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1469)
> 	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1354)
> 	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:892)
> 	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:750)
> 	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:725)
> 	at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.testExceptionFromCoprocessorDuringPut(TestRegionServerCoprocessorExceptionWithAbort.java:84)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
> 	at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:62)
> {noformat}
> We get this exception because the client entered a loop of retries while trying to locate the region, and the test timeout expired before the loop ended.
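The shape of the stack trace above (a timeout surfacing inside Thread.sleep in the middle of a lookup) is what a retry loop with pauses tends to produce when a watchdog interrupts it. The following is only a generic sketch of that pattern, not the HBase client code: callWithRetries and runWithTimeout are hypothetical helpers, and the attempt count and pause values are arbitrary.

```java
import java.util.concurrent.Callable;

public class RetryLoopSketch {

    /** Hypothetical helper: retries op up to maxAttempts times, pausing between attempts. */
    static <T> T callWithRetries(Callable<T> op, int maxAttempts, long pauseMs) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
            if (attempt < maxAttempts) {
                // An interrupt from a test timeout surfaces here as InterruptedException.
                Thread.sleep(pauseMs);
            }
        }
        throw new Exception("gave up after " + maxAttempts + " attempts", last);
    }

    /** Runs an always-failing lookup and interrupts it after timeoutMs, like a test timeout. */
    static String runWithTimeout(long timeoutMs) throws InterruptedException {
        final String[] outcome = new String[1];
        Thread worker = new Thread(() -> {
            try {
                callWithRetries(() -> {
                    throw new IllegalStateException("region not found");
                }, 1000, 50);
                outcome[0] = "returned";
            } catch (InterruptedException e) {
                outcome[0] = "interrupted while sleeping between retries";
            } catch (Exception e) {
                outcome[0] = "gave up";
            }
        });
        worker.start();
        worker.join(timeoutMs); // wait for the timeout to expire
        worker.interrupt();     // the watchdog gives up on the worker
        worker.join();
        return outcome[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWithTimeout(200));
    }
}
```

With 1000 attempts and a 50 ms pause, the loop would run far longer than the 200 ms timeout used here, so the worker is interrupted during a pause rather than failing with its own "gave up" error.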

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
