hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8531) TestZooKeeper fails in trunk/0.95 builds
Date Tue, 14 May 2013 14:35:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13657087#comment-13657087 ]

stack commented on HBASE-8531:
------------------------------

hadoopqa looks horked:

{code}
Started by user stack
Building remotely on hadoop2 in workspace /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build
hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build at hudson.remoting.Channel@51ad9cd5:hadoop2
	at hudson.FilePath.act(FilePath.java:861)
	at hudson.FilePath.act(FilePath.java:838)
	at hudson.FilePath.mkdirs(FilePath.java:978)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1329)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:682)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:587)
	at hudson.model.Run.execute(Run.java:1568)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:236)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:494)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:672)
	at hudson.FilePath.act(FilePath.java:854)
	... 10 more
Caused by: java.io.IOException
	at hudson.remoting.Channel.close(Channel.java:910)
	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
	at hudson.remoting.PingThread.ping(PingThread.java:120)
	at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1368466219363 hasn't completed at 1368466459363
	... 2 more
Retrying after 10 seconds
hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build at hudson.remoting.Channel@51ad9cd5:hadoop2
	at hudson.FilePath.act(FilePath.java:861)
	at hudson.FilePath.act(FilePath.java:838)
	at hudson.FilePath.mkdirs(FilePath.java:978)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1329)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:682)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:587)
	at hudson.model.Run.execute(Run.java:1568)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:236)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:494)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:672)
	at hudson.FilePath.act(FilePath.java:854)
	... 10 more
Caused by: java.io.IOException
	at hudson.remoting.Channel.close(Channel.java:910)
	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
	at hudson.remoting.PingThread.ping(PingThread.java:120)
	at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1368466219363 hasn't completed at 1368466459363
	... 2 more
Retrying after 10 seconds
hudson.util.IOException2: remote file operation failed: /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build at hudson.remoting.Channel@51ad9cd5:hadoop2
	at hudson.FilePath.act(FilePath.java:861)
	at hudson.FilePath.act(FilePath.java:838)
	at hudson.FilePath.mkdirs(FilePath.java:978)
	at hudson.model.AbstractProject.checkout(AbstractProject.java:1329)
	at hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:682)
	at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:88)
	at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:587)
	at hudson.model.Run.execute(Run.java:1568)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:236)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:494)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:672)
	at hudson.FilePath.act(FilePath.java:854)
	... 10 more
Caused by: java.io.IOException
	at hudson.remoting.Channel.close(Channel.java:910)
	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
	at hudson.remoting.PingThread.ping(PingThread.java:120)
	at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1368466219363 hasn't completed at 1368466459363
	... 2 more
Archiving artifacts
ERROR: Failed to archive artifacts: trunk/patchprocess/*,**/surefire-reports/*,**/site/*,**/*.txt,**/org.apache.hadoop.mapred.MiniMRCluster*/*
hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:494)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:672)
	at hudson.EnvVars.getRemote(EnvVars.java:212)
	at hudson.model.Computer.getEnvironment(Computer.java:882)
	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
	at hudson.model.Run.getEnvironment(Run.java:2021)
	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:936)
	at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:115)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:810)
	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:785)
	at hudson.model.Build$BuildExecution.post2(Build.java:183)
	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:732)
	at hudson.model.Run.execute(Run.java:1593)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:236)
Caused by: java.io.IOException
	at hudson.remoting.Channel.close(Channel.java:910)
	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
	at hudson.remoting.PingThread.ping(PingThread.java:120)
	at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1368466219363 hasn't completed at 1368466459363
	... 2 more
Recording test results
ERROR: Publisher hudson.tasks.junit.JUnitResultArchiver aborted due to exception
hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:494)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:672)
	at hudson.EnvVars.getRemote(EnvVars.java:212)
	at hudson.model.Computer.getEnvironment(Computer.java:882)
	at jenkins.model.CoreEnvironmentContributor.buildEnvironmentFor(CoreEnvironmentContributor.java:28)
	at hudson.model.Run.getEnvironment(Run.java:2021)
	at hudson.model.AbstractBuild.getEnvironment(AbstractBuild.java:936)
	at hudson.tasks.junit.JUnitResultArchiver.perform(JUnitResultArchiver.java:131)
	at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:19)
	at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:810)
	at hudson.model.AbstractBuild$AbstractBuildExecution.performAllBuildSteps(AbstractBuild.java:785)
	at hudson.model.Build$BuildExecution.post2(Build.java:183)
	at hudson.model.AbstractBuild$AbstractBuildExecution.post(AbstractBuild.java:732)
	at hudson.model.Run.execute(Run.java:1593)
	at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
	at hudson.model.ResourceController.execute(ResourceController.java:88)
	at hudson.model.Executor.run(Executor.java:236)
Caused by: java.io.IOException
	at hudson.remoting.Channel.close(Channel.java:910)
	at hudson.slaves.ChannelPinger$1.onDead(ChannelPinger.java:110)
	at hudson.remoting.PingThread.ping(PingThread.java:120)
	at hudson.remoting.PingThread.run(PingThread.java:81)
Caused by: java.util.concurrent.TimeoutException: Ping started on 1368466219363 hasn't completed at 1368466459363
	... 2 more
[description-setter] Could not determine description.
Finished: FAILURE

{code}
                
> TestZooKeeper fails in trunk/0.95 builds
> ----------------------------------------
>
>                 Key: HBASE-8531
>                 URL: https://issues.apache.org/jira/browse/HBASE-8531
>             Project: HBase
>          Issue Type: Bug
>          Components: Zookeeper
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.95.1
>
>         Attachments: 8531.txt, 8531v4.txt, 8531v5.txt, 8531v6.txt
>
>
> TestZooKeeper fails on occasion.  I caught a good example recently.  See the failure
> stack trace below.
> It took me a while.  I thought the issue had to do with our recent ipc refactorings, but
> it looks like a problem we have always had.  In short, MetaScanner is not handling
> DoNotRetryIOExceptions (DNRIOEs); it is letting them out.  A DNRIOE thrown while scanning
> is supposed to force a reset of the scan.  HTable#next catches these and does the
> necessary scanner reset.  MetaScanner runs a subset of what HTable does when it is
> scanning, minus the part where it catches a DNRIOE and redoes the scan.  Odd.
> TestZooKeeper failed in this instance because the test kills a regionserver at the same
> time as we are trying to create a table.  In create table we do a meta scan using
> MetaScanner, passing a Visitor.  The scan starts and gets a RegionServerStoppedException
> (this is NOT a DNRIOE, though it should be, but later we convert it into one up in
> ScannerCallable).
> DNRIOEs are thrown to the upper layers to handle....
> Let me look into having MetaScanner just use HTable scanning.  It makes an HTable instance
> just to find where to start the scan... let me try using this instance for the actual
> scanning.
> TODO: Do this conversion everywhere a DNRIOE could come out.
> Here is the stack trace:
> {code}
> org.apache.hadoop.hbase.exceptions.DoNotRetryIOException: Reset scanner
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:209)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:52)
> 	at org.apache.hadoop.hbase.client.ServerCallable.withRetries(ServerCallable.java:170)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:212)
> 	at org.apache.hadoop.hbase.client.MetaScanner.access$000(MetaScanner.java:52)
> 	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:131)
> 	at org.apache.hadoop.hbase.client.MetaScanner$1.connect(MetaScanner.java:128)
> 	at org.apache.hadoop.hbase.client.HConnectionManager.execute(HConnectionManager.java:398)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:128)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:103)
> 	at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:81)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:448)
> 	at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:348)
> 	at org.apache.hadoop.hbase.TestZooKeeper.testSanity(TestZooKeeper.java:242)
> 	at org.apache.hadoop.hbase.TestZooKeeper.testRegionServerSessionExpired(TestZooKeeper.java:203)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> 	at org.junit.runners.Suite.runChild(Suite.java:127)
> 	at org.junit.runners.Suite.runChild(Suite.java:26)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> 	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
> 	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:79)
> 	at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:227)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:175)
> 	... 43 more
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: org.apache.hadoop.hbase.exceptions.RegionServerStoppedException: Server p0116.mtv.cloudera.com,60679,1368057284663 not running, aborting
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2310)
> 	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:2874)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20577)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2103)
> 	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1810)
> 	at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1336)
> 	at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1532)
> 	at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1587)
> 	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.scan(ClientProtos.java:21012)
> 	at org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:147)
> 	... 43 more
> {code}
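
The catch-and-reset behaviour the issue description attributes to HTable#next, and the conversion of a server-stopped error into a DNRIOE, can be sketched roughly as below. This is only an illustration, not HBase code: every class and method name in it (ScanResetSketch, DoNotRetryException, ServerStoppedException, ScannerFactory, RowScanner, callNext, scanAll, flakyFactory) is a hypothetical stand-in, and resuming from the count of rows already delivered is an assumption made to keep the example small.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ScanResetSketch {
    /** Hypothetical stand-in for a "do not retry" exception. */
    static class DoNotRetryException extends IOException {
        DoNotRetryException(String msg, Throwable cause) { super(msg, cause); }
    }

    /** Hypothetical stand-in for a "region server stopped" exception. */
    static class ServerStoppedException extends IOException {
        ServerStoppedException(String msg) { super(msg); }
    }

    /** Hypothetical scanner: returns the next row, or null when exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Hypothetical factory: opens a scanner positioned at the given row index. */
    interface ScannerFactory {
        RowScanner open(int startIndex) throws IOException;
    }

    /** Low-level call: converts a server-stopped error into a do-not-retry error. */
    static String callNext(RowScanner scanner) throws IOException {
        try {
            return scanner.next();
        } catch (ServerStoppedException e) {
            throw new DoNotRetryException("Reset scanner", e);
        }
    }

    /** Upper layer: on a do-not-retry error, reopen the scan after the last row seen. */
    static List<String> scanAll(ScannerFactory factory, int maxResets) throws IOException {
        List<String> rows = new ArrayList<>();
        int resets = 0;
        RowScanner scanner = factory.open(0);
        while (true) {
            String row;
            try {
                row = callNext(scanner);
            } catch (DoNotRetryException e) {
                if (++resets > maxResets) {
                    throw e; // give up and let the caller see the failure
                }
                scanner = factory.open(rows.size()); // reset: resume after delivered rows
                continue;
            }
            if (row == null) {
                return rows;
            }
            rows.add(row);
        }
    }

    /** Demo factory: the scan fails exactly once, just before the second row. */
    static ScannerFactory flakyFactory(String[] data) {
        boolean[] failed = {false};
        return start -> new RowScanner() {
            int pos = start;
            public String next() throws IOException {
                if (!failed[0] && pos == 1) {
                    failed[0] = true;
                    throw new ServerStoppedException("server not running, aborting");
                }
                return pos < data.length ? data[pos++] : null;
            }
        };
    }

    public static void main(String[] args) throws IOException {
        String[] data = {"row-a", "row-b", "row-c"};
        System.out.println(scanAll(flakyFactory(data), 3));
    }
}
```

A caller that skips the catch block in scanAll would see the DoNotRetryException escape on the first server failure, which is the shape of failure the stack trace above shows coming out of MetaScanner.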

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
