hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-6649) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails
Date Tue, 28 Aug 2012 21:45:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443515#comment-13443515
] 

Ted Yu edited comment on HBASE-6649 at 8/29/12 8:43 AM:
--------------------------------------------------------

>From https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/118/testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/:
{code}
2012-08-28 17:29:54,404 DEBUG [main-EventThread] master.AssignmentManager(2911): based on
AM, current region=.META.,,1.1028785192 is on server=juno.apache.org,43891,1346174923071 server
being checked: juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,405 DEBUG [RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZooKeeperWatcher(266): regionserver:43891-0x1396e4723930005 Received ZooKeeper Event,
type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [main-EventThread] master.ServerManager(394): Added=juno.apache.org,55977,1346174923023
to dead servers, submitted shutdown handler to be executed, root=false, meta=false
2012-08-28 17:29:54,406 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(266): master:55418-0x1396e4723930003
Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZKUtil(229): regionserver:43891-0x1396e4723930005 Set watcher on existing znode
/1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,407 INFO  [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
handler.ServerShutdownHandler(175): Splitting logs for juno.apache.org,55977,1346174923023
...
2012-08-28 17:29:54,407 DEBUG [main-EventThread] zookeeper.ZKUtil(229): master:55418-0x1396e4723930003
Set watcher on existing znode /1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,410 DEBUG [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.MasterFileSystem(267): Renamed region directory: hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting
2012-08-28 17:29:54,410 INFO  [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(894): dead splitlog worker juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,413 DEBUG [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(246): Scheduling batch of logs to split
2012-08-28 17:29:54,414 INFO  [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(248): started splitting logs in [hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting]
...
2012-08-28 17:29:55,000 ERROR [IPC Server handler 7 on 59869] security.UserGroupInformation(1124):
PriviledgedActionException as:jenkins.hfs.0 cause:java.io.FileNotFoundException: Parent directory
doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
2012-08-28 17:29:55,004 FATAL [RegionServer:0;juno.apache.org,55977,1346174923023.logRoller]
regionserver.HRegionServer(1537): ABORTING region server juno.apache.org,55977,1346174923023:
IOE in log roller
java.io.IOException: cannot get log writer
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:715)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:662)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:594)
	at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.FileNotFoundException: java.io.FileNotFoundException:
Parent directory doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:712)
	... 4 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Parent directory
doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
	at org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:198)
	at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
	at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
	at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
	... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.FileNotFoundException: Parent directory
doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1167)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1241)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1188)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

	at org.apache.hadoop.ipc.Client.call(Client.java:1070)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy8.create(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy8.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3248)
	... 13 more
{code}
It is clear that log splitting (splitLog() call on master) raced with log roller (on region
server).
In run() of log roller:
{code}
      } catch (IOException ex) {
{code}
One option is to distinguish FileNotFoundException from other IOE's and exit.
                
      was (Author: yuzhihong@gmail.com):
    From https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92-security/lastCompletedBuild/testReport/org.apache.hadoop.hbase.replication/TestReplication/queueFailover/:
{code}
2012-08-28 17:29:54,404 DEBUG [main-EventThread] master.AssignmentManager(2911): based on
AM, current region=.META.,,1.1028785192 is on server=juno.apache.org,43891,1346174923071 server
being checked: juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,405 DEBUG [RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZooKeeperWatcher(266): regionserver:43891-0x1396e4723930005 Received ZooKeeper Event,
type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [main-EventThread] master.ServerManager(394): Added=juno.apache.org,55977,1346174923023
to dead servers, submitted shutdown handler to be executed, root=false, meta=false
2012-08-28 17:29:54,406 DEBUG [main-EventThread] zookeeper.ZooKeeperWatcher(266): master:55418-0x1396e4723930003
Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/1/rs
2012-08-28 17:29:54,406 DEBUG [RegionServer:1;juno.apache.org,43891,1346174923071-EventThread]
zookeeper.ZKUtil(229): regionserver:43891-0x1396e4723930005 Set watcher on existing znode
/1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,407 INFO  [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
handler.ServerShutdownHandler(175): Splitting logs for juno.apache.org,55977,1346174923023
...
2012-08-28 17:29:54,407 DEBUG [main-EventThread] zookeeper.ZKUtil(229): master:55418-0x1396e4723930003
Set watcher on existing znode /1/rs/juno.apache.org,43891,1346174923071
2012-08-28 17:29:54,410 DEBUG [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.MasterFileSystem(267): Renamed region directory: hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting
2012-08-28 17:29:54,410 INFO  [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(894): dead splitlog worker juno.apache.org,55977,1346174923023
2012-08-28 17:29:54,413 DEBUG [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(246): Scheduling batch of logs to split
2012-08-28 17:29:54,414 INFO  [MASTER_SERVER_OPERATIONS-juno.apache.org,55418,1346174922926-0]
master.SplitLogManager(248): started splitting logs in [hdfs://localhost:59869/user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023-splitting]
...
2012-08-28 17:29:55,000 ERROR [IPC Server handler 7 on 59869] security.UserGroupInformation(1124):
PriviledgedActionException as:jenkins.hfs.0 cause:java.io.FileNotFoundException: Parent directory
doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
2012-08-28 17:29:55,004 FATAL [RegionServer:0;juno.apache.org,55977,1346174923023.logRoller]
regionserver.HRegionServer(1537): ABORTING region server juno.apache.org,55977,1346174923023:
IOE in log roller
java.io.IOException: cannot get log writer
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:715)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:662)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:594)
	at org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: java.io.FileNotFoundException: java.io.FileNotFoundException:
Parent directory doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:106)
	at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:712)
	... 4 more
Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: Parent directory
doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3251)
	at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:713)
	at org.apache.hadoop.hdfs.DistributedFileSystem.createNonRecursive(DistributedFileSystem.java:198)
	at org.apache.hadoop.fs.FileSystem.createNonRecursive(FileSystem.java:601)
	at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:442)
	at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:87)
	... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.FileNotFoundException: Parent directory
doesn't exist: /user/jenkins/hbase/.logs/juno.apache.org,55977,1346174923023
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1167)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1241)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1188)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:628)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

	at org.apache.hadoop.ipc.Client.call(Client.java:1070)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
	at $Proxy8.create(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy8.create(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3248)
	... 13 more
{code}
It is clear that log splitting (splitLog() call on master) raced with log roller (on region
server).
In run() of log roller:
{code}
      } catch (IOException ex) {
{code}
One option is to distinguish FileNotFoundException from other IOE's and exit.
                  
> [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails
> ------------------------------------------------------------------
>
>                 Key: HBASE-6649
>                 URL: https://issues.apache.org/jira/browse/HBASE-6649
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Devaraj Das
>
> Have seen it twice in the recent past: http://bit.ly/MPCykB & http://bit.ly/O79Dq7
.. 
> Looking briefly at the logs hints at a pattern - in both the failed test instances, there
was an RS crash while the test was running.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message