accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3895) Accumulo init can fail halfway through
Date Tue, 16 Jun 2015 15:17:06 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588189#comment-14588189
] 

Christopher Tubbs commented on ACCUMULO-3895:
---------------------------------------------

Doesn't really leave the Accumulo cluster in an unrecoverable state... since that doesn't
exist yet (indeed, that's what the user is trying to accomplish when this is encountered).
I think it's a major bug, but leaving it critical reduces the utility of that priority when
used in JIRA searches for identifying serious (non-blocker) problems which might involve data
loss or manual recovery on a running system.

I come across this JIRA frequently when searching for important issues, and my reaction is
always the same "oh! this is critical! <read issue> oh, that's not that important. next..."
I don't really care whether it's marked Major or Critical, but I certainly haven't been treating
it as Critical when I do JIRA triage. My suspicion is that others haven't been either.

> Accumulo init can fail halfway through
> --------------------------------------
>
>                 Key: ACCUMULO-3895
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3895
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Billie Rinaldi
>            Priority: Critical
>             Fix For: 1.8.0
>
>
> I saw a situation where "accumulo init" exited with error code 255, the HDFS directories
were successfully created, and security was not initialized.  The contents of accumulo-init.out
were the following.  I realize no good can come of running init when there are no DataNodes,
but it would be nice if init cleaned up after itself when it fails.
> {noformat}
> 2015-06-08 23:19:17,930 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to
false in hdfs-site.xml: data loss is possible on hard system reset or power loss
> 2015-06-08 23:19:17,932 [init.Initialize] INFO : Hadoop Filesystem is hdfs://c6401.ambari.apache.org:8020
> 2015-06-08 23:19:17,933 [init.Initialize] INFO : Accumulo data dirs are [hdfs://c6401.ambari.apache.org:8020/apps/accumulo/data]
> 2015-06-08 23:19:17,933 [init.Initialize] INFO : Zookeeper server is c6401.ambari.apache.org:2181
> 2015-06-08 23:19:17,933 [init.Initialize] INFO : Checking if Zookeeper is available.
If this hangs, then you need to make sure zookeeper is running
> Enter initial password for root (this may not be applicable for your security setup):
******
> Confirm initial password for root: ******
> 2015-06-08 23:19:18,661 [Configuration.deprecation] INFO : dfs.replication.min is deprecated.
Instead, use dfs.namenode.replication.min
> 2015-06-08 23:19:18,944 [Configuration.deprecation] INFO : dfs.block.size is deprecated.
Instead, use dfs.blocksize
> 2015-06-08 23:19:19,154 [hdfs.DFSClient] INFO : Exception in createBlockOutputStream
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1575)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1317)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2015-06-08 23:19:19,158 [hdfs.DFSClient] INFO : Abandoning BP-431548639-192.168.64.101-1433805216294:blk_1073741834_1010
> 2015-06-08 23:19:19,172 [hdfs.DFSClient] INFO : Excluding datanode DatanodeInfoWithStorage[192.168.64.101:50010,DS-3861ea49-69b6-4bc9-bcba-c81ed9585b51,DISK]
> 2015-06-08 23:19:19,205 [hdfs.DFSClient] WARN : DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/accumulo/data/tables/!0/table_info/0_1.rf
could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s)
running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3104)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3028)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1463)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1259)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2015-06-08 23:19:19,207 [init.Initialize] ERROR: FATAL Failed to initialize filesystem
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/accumulo/data/tables/!0/table_info/0_1.rf
could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s)
running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3104)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3028)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1463)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1259)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2015-06-08 23:19:19,231 [hdfs.DFSClient] ERROR: Failed to close inode 16441
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/accumulo/data/tables/!0/table_info/0_1.rf
could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s)
running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3104)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3028)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1463)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1259)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message