accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3895) Accumulo init can fail halfway through
Date Tue, 09 Jun 2015 15:36:00 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579095#comment-14579095 ]

Josh Elser commented on ACCUMULO-3895:
--------------------------------------

bq. This looks suspiciously similar to the "out of space" DFS issue that we've seen before.

Well, that is one of the reasons our HDFS operations may fail during init. The issue is how
to handle this in such a way that a subsequent init doesn't conclude that Accumulo was
already initialized when the previous attempt failed partway through.
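
To illustrate the failure mode (a hypothetical sketch, not the actual Initialize code): if the "already initialized" check is based only on whether the data directories exist, a failed attempt that got as far as creating them will make every rerun bail out.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InitCheckSketch {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Data dir taken from the log below; the existence check itself is an assumption.
        Path dataDir = new Path("/apps/accumulo/data");
        if (fs.exists(dataDir)) {
            // A previous attempt may have died after creating the directories but
            // before initializing security; existence alone cannot distinguish a
            // completed init from a half-finished one.
            System.err.println("Location appears to be initialized already; exiting.");
            System.exit(1);
        }
        // ... create directories, write instance metadata, initialize security ...
    }
}
{code}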

> Accumulo init can fail halfway through
> --------------------------------------
>
>                 Key: ACCUMULO-3895
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3895
>             Project: Accumulo
>          Issue Type: Bug
>            Reporter: Billie Rinaldi
>            Priority: Critical
>
> I saw a situation where "accumulo init" exited with error code 255, the HDFS directories
> were successfully created, and security was not initialized. The contents of accumulo-init.out
> were the following. I realize no good can come of running init when there are no DataNodes,
> but it would be nice if init cleaned up after itself when it fails.
> {noformat}
> 2015-06-08 23:19:17,930 [fs.VolumeManagerImpl] WARN : dfs.datanode.synconclose set to false in hdfs-site.xml: data loss is possible on hard system reset or power loss
> 2015-06-08 23:19:17,932 [init.Initialize] INFO : Hadoop Filesystem is hdfs://c6401.ambari.apache.org:8020
> 2015-06-08 23:19:17,933 [init.Initialize] INFO : Accumulo data dirs are [hdfs://c6401.ambari.apache.org:8020/apps/accumulo/data]
> 2015-06-08 23:19:17,933 [init.Initialize] INFO : Zookeeper server is c6401.ambari.apache.org:2181
> 2015-06-08 23:19:17,933 [init.Initialize] INFO : Checking if Zookeeper is available. If this hangs, then you need to make sure zookeeper is running
> Enter initial password for root (this may not be applicable for your security setup): ******
> Confirm initial password for root: ******
> 2015-06-08 23:19:18,661 [Configuration.deprecation] INFO : dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
> 2015-06-08 23:19:18,944 [Configuration.deprecation] INFO : dfs.block.size is deprecated. Instead, use dfs.blocksize
> 2015-06-08 23:19:19,154 [hdfs.DFSClient] INFO : Exception in createBlockOutputStream
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> 	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> 	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
> 	at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1575)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1317)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1270)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2015-06-08 23:19:19,158 [hdfs.DFSClient] INFO : Abandoning BP-431548639-192.168.64.101-1433805216294:blk_1073741834_1010
> 2015-06-08 23:19:19,172 [hdfs.DFSClient] INFO : Excluding datanode DatanodeInfoWithStorage[192.168.64.101:50010,DS-3861ea49-69b6-4bc9-bcba-c81ed9585b51,DISK]
> 2015-06-08 23:19:19,205 [hdfs.DFSClient] WARN : DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/accumulo/data/tables/!0/table_info/0_1.rf could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3104)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3028)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1463)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1259)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2015-06-08 23:19:19,207 [init.Initialize] ERROR: FATAL Failed to initialize filesystem
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/accumulo/data/tables/!0/table_info/0_1.rf could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3104)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3028)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1463)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1259)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> 2015-06-08 23:19:19,231 [hdfs.DFSClient] ERROR: Failed to close inode 16441
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /apps/accumulo/data/tables/!0/table_info/0_1.rf could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1551)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3104)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3028)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2081)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2077)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2075)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1427)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy15.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy16.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1463)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1259)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:464)
> {noformat}
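
One possible shape for the cleanup the description asks for, sketched under the assumption of a single wrapper around the existing init steps (doInit is a placeholder name, not real Accumulo code): if any step throws, recursively delete whatever was created so a rerun starts from a clean slate.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InitCleanupSketch {
    static void initOrCleanUp(FileSystem fs, Path[] dataDirs) throws IOException {
        try {
            doInit(fs, dataDirs); // placeholder for the real init work
        } catch (IOException e) {
            for (Path dir : dataDirs) {
                fs.delete(dir, true); // recursive delete of the partial layout
            }
            throw e; // still fail, but without leaving half-initialized state
        }
    }

    static void doInit(FileSystem fs, Path[] dataDirs) throws IOException {
        // ... mkdir, write instance metadata, initialize security ...
    }
}
{code}

This still loses the work when HDFS misbehaves, but a later "accumulo init" would then see a clean filesystem instead of a directory layout that merely looks initialized.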



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
