hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10504) DFSClient filesBeingWritten memory leak when client gets RemoteException - could only be replicated to 0 nodes instead of minReplication (=1)
Date Thu, 09 Jun 2016 16:24:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322797#comment-15322797 ]

Arpit Agarwal commented on HDFS-10504:
--------------------------------------

Hi [~sebyonthenet], I haven't had a chance to look into what you described, but yes, please
create another Jira. Please also mention your Apache Hadoop version and include the output of
the 'hadoop version' command for completeness.

> DFSClient filesBeingWritten memory leak when client gets RemoteException - could only be replicated to 0 nodes instead of minReplication (=1)
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10504
>                 URL: https://issues.apache.org/jira/browse/HDFS-10504
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.7.2
>         Environment: linux
>            Reporter: Seb Mo
>
> I'm trying to migrate data from NFS to HDFS. I have about 2 million small files. The migration takes about 4 hours in my environment, but I randomly get an exception during it; I got 12 of them during the test (stack below).
> When I get the exception, I sleep for one second and then check whether the file is there (the API says yes, but its reported size is zero bytes). I then remove the file and start writing it again, and at that point it succeeds, as sketched just below.
> Here is the stack:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File xxx/xxx/xxx could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1592)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3158)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3082)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:822)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> 	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:497)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1459)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1255)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
> When I write, I use try-with-resources, which should call the close method on the FSDataOutputStream. That triggers dfsClient.endFileLease(fileId), which should remove the reference from:
> DFSClient:
> synchronized(filesBeingWritten) {
>       filesBeingWritten.remove(inodeId);
>       if (filesBeingWritten.isEmpty()) {
>         lastLeaseRenewal = 0;
>       }
>     }
> But when the process finishes, I get:
> 2016-06-07 22:26:54,734 - ERROR [Thread-3] (DFSClient.closeAllFilesBeingWritten:940) - Failed to close inode 1675022
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /xxx/xxx/xxx could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1592)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3158)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3082)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:822)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
> 	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
> When there is no space left on the datanode, I get this error a lot, which causes my migration Java client to die with an OutOfMemoryError. The cause is DFSClient.filesBeingWritten taking up almost 1 GB, as sketched below.
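> A simplified, hypothetical model of the leak being described (not the real DFSClient code): if close() throws before the remove() shown earlier runs, the entry, and whatever it references, is retained for every failed file, so the map only grows:
> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Map;
>
> class FilesBeingWrittenLeakModel {
>   // Stand-in for DFSClient.filesBeingWritten; the byte[] stands in for the
>   // per-file output-stream state that ends up retained on the heap.
>   private final Map<Long, byte[]> filesBeingWritten = new HashMap<>();
>
>   void open(long inodeId) {
>     filesBeingWritten.put(inodeId, new byte[64 * 1024]);
>   }
>
>   void close(long inodeId, boolean replicationFails) throws IOException {
>     if (replicationFails) {
>       // The exception propagates before the remove below is reached,
>       // so the entry stays in the map.
>       throw new IOException("could only be replicated to 0 nodes");
>     }
>     synchronized (filesBeingWritten) {
>       filesBeingWritten.remove(inodeId);
>     }
>   }
> }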





