hadoop-common-user mailing list archives

From Chen Song <chen.song...@gmail.com>
Subject Re: how to catch exception when data cannot be replication to any datanode
Date Mon, 02 Mar 2015 19:43:46 GMT
Also, the exception could be thrown in BlockManager, but on the DFSClient side it is
just caught and logged as a warning.

The problem here is that the caller has no way to detect this error and
only sees an empty file (0 bytes) after the fact.
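One defensive pattern for the caller is to stat the file after the stream is closed and fail loudly if the length is wrong. The sketch below is a runnable local-filesystem analogue, not Hadoop code: on HDFS the corresponding calls would be FileSystem.create(path) and FileSystem.getFileStatus(path).getLen(), which this sketch stands in for with java.nio.file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Post-write verification: close the stream, then stat the file and
// throw if it is empty or truncated. On HDFS you would replace the
// java.nio calls with FileSystem.create(...) and
// FileSystem.getFileStatus(path).getLen() (an analogous, not identical, API).
public class VerifyAfterClose {
    public static long writeAndVerify(Path path, byte[] data) throws IOException {
        try (OutputStream out = Files.newOutputStream(path)) {
            out.write(data);
        } // close() completes here; any IOException it throws propagates

        long len = Files.size(path);
        if (len != data.length) {
            throw new IOException("wrote " + data.length
                    + " bytes but file has " + len);
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("verify", ".bin");
        long n = writeAndVerify(tmp, "hello".getBytes());
        System.out.println("verified length: " + n);
        Files.deleteIfExists(tmp);
    }
}
```

This does not catch the replication failure itself, but it does turn a silent 0-byte file into an exception the caller can act on (delete and retry, alert, etc.).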

Chen

On Mon, Mar 2, 2015 at 2:41 PM, Chen Song <chen.song.82@gmail.com> wrote:

> I am using CDH 5.1.0, which is based on Hadoop 2.3.0.
>
> On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Which Hadoop release are you using?
>>
>> In branch-2, I see this IOE in BlockManager :
>>
>>     if (targets.length < minReplication) {
>>       throw new IOException("File " + src + " could only be replicated to "
>>           + targets.length + " nodes instead of minReplication (="
>>           + minReplication + ").  There are "
>>
>> Cheers
>>
>> On Mon, Mar 2, 2015 at 8:44 AM, Chen Song <chen.song.82@gmail.com> wrote:
>>
>>> Hey
>>>
>>> I got the following error in the application logs when trying to put a
>>> file to DFS.
>>>
>>> 015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb
could only be replicated to 0 nodes instead of minReplication (=1).  There are 317 datanode(s)
running and no node(s) are excluded in this operation.
>>>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>>>
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>         at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>>>         at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>         at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>>>
>>>
>>> This results in an empty file in HDFS. I did some searching through this
>>> email list and found that this could be caused by a full disk or an
>>> unreachable datanode.
>>>
>>> However, this exception is only logged at WARN level when
>>> FileSystem.close is called, and is never thrown where the client can see it.
>>> My question is: at the client level, how can I catch this exception and handle it?
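One possible approach (an assumption, not verified against CDH 5.1.0 specifically) is to close the output stream explicitly inside a try block, rather than relying on FileSystem#close() cleanup, which is where the warning-only logging happens. The sketch below is runnable and self-contained: FailingStream is a hypothetical stand-in for a DFSOutputStream whose close() surfaces the replication error; whether a real DFSOutputStream propagates this exception on explicit close() depends on the Hadoop version.

```java
import java.io.IOException;
import java.io.OutputStream;

// Pattern: call close() explicitly inside try, so a failure surfaced at
// close time reaches the caller instead of being swallowed by a
// best-effort cleanup path. FailingStream simulates a stream whose
// close() raises the replication error.
public class CatchOnClose {
    static class FailingStream extends OutputStream {
        @Override public void write(int b) { /* pretend to buffer */ }
        @Override public void close() throws IOException {
            // Simulated; a real DFSOutputStream would raise a RemoteException-wrapped IOE
            throw new IOException("could only be replicated to 0 nodes");
        }
    }

    public static boolean writeSafely() {
        OutputStream out = new FailingStream();
        try {
            out.write(42);
            out.close(); // explicit close: the IOException reaches us here
            return true;
        } catch (IOException e) {
            // Caller can now delete the empty file, alert, or retry
            System.err.println("write failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("write succeeded: " + writeSafely());
    }
}
```

If the exception arrives wrapped, org.apache.hadoop.ipc.RemoteException#unwrapRemoteException() can recover the underlying IOException class for inspection.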
>>>
>>> Chen
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>
>
> --
> Chen Song
>
>


-- 
Chen Song
