Subject: Re: how to catch exception when data cannot be replicated to any datanode
From: Chen Song <chen.song.82@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 2 Mar 2015 14:43:46 -0500

Also, the exception is thrown in BlockManager, but on the DFSClient side it is just caught and logged as a warning. The problem is that the caller has no way to detect this error and only sees an empty (0-byte) file after the fact.
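For now I am thinking of guarding the put on our side. A rough sketch of what I mean (GuardedPut/writeOrFail are my own names and the length check is my own workaround, not anything Hadoop provides; hflush() and getFileStatus() are the standard FileSystem calls):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GuardedPut {

    // Write payload to dst and fail loudly if the data did not stick.
    public static void writeOrFail(Configuration conf, Path dst, byte[] payload)
            throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(dst);
        try {
            out.write(payload);
            // hflush() pushes the data into the datanode pipeline, so a
            // pipeline/replication failure tends to surface here or in
            // close() instead of being silently deferred.
            out.hflush();
        } finally {
            out.close(); // close() itself can throw the replication IOException
        }

        // Belt and braces: even if the exception is swallowed, the
        // 0-byte-file symptom is detectable by comparing lengths.
        long actual = fs.getFileStatus(dst).getLen();
        if (actual != payload.length) {
            throw new IOException("Write to " + dst + " lost data: wrote "
                    + payload.length + " bytes, HDFS reports " + actual);
        }
    }
}

Crude, but it would at least turn the silent 0-byte file into a hard failure we can retry on.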
Chen

On Mon, Mar 2, 2015 at 2:41 PM, Chen Song <chen.song.82@gmail.com> wrote:

> I am using CDH 5.1.0, which is Hadoop 2.3.0.
>
> On Mon, Mar 2, 2015 at 12:23 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Which Hadoop release are you using?
>>
>> In branch-2, I see this IOE in BlockManager:
>>
>>     if (targets.length < minReplication) {
>>       throw new IOException("File " + src + " could only be replicated to "
>>           + targets.length + " nodes instead of minReplication (="
>>           + minReplication + ").  There are "
>>
>> Cheers
>>
>> On Mon, Mar 2, 2015 at 8:44 AM, Chen Song <chen.song.82@gmail.com> wrote:
>>
>>> Hey,
>>>
>>> I got the following error in the application logs when trying to put a
>>> file to DFS.
>>>
>>> 2015-02-27 19:42:01 DFSClient [ERROR] Failed to close inode 559475968
>>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/impbus.log_impbus_view.v001.2015022719.T07-431672015022719385410197.pb.pb could only be replicated to 0 nodes instead of minReplication (=1). There are 317 datanode(s) running and no node(s) are excluded in this operation.
>>>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1447)
>>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2703)
>>>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:569)
>>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:440)
>>>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1986)
>>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1982)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:415)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1980)
>>>
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>>>         at org.apache.hadoop.ipc.Client.call(Client.java:1362)
>>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>>>         at com.sun.proxy.$Proxy23.addBlock(Unknown Source)
>>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:362)
>>>         at sun.reflect.GeneratedMethodAccessor361.invoke(Unknown Source)
>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>>         at com.sun.proxy.$Proxy24.addBlock(Unknown Source)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1438)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
>>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
>>>
>>> This results in an empty file in HDFS. I searched this mailing list and
>>> found that this can be caused by a full disk or an unreachable datanode.
>>>
>>> However, the exception is only logged at WARN level when
>>> FileSystem.close is called, and is never thrown where the client can
>>> see it. My question is: at the client level, how can I catch this
>>> exception and handle it?
>>>
>>> Chen
>>>
>>> --
>>> Chen Song
>
> --
> Chen Song

--
Chen Song