From: Robert Molina <rmolina@hortonworks.com>
To: user@hadoop.apache.org
Date: Thu, 10 Jan 2013 09:17:30 -0800
Subject: Re: could only be replicated to 0 nodes instead of minReplication

Hi Ivan,

Here are a couple more suggestions from the wiki:
http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo

Regards,
Robert
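For readers finding this thread later: the wiki page's suggestions boil down to confirming that live datanodes exist, are reachable from the client, and have usable space. A minimal sketch of those checks (CLI names as in Hadoop 2.x; adjust for your distribution):

    # Live datanode count, plus configured/remaining space per node
    hdfs dfsadmin -report

    # A namenode in safe mode refuses writes as well
    hdfs dfsadmin -safemode get

    # Block-level health: missing, corrupt, or under-replicated blocks
    hdfs fsck / -blocks -locations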
On Thu, Jan 10, 2013 at 5:33 AM, Ivan Tretyakov <itretyakov@griddynamics.com> wrote:

> I also found the following exception in a datanode log; I suppose it
> might give some clue:
>
> 2013-01-10 11:37:55,397 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode:
> node02.303net.pvt:50010:DataXceiver error processing READ_BLOCK operation
> src: /192.168.1.112:35991 dest: /192.168.1.112:50010
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for
> channel to be ready for write. ch :
> java.nio.channels.SocketChannel[connected local=/192.168.1.112:50010
> remote=/192.168.1.112:35991]
>         at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:247)
>         at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:166)
>         at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:214)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:492)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:655)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:280)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:88)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:63)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
>         at java.lang.Thread.run(Thread.java:662)
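A side note on that timeout: 480000 ms is the default value of dfs.datanode.socket.write.timeout, the time a datanode waits for a slow reader before aborting a READ_BLOCK transfer, so this points at slow or stalled clients rather than a misconfigured datanode. If slow readers are expected, the timeout can be raised in hdfs-site.xml; a sketch, with the doubled value purely illustrative:

    <!-- hdfs-site.xml on the datanodes: how long to wait for the remote
         side to become writable before aborting a transfer. The default
         is 480000 ms (8 minutes); 960000 below is illustrative only. -->
    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>960000</value>
    </property>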
> On Thu, Jan 10, 2013 at 4:04 PM, Ivan Tretyakov
> <itretyakov@griddynamics.com> wrote:
>
>> Hello!
>>
>> On our cluster, jobs fail with the following exception:
>>
>> 2013-01-10 10:34:05,648 WARN org.apache.hadoop.hdfs.DFSClient:
>> DataStreamer Exception
>> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> /user/persona/usersAggregate_20130110_15/_temporary/_attempt_201212271414_0458_m_000001_1/s/375ee510bbf44815b151df556e06b5ca
>> could only be replicated to 0 nodes instead of minReplication (=1). There
>> are 6 datanode(s) running and no node(s) are excluded in this operation.
>>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1322)
>>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2170)
>>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:471)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:898)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1693)
>>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1689)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1687)
>>
>>         at org.apache.hadoop.ipc.Client.call(Client.java:1160)
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>>         at $Proxy10.addBlock(Unknown Source)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>>         at $Proxy10.addBlock(Unknown Source)
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1150)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1003)
>>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
>>
>> I've found that this can be caused by a lack of free disk space, but as
>> far as I can see everything looks fine there (see the attached dfs report
>> output). I can also see the following exception in the TaskTracker log,
>> https://issues.apache.org/jira/browse/MAPREDUCE-5, but I'm not sure
>> whether it is related.
>>
>> Could it be related to another issue on our cluster?
>> http://mail-archives.apache.org/mod_mbox/hadoop-user/201301.mbox/%3CCAEAKFL90ReOWEvY_vuSMqU2GwMOAh0fndA9b-uodXZ6BYvz2Kg%40mail.gmail.com%3E
>>
>> Thanks in advance!
>>
>> --
>> Best Regards
>> Ivan Tretyakov
>
> --
> Best Regards
> Ivan Tretyakov
>
> Deployment Engineer
> Grid Dynamics
> +7 812 640 38 76
> Skype: ivan.tretyakov
> www.griddynamics.com
> itretyakov@griddynamics.com
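One detail worth checking when the dfs report looks healthy but the namenode still cannot place a block: the namenode only picks datanodes that advertise enough remaining space for the block, and each datanode subtracts dfs.datanode.du.reserved (plus any non-DFS usage on the volume) from what it advertises. A sketch of the relevant hdfs-site.xml entry; the 10 GB value is illustrative, not a recommendation:

    <!-- hdfs-site.xml on the datanodes: space per volume held back for
         non-HDFS use. Remaining capacity reported to the namenode excludes
         this reserve, so a large value can make an apparently roomy disk
         unusable for new blocks. 10737418240 bytes = 10 GB, illustrative. -->
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>10737418240</value>
    </property>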