Date: Mon, 10 May 2010 11:23:32 -0700
From: Oscar Gothberg
To: core-user@hadoop.apache.org
Subject: job executions fail with NotReplicatedYetException

Hi,

I keep having jobs fail at the very end, with both "map" and "reduce" at 100% complete, due to a NotReplicatedYetException on the _temporary subdirectory of the job output directory.

It doesn't happen every time, so it's not trivially reproducible, but it happens often enough (10-20% of runs) to be a real pain. Any ideas? Has anyone seen something similar?

Part of the stack trace:

NotReplicatedYetException: Not replicated yet:/test/out/dayperiod=14731/_temporary/_attempt_201005052338_0194_r_000001_0/part-00001
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1253)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
    at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
    ...

Thanks,
/ Oscar