Date: Thu, 25 Oct 2012 08:04:38 -0700
Subject: Re: division by zero in getLocalPathForWrite()
From: Ted Yu
To: mapreduce-dev@hadoop.apache.org

I will try the 2.0.2-alpha release.

Cheers

On Thu, Oct 25, 2012 at 7:54 AM, Ted Yu wrote:

> Thanks for the quick response, Robert.
> Here is the hadoop version being used:
> 2.0.1-alpha
>
> If there is a newer release, I am willing to try that before filing a JIRA.
>
> On Thu, Oct 25, 2012 at 7:07 AM, Robert Evans wrote:
>
>> It looks like you are running an older version of 2.0, although that
>> does not make much of a difference in this case. The issue shows up
>> when getLocalPathForWrite thinks there is no space to write to on any
>> of the disks it has configured. This could be because you do not have
>> any directories configured. I really don't know for sure exactly what
>> is happening. It might be disk fail in place removing disks for you
>> because of other issues. Either way we should file a JIRA against
>> Hadoop so that we never get the / by zero error and so that the
>> possible causes are handled better.
>>
>> --Bobby Evans
>>
>> On 10/24/12 11:54 PM, "Ted Yu" wrote:
>>
>> >Hi,
>> >HBase has a Jenkins build against hadoop 2.0.
>> >I was checking why TestRowCounter sometimes failed:
>> >
>> >https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/231/testReport/org.apache.hadoop.hbase.mapreduce/TestRowCounter/testRowCounterExclusiveColumn/
>> >
>> >I think the following could be the cause:
>> >
>> >2012-10-22 23:46:32,571 WARN [AsyncDispatcher event handler]
>> >resourcemanager.RMAuditLogger(255): USER=jenkins OPERATION=Application
>> >Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App
>> >failed with state: FAILED PERMISSIONS=Application
>> >application_1350949562159_0002 failed 1 times due to AM Container for
>> >appattempt_1350949562159_0002_000001 exited with exitCode: -1000 due
>> >to: java.lang.ArithmeticException: / by zero
>> >  at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:355)
>> >  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:150)
>> >  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:131)
>> >  at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:115)
>> >  at org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService.getLocalPathForWrite(LocalDirsHandlerService.java:257)
>> >  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:849)
>> >
>> >However, I cannot find where in getLocalPathForWrite() a division by
>> >zero could have arisen.
>> >
>> >Comments and hints are welcome.
>> >
>> >Thanks
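
For readers following the thread, here is a minimal sketch of how a "/ by zero" can arise without any visible "/" in the code. It assumes, and the thread does not confirm this, that the allocator picks a directory by taking a random number modulo the total free space across the configured directories. In Java an integer remainder by zero throws the same ArithmeticException as a division, so an empty or fully exhausted directory list would match Robert's description. The class and method names below are hypothetical and are not taken from the Hadoop source.

```java
import java.util.Random;

// Hypothetical illustration only; this is NOT the Hadoop LocalDirAllocator code.
final class LocalDirPickSketch {

    // Sums the free space reported for each directory (values assumed >= 0).
    static long totalAvailable(long[] availableBytes) {
        long total = 0;
        for (long a : availableBytes) {
            total += a;
        }
        return total;
    }

    // Walks the directories until the chosen position falls inside one of them,
    // so a directory is selected with probability proportional to its free space.
    private static int locate(long[] availableBytes, long position) {
        int dir = 0;
        while (position >= availableBytes[dir]) {
            position -= availableBytes[dir];
            dir++;
        }
        return dir;
    }

    // Unguarded variant: the remainder throws ArithmeticException when the
    // total is 0, i.e. no directories are configured or none has free space.
    static int pickUnguarded(long[] availableBytes, long randomValue) {
        long total = totalAvailable(availableBytes);
        long position = (randomValue & Long.MAX_VALUE) % total;
        return locate(availableBytes, position);
    }

    // Guarded variant: reports the real cause instead of an arithmetic error.
    static int pickGuarded(long[] availableBytes, long randomValue) {
        long total = totalAvailable(availableBytes);
        if (total == 0) {
            throw new IllegalStateException(
                "No space available in any of the local directories");
        }
        long position = (randomValue & Long.MAX_VALUE) % total;
        return locate(availableBytes, position);
    }

    public static void main(String[] args) {
        long r = new Random().nextLong();
        System.out.println("picked index " + pickGuarded(new long[] {10L, 0L, 5L}, r));
        try {
            pickUnguarded(new long[0], r); // no directories configured
        } catch (ArithmeticException e) {
            System.out.println("unguarded: " + e);
        }
        try {
            pickGuarded(new long[0], r);
        } catch (IllegalStateException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

The guarded variant shows the kind of check the proposed JIRA asks for: test for a zero total before the remainder and fail with a message that names the likely cause.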