From: Jim Twensky <jim.twensky@gmail.com>
To: core-user@hadoop.apache.org
Date: Wed, 15 Apr 2009 16:37:54 -0500
Subject: Re: getting DiskErrorException during map

Alex,

Yes, I bounced the Hadoop daemons after I changed the configuration files. I also tried setting $HADOOP_CONF_DIR to the directory where my hadoop-site.xml file resides, but it didn't work. However, I'm sure that HADOOP_CONF_DIR is not the issue, because other properties that I changed in hadoop-site.xml seem to be properly set.
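As a sanity check on what the classpath actually resolves, a throwaway driver along these lines should print both the loaded resources and the effective values (the class name is made up, not part of Hadoop; new Configuration() loads hadoop-default.xml and then hadoop-site.xml from the classpath):

    import org.apache.hadoop.conf.Configuration;

    // Throwaway diagnostic (illustrative name, not part of Hadoop): print the
    // resources Configuration loaded and the values it resolved for the two
    // properties in question. Run it with the same conf directory on the
    // classpath that the daemons use.
    public class ConfCheck {
        public static void main(String[] args) {
            // Loads hadoop-default.xml first, then hadoop-site.xml.
            Configuration conf = new Configuration();
            System.out.println(conf); // toString() lists the loaded resources
            System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
            System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        }
    }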
Also, here is a section from my hadoop-site.xml file:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/scratch/local/jim/hadoop-${user.name}</value>
    </property>
    <property>
      <name>mapred.local.dir</name>
      <value>/scratch/local/jim/hadoop-${user.name}/mapred/local</value>
    </property>

I also created /scratch/local/jim/hadoop-jim/mapred/local on each task tracker, since I know directories that do not exist are ignored. When I manually ssh to the task trackers, I can see that /scratch/local/jim/hadoop-jim/dfs is automatically created, so it seems like hadoop.tmp.dir is set properly. However, Hadoop still creates /tmp/hadoop-jim/mapred/local and uses that directory for local storage. I'm starting to suspect that mapred.local.dir is overwritten to a default value of /tmp/hadoop-${user.name} somewhere inside the binaries.

-jim

On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard wrote:

> First, did you bounce the Hadoop daemons after you changed the
> configuration files? I think you'll have to do this.
>
> Second, I believe 0.19.1 has hadoop-default.xml baked into the jar. Try
> setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.
> For whatever reason your hadoop-site.xml (and the hadoop-default.xml you
> tried to change) are probably not being loaded. $HADOOP_CONF_DIR should
> fix this.
>
> Good luck!
>
> Alex
>
> On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky wrote:
>
> > Thank you Alex, you are right. There are quotas on the systems that I'm
> > working on. However, I tried to change mapred.local.dir as follows:
> >
> > --inside hadoop-site.xml:
> >
> >     <property>
> >       <name>mapred.child.tmp</name>
> >       <value>/scratch/local/jim</value>
> >     </property>
> >     <property>
> >       <name>hadoop.tmp.dir</name>
> >       <value>/scratch/local/jim</value>
> >     </property>
> >     <property>
> >       <name>mapred.local.dir</name>
> >       <value>/scratch/local/jim</value>
> >     </property>
> >
> > and observed that the intermediate map outputs are still being written
> > under /tmp/hadoop-jim/mapred/local.
> >
> > I'm confused at this point, since I also tried setting these values
> > directly inside hadoop-default.xml and that didn't work either. Is
> > there any other property that I'm supposed to change? I tried searching
> > for "/tmp" in the hadoop-default.xml file but couldn't find anything
> > else.
> >
> > Thanks,
> > Jim
> >
> > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard wrote:
> >
> > > The getLocalPathForWrite function that throws this Exception assumes
> > > that you have space on the disks that mapred.local.dir is configured
> > > on. Can you verify with `df` that those disks have space available?
> > > You might also try moving mapred.local.dir off of /tmp if it's
> > > configured to use /tmp right now; I believe some systems have quotas
> > > on /tmp.
> > >
> > > Hope this helps.
> > >
> > > Alex
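To run that df-style check across every directory listed in mapred.local.dir at once, a small sketch like the following can help (the class name is made up; it assumes Java 6 for File.getUsableSpace and the daemon's conf directory on the classpath):

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;

    // Diagnostic sketch (illustrative, not part of Hadoop): report existence
    // and free space for each entry of mapred.local.dir, the directories
    // that getLocalPathForWrite searches. Run it on a task tracker node.
    public class LocalDirSpace {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            String[] dirs = conf.getStrings("mapred.local.dir");
            if (dirs == null) {
                System.out.println("mapred.local.dir is not set");
                return;
            }
            for (String dir : dirs) {
                File d = new File(dir);
                System.out.println(dir + ": exists=" + d.isDirectory()
                    + ", freeMB=" + (d.getUsableSpace() >> 20));
            }
        }
    }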
> > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> > > > nodes, 8 of them being task trackers. I'm getting the following
> > > > error, and my jobs keep failing when map processes start hitting
> > > > 30%:
> > > >
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> > > >     at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > > >     at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > > >     at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > > >     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > > >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > > >     at org.apache.hadoop.mapred.Child.main(Child.java:158)
> > > >
> > > > I googled many blogs and web pages, but I could neither understand
> > > > why this happens nor find a solution to it. What does that error
> > > > message mean, and how can I avoid it? Any suggestions?
> > > >
> > > > Thanks in advance,
> > > > -jim
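In essence, getLocalPathForWrite walks the configured mapred.local.dir entries looking for one with enough usable space, and the exception above means none qualified. A simplified sketch of that behavior (not the actual Hadoop source, just the gist):

    import java.io.File;
    import java.io.IOException;

    // Simplified sketch of what getLocalPathForWrite does (not the real
    // Hadoop source): return a path under the first configured local
    // directory with enough usable space, creating it if needed; otherwise
    // fail the way the stack trace above shows.
    public class AllocatorSketch {
        static File getLocalPathForWrite(String[] localDirs, String relPath,
                                         long bytesNeeded) throws IOException {
            for (String dir : localDirs) {
                File d = new File(dir);
                if ((d.isDirectory() || d.mkdirs())
                        && d.getUsableSpace() >= bytesNeeded) {
                    return new File(d, relPath);
                }
            }
            throw new IOException(
                "Could not find any valid local directory for " + relPath);
        }
    }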