hadoop-common-user mailing list archives

From Jim Twensky <jim.twen...@gmail.com>
Subject Re: getting DiskErrorException during map
Date Wed, 15 Apr 2009 21:37:54 GMT
Alex,

Yes, I bounced the Hadoop daemons after I changed the configuration files.

I also tried setting $HADOOP_CONF_DIR to the directory where my
hadoop-site.xml file resides, but it didn't work.
However, I'm sure that HADOOP_CONF_DIR is not the issue, because other
properties that I changed in hadoop-site.xml seem to be set properly.
Also, here is a section from my hadoop-site.xml file:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/scratch/local/jim/hadoop-${user.name}</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/scratch/local/jim/hadoop-${user.name}/mapred/local</value>
    </property>
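
For reference, I can dump what these resolve to on the client side with
something like the following (just a quick sketch; the class name is mine,
and it assumes the same conf directory is on the classpath that the
daemons use):

    import org.apache.hadoop.mapred.JobConf;

    public class PrintDirs {
        public static void main(String[] args) {
            // JobConf loads hadoop-default.xml and then hadoop-site.xml from
            // the classpath; ${user.name} is expanded when the value is read.
            JobConf conf = new JobConf();
            System.out.println("hadoop.tmp.dir   = " + conf.get("hadoop.tmp.dir"));
            System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        }
    }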

I also created /scratch/local/jim/hadoop-jim/mapred/local on each task
tracker, since I know that directories which do not exist are ignored.
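
To rule out a permission problem on those directories, I can also run the
same kind of check the TaskTracker applies, using the DiskChecker helper
that shows up in the stack trace below (again only a sketch):

    import java.io.File;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.DiskChecker;

    public class CheckLocalDirs {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // mapred.local.dir may hold a comma-separated list of directories
            for (String dir : conf.getStrings("mapred.local.dir")) {
                try {
                    // checkDir creates the directory if needed and verifies
                    // that it is readable and writable
                    DiskChecker.checkDir(new File(dir));
                    System.out.println("ok:  " + dir);
                } catch (DiskChecker.DiskErrorException e) {
                    System.out.println("bad: " + dir + " - " + e.getMessage());
                }
            }
        }
    }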

When I manually ssh to the task trackers, I can see that the directory
/scratch/local/jim/hadoop-jim/dfs is automatically created, so it seems
like hadoop.tmp.dir is set properly. However, Hadoop still creates
/tmp/hadoop-jim/mapred/local and uses that directory for the local storage.

I'm starting to suspect that mapred.local.dir is being overridden to the
default value of /tmp/hadoop-${user.name} somewhere inside the binaries.
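
One more thing I want to verify is whether hadoop-site.xml is actually
visible on the classpath that each daemon gets: Configuration reads
hadoop-default.xml first and then hadoop-site.xml, and the stock default
for hadoop.tmp.dir is /tmp/hadoop-${user.name}, which matches the path
being created. A sketch to test that on a task tracker node (class name is
just for testing):

    import org.apache.hadoop.conf.Configuration;

    public class WhichConf {
        public static void main(String[] args) {
            // null here means hadoop-site.xml is not on the classpath, so
            // every property silently falls back to hadoop-default.xml
            System.out.println("hadoop-site.xml -> " +
                Configuration.class.getClassLoader().getResource("hadoop-site.xml"));
            Configuration conf = new Configuration();
            System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
        }
    }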

-jim

On Tue, Apr 14, 2009 at 4:07 PM, Alex Loddengaard <alex@cloudera.com> wrote:

> First, did you bounce the Hadoop daemons after you changed the
> configuration files? I think you'll have to do this.
>
> Second, I believe 0.19.1 has hadoop-default.xml baked into the jar.  Try
> setting $HADOOP_CONF_DIR to the directory where hadoop-site.xml lives.  For
> whatever reason your hadoop-site.xml (and the hadoop-default.xml you tried
> to change) are probably not being loaded.  $HADOOP_CONF_DIR should fix
> this.
>
> Good luck!
>
> Alex
>
> On Mon, Apr 13, 2009 at 11:25 AM, Jim Twensky <jim.twensky@gmail.com> wrote:
>
> > Thank you Alex, you are right. There are quotas on the systems that I'm
> > working on. However, I tried to change mapred.local.dir as follows:
> >
> > --inside hadoop-site.xml:
> >
> >    <property>
> >        <name>mapred.child.tmp</name>
> >        <value>/scratch/local/jim</value>
> >    </property>
> >    <property>
> >        <name>hadoop.tmp.dir</name>
> >        <value>/scratch/local/jim</value>
> >    </property>
> >    <property>
> >        <name>mapred.local.dir</name>
> >        <value>/scratch/local/jim</value>
> >    </property>
> >
> >  and observed that the intermediate map outputs are still being written
> > under /tmp/hadoop-jim/mapred/local.
> >
> > I'm confused at this point, since I also tried setting these values
> > directly inside hadoop-default.xml and that didn't work either. Is there
> > any other property that I'm supposed to change? I tried searching for
> > "/tmp" in the hadoop-default.xml file but couldn't find anything else.
> >
> > Thanks,
> > Jim
> >
> >
> > On Tue, Apr 7, 2009 at 9:35 PM, Alex Loddengaard <alex@cloudera.com> wrote:
> >
> > > The getLocalPathForWrite function that throws this Exception assumes
> > > that you have space on the disks that mapred.local.dir is configured
> > > on. Can you verify with `df` that those disks have space available?
> > > You might also try moving mapred.local.dir off of /tmp if it's
> > > configured to use /tmp right now; I believe some systems have quotas
> > > on /tmp.
> > >
> > > Hope this helps.
> > >
> > > Alex
> > >
> > > On Tue, Apr 7, 2009 at 7:22 PM, Jim Twensky <jim.twensky@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm using Hadoop 0.19.1 and I have a very small test cluster with 9
> > > > nodes, 8 of them being task trackers. I'm getting the following error
> > > > and my jobs keep failing when map processes start hitting 30%:
> > > >
> > > > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
> > > > valid local directory for
> > > > taskTracker/jobcache/job_200904072051_0001/attempt_200904072051_0001_m_000000_1/output/file.out
> > > >         at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:335)
> > > >         at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
> > > >         at org.apache.hadoop.mapred.MapOutputFile.getOutputFileForWrite(MapOutputFile.java:61)
> > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1209)
> > > >         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:867)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> > > >         at org.apache.hadoop.mapred.Child.main(Child.java:158)
> > > >
> > > >
> > > > I googled many blogs and web pages, but I could neither understand
> > > > why this happens nor find a solution. What does that error message
> > > > mean, and how can I avoid it? Any suggestions?
> > > >
> > > > Thanks in advance,
> > > > -jim
> > > >
> > >
> >
>
