hadoop-common-user mailing list archives

From Jason Venner <jason.hadoop@gmail.com>
Subject Re: large reducer output with same key
Date Sat, 02 Jan 2010 18:37:07 GMT
I have only seen that type of error when the tasktracker machine is very
heavily loaded and the task does not exit in a timely manner after the
tasktracker terminates it.

Is this error in your task log or in the tasktracker log?

On Fri, Jan 1, 2010 at 3:02 PM, himanshu chandola <
himanshu_coolguy@yahoo.com> wrote:

> Thanks.
>
> This is probably something trivial, but if you have any idea what could
> be causing it, that would be helpful. I changed mapred.local.dir to point
> to drives with bigger capacity. The map jobs now fail with the following
> message:
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find
> taskTracker/jobcache/job_200912311931_0002/attempt_200912311931_0002_m_000027_0/output/file.out.index
> in any of the configured local directories
>
>
> This is weird because the file in question exists on that machine in that
> directory (taskTracker/jobcache....). The permissions are also right, so I
> haven't been able to figure out what the problem could be.
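>
> For what it's worth, here is a minimal sketch of the lookup as I
> understand it (assuming the 0.20-era LocalDirAllocator API; the job and
> attempt IDs below are shortened placeholders). The allocator only
> searches the directories currently listed in mapred.local.dir, so a file
> written under the old setting is reported missing even though it still
> exists on disk:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.LocalDirAllocator;
>   import org.apache.hadoop.fs.Path;
>
>   public class JobCacheLookup {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = new Configuration();
>       // hypothetical value: only these directories get searched
>       conf.set("mapred.local.dir",
>                "/disk1/mapred-local,/disk2/mapred-local");
>       LocalDirAllocator alloc = new LocalDirAllocator("mapred.local.dir");
>       // throws DiskChecker$DiskErrorException ("Could not find ... in
>       // any of the configured local directories") when no configured
>       // directory contains this relative path
>       Path p = alloc.getLocalPathToRead(
>           "taskTracker/jobcache/job_X/attempt_X/output/file.out.index",
>           conf);
>       System.out.println("resolved to " + p);
>     }
>   }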
>
> Do you have any ideas on this?
>
> Thanks
>
>
>  Morpheus: Do you believe in fate, Neo?
> Neo: No.
> Morpheus: Why not?
> Neo: Because I don't like the idea that I'm not in control of my life.
>
>
>
> ----- Original Message ----
> From: Jason Venner <jason.hadoop@gmail.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, December 31, 2009 1:46:47 PM
> Subject: Re: large reducer output with same key
>
> The mapred.local.dir parameter is used by each tasktracker node to
> provide the directory (or directories) that store transitory data for
> the tasks the tasktracker runs.
> This includes the map output, and it can be very large.
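>
> A minimal sketch (hypothetical sizes, and again assuming the 0.20-era
> LocalDirAllocator API) of how a task asks for spill space: the allocator
> skips any configured directory without enough free room, and fails with
> "Could not find any valid local directory" when none qualifies, which is
> why a single small /tmp directory is not enough:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.LocalDirAllocator;
>   import org.apache.hadoop.fs.Path;
>
>   public class SpillSpace {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = new Configuration();
>       LocalDirAllocator alloc = new LocalDirAllocator("mapred.local.dir");
>       long expectedSize = 16L << 30; // e.g. a 16 GB spill
>       // picks a configured directory with at least expectedSize bytes
>       // free, or throws DiskChecker$DiskErrorException if none has room
>       Path spill = alloc.getLocalPathForWrite("output/spill4.out",
>                                               expectedSize, conf);
>       System.out.println("spilling to " + spill);
>     }
>   }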
>
> On Thu, Dec 31, 2009 at 10:03 AM, himanshu chandola <
> himanshu_coolguy@yahoo.com> wrote:
>
> > Hi Todd,
> > Are these directories supposed to be on the namenode or on each of the
> > datanodes? In my case it is set to a directory inside /tmp, but
> > mapred.local.dir was present only on the namenode.
> >
> > Thanks for the help
> >
> > Himanshu
> >
> > Morpheus: Do you believe in fate, Neo?
> > Neo: No.
> > Morpheus: Why not?
> > Neo: Because I don't like the idea that I'm not in control of my life.
> >
> >
> >
> > ----- Original Message ----
> > From: Todd Lipcon <todd@cloudera.com>
> > To: common-user@hadoop.apache.org
> > Sent: Thu, December 31, 2009 10:17:05 AM
> > Subject: Re: large reducer output with same key
> >
> > Hi Himanshu,
> >
> > Sounds like your mapred.local.dir doesn't have enough space. My guess is
> > that you've configured it somewhere inside /tmp/. Instead you should
> > spread it across all of your local physical disks by comma-separating
> > the directories in the configuration. Something like:
> >
> > <property>
> >   <name>mapred.local.dir</name>
> >   <value>/disk1/mapred-local,/disk2/mapred-local,/disk3/mapred-local</value>
> > </property>
> >
> > (and of course make sure those directories exist and are writable by the
> > user that runs your hadoop daemons, often "hadoop")
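> >
> > (One more note, from general practice rather than anything specific to
> > your setup: the tasktrackers read mapred.local.dir when they start, so
> > restart them after changing it.)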
> >
> > Thanks
> > -Todd
> >
> > On Thu, Dec 31, 2009 at 2:10 AM, himanshu chandola <
> > himanshu_coolguy@yahoo.com> wrote:
> >
> > > Hi Everyone,
> > > My reducer output results in most of the data having the same key. The
> > > reducer output is close to 16 GB, and though my cluster has a terabyte
> > > of space in HDFS in total, I get errors like the following:
> > >
> > > > org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)
> > > >         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
> > > >         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> > > > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException:
> > > > Could not find any valid local directory for
> > > > task_200808021906_0002_m_000014_2/spill4.out
> > >
> > > After such failures, hadoop tries to start the same reduce job a
> > > couple of times on other nodes before the job fails. From the
> > > exception, it looks to me like this is probably a disk error (some
> > > machines have less than 16 gigs of free space on hdfs).
> > >
> > > So my question was whether hadoop puts values which share the same
> > > key as a single block on one node? Or could something else be
> > > happening here?
> > >
> > > Thanks
> > >
> > > H
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
