hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Hadoop streaming - No room for reduce task error
Date Thu, 11 Jun 2009 00:31:55 GMT
Hey Scott,
It turns out that Alex's answer was mistaken - your error is actually coming
from lack of disk space on the TT that has been assigned the reduce task.
Specifically, there is not enough space in mapred.local.dir. You'll need to
point mapred.local.dir at a partition that has enough space to hold the
reduce input (the shuffled map output).
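
For example, something along these lines in hadoop-site.xml on the
TaskTrackers should do it (the path below is only a placeholder for
whatever large partition you have; restart the TaskTrackers after
changing it):

<property>
  <name>mapred.local.dir</name>
  <!-- example path only: point this at a partition with plenty of room -->
  <value>/data1/mapred/local</value>
</property>

You can also give a comma-separated list of directories here to spread the
intermediate data across several disks.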

As for why this is the case, I hope someone else will pipe up. It seems to me
that reduce output should be able to go directly to the target filesystem
without taking up space in mapred.local.dir.

Thanks
-Todd

On Wed, Jun 10, 2009 at 4:58 PM, Alex Loddengaard <alex@cloudera.com> wrote:

> What is mapred.child.ulimit set to?  This configuration option specifies
> how much memory child processes are allowed to have.  You may want to up
> this limit and see what happens.
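>
> For example, something like this in hadoop-site.xml (the value is in
> kilobytes; the 2 GB figure below is only an illustration):
>
> <property>
>   <name>mapred.child.ulimit</name>
>   <!-- example value only: 2 GB of virtual memory per child process -->
>   <value>2097152</value>
> </property>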
>
> Let me know if that doesn't get you anywhere.
>
> Alex
>
> On Wed, Jun 10, 2009 at 9:40 AM, Scott <skester@weather.com> wrote:
>
> > Complete newbie map/reduce question here.  I am using hadoop streaming as I
> > come from a Perl background, and am trying to prototype/test a process to
> > load/clean-up ad server log lines from multiple input files into one large
> > file on the hdfs that can then be used as the source of a hive db table.
> > I have a perl map script that reads an input line from stdin, does the
> > needed cleanup/manipulation, and writes back to stdout.  I don't really
> > need a reduce step, as I don't care what order the lines are written in, and
> > there is no summary data to produce.  When I run the job with -reducer NONE
> > I get valid output, however I get multiple part-xxxxx files rather than one
> > big file.
> > So I wrote a trivial 'reduce' script that reads from stdin and simply
> > splits the key/value, and writes the value back to stdout.
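> >
> > In skeleton form it is roughly the following (assuming streaming's
> > default tab separator between key and value):
> >
> > #!/usr/bin/perl
> > # Trivial streaming reducer: drop the key, pass the value through.
> > use strict;
> > use warnings;
> >
> > while (my $line = <STDIN>) {
> >     chomp $line;
> >     # Reduce input lines arrive as key<TAB>value.
> >     my (undef, $value) = split /\t/, $line, 2;
> >     print "$value\n" if defined $value;
> > }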
> >
> > I am executing the code as follows:
> >
> > ./hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar -mapper
> > "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" -reducer
> > "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" -input /logs/*.log
> > -output test9
> >
> > The code I have works when given a small set of input files.  However, I
> > get the following error when attempting to run the code on a large set of
> > input files:
> >
> > hadoop-hadoop-jobtracker-testdw0b00.log.2009-06-09:2009-06-09 15:43:00,905
> > WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node
> > tracker_testdw0b00:localhost.localdomain/127.0.0.1:53245 has 2004049920
> > bytes free; but we expect reduce input to take 22138478392
> >
> > I assume this is because all the map output is being buffered in memory
> > prior to running the reduce step?  If so, what can I change to stop the
> > buffering?  I just need the map output to go directly to one large file.
> >
> > Thanks,
> > Scott
> >
> >
>
