hadoop-common-user mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: conf.setNumReduceTasks(1) but the code called 3 times
Date Thu, 30 Jul 2009 14:54:04 GMT
A rule of thumb is to not enable speculative execution if the tasks have
side effects that are not cleaned up on task abort.
The tasktracker will clean up the task output directory on task abort.
Writing your zip files into the task output directory lets the
framework remove the zip files created by task attempts that it kills.
FileOutputFormat.getWorkOutputPath will give you the task's work output directory.
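
For illustration, here is a minimal sketch (not the original poster's code; the class name and zip file name are made up) of a reducer using the old mapred API that opens a zip file in the task's work output directory in configure() and closes it in close():

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Illustrative reducer: writes binary payloads into a zip that lives in the
    // per-attempt work output directory, so killed or speculative attempts leave
    // nothing behind and the committed attempt's zip is promoted automatically.
    public class ZipWritingReducer extends MapReduceBase
            implements Reducer<Text, Text, Text, Text> {

        private ZipOutputStream zip;

        public void configure(JobConf conf) {
            try {
                Path workDir = FileOutputFormat.getWorkOutputPath(conf);
                Path zipPath = new Path(workDir, "binaries.zip");  // hypothetical file name
                FileSystem fs = zipPath.getFileSystem(conf);
                zip = new ZipOutputStream(fs.create(zipPath));
            } catch (IOException e) {
                throw new RuntimeException("could not open zip in work output dir", e);
            }
        }

        public void reduce(Text key, Iterator<Text> values,
                           OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            // One zip entry per key; the values become the entry body.
            zip.putNextEntry(new ZipEntry(key.toString()));
            while (values.hasNext()) {
                Text value = values.next();
                zip.write(value.getBytes(), 0, value.getLength());
            }
            zip.closeEntry();
            output.collect(key, new Text("zipped"));
        }

        public void close() throws IOException {
            zip.close();  // runs once per task attempt, not once per job
        }
    }

Because each task attempt gets its own work output directory, the attempt's zip never collides with another attempt's, and only the committed attempt's zip ends up under the job's output path.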

On Wed, Jul 29, 2009 at 10:11 AM, Mark Kerzner <markkerzner@gmail.com> wrote:

> I think that was it, or close: it now goes through my Reducer code only
> twice instead of multiple times. I would like it to run just once, but I
> can perhaps live with that - after all, writing zip files myself, outside
> the Hadoop paradigm, may not be quite standard.
> The second concern is how to control this when executing on Amazon Elastic
> MapReduce - I could not find a way.
>
> Thanks!
>
> Mark
>
> On Wed, Jul 29, 2009 at 9:41 AM, Edward Capriolo <edlinuxguru@gmail.com>
> wrote:
>
> > On Wed, Jul 29, 2009 at 12:58 AM, Mark Kerzner <markkerzner@gmail.com>
> > wrote:
> > > Hi,
> > > I set the number of reducers to 1, and I indeed get only one output
> > > file, /output/part-00000.
> > >
> > > However, in configure() and in close() I do a System.out, and I see
> > > that these are called three times, not once.
> > >
> > > Why does it matter to me? In configure() I open a zip file, into which
> > > I write the binary parts of my maps, and in close() I close it. I would
> > > expect this to be called just once, producing one zip file, but instead
> > > it is called three times (and two when running from the IDE), so it
> > > produces three zip files. I have to play games so that the names of the
> > > zip files don't collide - and I am not sure whether this is stable.
> > >
> > > What am I missing in my understanding?
> > >
> > > Thank you,
> > > Mark
> > >
> >
> > You should take a look at all of the speculative execution properties
> > (mapred.map.tasks.speculative.execution and
> > mapred.reduce.tasks.speculative.execution), for example:
> >
> >   <property>
> >       <name>mapred.reduce.tasks.speculative.execution</name>
> >       <value>false</value>
> >   </property>
> >
> > These cause multiple copies of the same map/reduce task to be executed
> > to deal with slow tasks. In applications like web/ftp fetching or
> > file/database writing you probably want these off.
> >
>
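
For completeness, a minimal sketch of turning those properties off from the job driver itself (old mapred API; the driver class and job name are made up). Because the settings travel in the job's own configuration, the same code should apply whether the job runs on your own cluster or on a hosted service such as Amazon Elastic MapReduce, where editing the site configuration is less convenient:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ZipJobDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ZipJobDriver.class);
            conf.setJobName("zip-writer");             // hypothetical job name

            // Per-job equivalents of the XML properties quoted above:
            conf.setMapSpeculativeExecution(false);    // mapred.map.tasks.speculative.execution
            conf.setReduceSpeculativeExecution(false); // mapred.reduce.tasks.speculative.execution

            conf.setNumReduceTasks(1);                 // single reducer, as in this thread

            // ... set mapper, reducer, input and output paths as usual ...
            JobClient.runJob(conf);
        }
    }

Note that even with speculative execution off, a failed attempt is retried, so configure()/close() can still run more than once for the same task; writing into the work output directory, as described above, covers that case as well.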



-- 
Pro Hadoop, a book to guide you from beginner to Hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
