hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Reducer goes past 100% complete?
Date Mon, 09 Mar 2009 19:21:29 GMT
speculative execution.
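
One way to check whether speculative execution is involved is to turn it off for reduce tasks and rerun the job. A minimal sketch, assuming the stock property name from hadoop-default.xml in the 0.19 line, placed in the job configuration or in hadoop-site.xml:

```xml
<!-- Assumption: property name as shipped in hadoop-default.xml for 0.19.x. -->
<!-- Disables speculative (duplicate) attempts for reduce tasks only. -->
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

If the reduce percentage still climbs past 100% with this set to false, speculative attempts are probably not the cause and the progress calculation itself is worth a closer look.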

On Mon, Mar 9, 2009 at 12:19 PM, Nathan Marz <nathan@rapleaf.com> wrote:

> I have the same problem with reducers going past 100% on some jobs. I've
> seen reducers go as high as 120%. Would love to know what the issue is.
>
> On Mar 9, 2009, at 8:45 AM, Doug Cook wrote:
>
>> Hi folks,
>>
>> I've recently upgraded to Hadoop 0.19.1 from a much, much older version of
>> Hadoop.
>>
>> Most things in my application (a highly modified version of Nutch) are
>> working just fine, but one of them is bombing out with odd symptoms. The
>> map works just fine, but then the reduce phase (a) runs extremely slowly
>> and (b) the "percentage complete" reporting for each reduce task doesn't
>> stop at 100%; it just keeps going on past that.
>>
>> I figure I'll start by understanding the percentage-complete reporting
>> issue, since it's pretty concrete and may have some bearing on the
>> performance issue. It seems likely that my application is mis-configuring
>> the job, or otherwise not correctly using the Hadoop API. I don't think
>> I'm doing anything way out of the ordinary; my reducer simply creates an
>> object, wraps it in an ObjectWritable, and calls output.collect(), and I
>> have a local class that implements OutputFormat to take the object and put
>> it in a Lucene index. It does actually create correct output, at least for
>> small indices; on large indices, the performance problems are killing me.
>>
>> I can and will start rummaging around in the Hadoop code to figure out how
>> it calculates percentage complete, and see what I'm not doing correctly,
>> but thought I'd ask here, too, to see if someone has good suggestions off
>> the top of their head.
>>
>> Many thanks-
>> Doug Cook

Alpha Chapters of my book on Hadoop are available
