hadoop-common-user mailing list archives

From Ryan Shih <ryan.s...@gmail.com>
Subject Re: Potential race condition (Hadoop 18.3)
Date Mon, 02 Mar 2009 22:02:04 GMT
Koji - That makes a lot of sense. The two tasks are probably stepping over
each other. I'll give it a try and let you know how it goes.
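
As a rough illustration of why that swap should help: if both the original and the speculative attempt write straight into the job output directory, the killed attempt can clobber or truncate the file the successful one produced. Writing into a per-attempt work directory (which is what `getWorkOutputPath` hands back) and promoting only the winner's file avoids that. The sketch below is not Hadoop code; it imitates the idea with plain `java.nio.file`, and the class name, the helper methods, and the `_temporary/<attemptId>` layout are assumptions made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AttemptOutputSketch {

    // Hypothetical helper: a private scratch directory for one task attempt,
    // standing in for the path a work-output lookup would return.
    static Path workDir(Path outputDir, String attemptId) throws IOException {
        return Files.createDirectories(outputDir.resolve("_temporary").resolve(attemptId));
    }

    // Each attempt writes only inside its own scratch directory, so two
    // attempts of the same task never touch the same file.
    static Path writeAttempt(Path outputDir, String attemptId, String fileName, String data)
            throws IOException {
        Path file = workDir(outputDir, attemptId).resolve(fileName);
        Files.write(file, data.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // The attempt that finished successfully gets its file moved into the
    // final output directory.
    static void commit(Path outputDir, Path attemptFile) throws IOException {
        Files.move(attemptFile, outputDir.resolve(attemptFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    // A killed attempt just discards its own scratch file; the committed
    // output is never touched.
    static void abort(Path attemptFile) throws IOException {
        Files.deleteIfExists(attemptFile);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        Path first = writeAttempt(out, "attempt_0", "part-00000", "complete output\n");
        Path speculative = writeAttempt(out, "attempt_1", "part-00000", "partial");
        commit(out, first);   // original attempt finishes first
        abort(speculative);   // speculative attempt is killed afterwards
        System.out.print(new String(
                Files.readAllBytes(out.resolve("part-00000")), StandardCharsets.UTF_8));
    }
}
```

In the real job the equivalent change would be building side-file paths from `FileOutputFormat.getWorkOutputPath(conf)` instead of `getOutputPath(conf)`, leaving the promotion of the winning attempt's files to the framework.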

Malcolm - if you turned off speculative execution and are still getting the
problem, it doesn't sound like the same issue. Do you want to paste your
reduce code so I can see if I can spot anything suspicious?

On Mon, Mar 2, 2009 at 1:15 PM, Malcolm Matalka <mmatalka@millennialmedia.com> wrote:

> I have a situation which may be related.  I am running hadoop 0.18.1.  I
> am on a cluster with 5 machines and testing on very small input of 10
> lines.  Mapper produces either 1 or 0 output per line of input yet
> somehow I get 18 lines of output from the reducer.  For example I have
> one input where the key is:
> fd349fc441ff5e726577aeb94cceb1e4
>
> However, I added a print to the reducer to print keys right before
> calling output.collect and I have 3 instances of this key being printed.
>
> I have turned speculative execution off and still get this.
>
> Does this sound related?  A known bug?  Something I'm missing?  Fixed in
> 19.1?
>
> - Malcolm
>
>
> -----Original Message-----
> From: Koji Noguchi [mailto:knoguchi@yahoo-inc.com]
> Sent: Monday, March 02, 2009 15:59
> To: core-user@hadoop.apache.org
> Subject: RE: Potential race condition (Hadoop 18.3)
>
> Ryan,
>
> If you're using getOutputPath, try replacing it with getWorkOutputPath.
>
> http://hadoop.apache.org/core/docs/r0.18.3/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)
>
> Koji
>
> -----Original Message-----
> From: Ryan Shih [mailto:ryan.shih@gmail.com]
> Sent: Monday, March 02, 2009 11:01 AM
> To: core-user@hadoop.apache.org
> Subject: Potential race condition (Hadoop 18.3)
>
> Hi - I'm not sure yet, but I think I might be hitting a race condition in
> Hadoop 18.3. What seems to happen is that in the reduce phase, some of my
> tasks perform speculative execution but when the initial task completes
> successfully, it sends a kill to the new task started. After all is said and
> done, perhaps one in every five or ten which kill their second task ends up
> with zero or truncated output.  When I code it to turn off speculative
> execution, the problem goes away. Are there known race conditions that I
> should be aware of around this area?
>
> Thanks in advance,
> Ryan
>
