hadoop-common-dev mailing list archives

From "Riccardo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1014) map/reduce is corrupting data between map and reduce
Date Wed, 14 Feb 2007 04:00:10 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472972 ]

Riccardo commented on HADOOP-1014:
----------------------------------

Same problem here. I have a bunch of M/R jobs that process some 100 GB of text data and collect
different types of stats. Since Hadoop 0.10.0 I started noticing that some of the reduce outputs
were missing data (some of them were empty). I modified the TestMapRed() JUnit test to run the
following check: I create two separate sequence files by generating N random integers and storing
them in one file as IntWritable() values and in the other as Text() writables, with matching
integers sharing the same key. Then I run the M/R job with an identity mapper (I use GenericWritable()
to wrap the two value types), and in the reducer I check that both the integer and the string
representation are received for a given key. The job runs just fine up to about 50M keys, then it
starts failing at 100M keys or more. The assertion exception is caused by the reducer receiving
only one of the two values. One problem is that the job does not fail consistently; it throws the
assertion exception for different keys and within different reduce tasks each time. Our M/R cluster
is composed of 10 high-end dual-core Linux boxes with about 8 TB of aggregate HD capacity. I cannot
get this JUnit test (or any of my jobs) to fail with Hadoop 0.9.2; they run on it like a breeze.
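
For reference, here is a minimal sketch of the kind of check described above. It is written against
the current Hadoop mapreduce API rather than the 0.11-era mapred API, and the class names
(IntOrTextWritable, BothValuesReducer) are hypothetical, not taken from the actual TestMapRed
modification:

import java.io.IOException;

import org.apache.hadoop.io.GenericWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical GenericWritable subclass wrapping the two value types,
// mirroring the "GenericWritable() to wrap the different value types" step.
class IntOrTextWritable extends GenericWritable {
    @SuppressWarnings("unchecked")
    private static final Class<? extends Writable>[] TYPES =
            new Class[] { IntWritable.class, Text.class };

    @Override
    protected Class<? extends Writable>[] getTypes() {
        return TYPES;
    }
}

// Reducer-side assertion: every key must arrive with both its IntWritable
// and its Text representation; a missing one indicates lost shuffle data.
class BothValuesReducer
        extends Reducer<IntWritable, IntOrTextWritable, IntWritable, Text> {
    @Override
    protected void reduce(IntWritable key, Iterable<IntOrTextWritable> values,
                          Context context) throws IOException, InterruptedException {
        boolean sawInt = false;
        boolean sawText = false;
        for (IntOrTextWritable wrapper : values) {
            Writable v = wrapper.get();
            if (v instanceof IntWritable) {
                sawInt = true;
            } else if (v instanceof Text) {
                sawText = true;
            }
        }
        if (!(sawInt && sawText)) {
            // In the JUnit version this would be an assertion failure.
            throw new IOException("Key " + key + " is missing one of its two values");
        }
        context.write(key, new Text("ok"));
    }
}

With a healthy shuffle every key reaches the reducer with both value types, so any exception thrown
here corresponds to the assertion failures described above.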

> map/reduce is corrupting data between map and reduce
> ----------------------------------------------------
>
>                 Key: HADOOP-1014
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1014
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.11.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>            Priority: Blocker
>             Fix For: 0.11.2
>
>
> It appears that a random data corruption is happening between the map and the reduce.
> This looks to be a blocker until it is resolved. There were two relevant messages on hadoop-dev:
> from Mike Smith:
> The map/reduce jobs are not consistent in both the Hadoop 0.11 release and trunk
> when you rerun the same job. I have observed this inconsistency of the map
> output across different jobs. A simple test to double-check is to use Hadoop
> 0.11 with Nutch trunk.
> from Albert Chern:
> I am having the same problem with my own map reduce jobs.  I have a job
> which requires two pieces of data per key, and just as a sanity check I make
> sure that it gets both in the reducer, but sometimes it doesn't.  What's
> even stranger is, the same tasks that complain about missing key/value pairs
> will maybe fail two or three times, but then succeed on a subsequent try,
> which leads me to believe that the bug has to do with randomization (I'm not
> sure, but I think the map outputs are shuffled?).
> All of my code works perfectly with 0.9, so I went back and just compared
> the sizes of the outputs.  For some jobs, the outputs from 0.11 were
> consistently 4 bytes larger, probably due to changes in SequenceFile.  But
> for others, the output sizes were all over the place.  Some partitions were
> empty, some were correct, and some were missing data.  There seems to be
> something seriously wrong with 0.11, so I suggest you use 0.9.  I've been
> trying to pinpoint the bug but its random nature is really annoying.
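
As a rough illustration of the output-size comparison mentioned above, the sketch below lists the
byte size of every file in one or more job output directories so that two runs of the same job can
be compared side by side. It uses the present-day FileSystem API (the 0.11-era API differed), and
the directory arguments are placeholders, not paths from the report:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Prints "<file name> <length in bytes>" for every file in each output
// directory given on the command line, so partition sizes from two runs
// of the same job can be compared.
public class OutputSizeDump {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        for (String dir : args) {   // e.g. two output directories from repeated runs
            System.out.println("== " + dir);
            for (FileStatus status : fs.listStatus(new Path(dir))) {
                System.out.println(status.getPath().getName() + "\t" + status.getLen());
            }
        }
    }
}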

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

