hadoop-common-dev mailing list archives

From "Nt Never" <ntne...@gmail.com>
Subject Re: [jira] Updated: (HADOOP-1014) map/reduce is corrupting data between map and reduce
Date Thu, 15 Feb 2007 16:21:05 GMT
You are totally right, my bad. Your patched version passes all the JUnit
tests now. I will now test it on my largest jobs and compare with 0.9.2.
Should take about 7-8 hours. Thanks.

On 2/15/07, Devaraj Das (JIRA) <jira@apache.org> wrote:
>
>
>      [ https://issues.apache.org/jira/browse/HADOOP-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Devaraj Das updated HADOOP-1014:
> --------------------------------
>
>     Attachment: zero-size-inmem-fs.patch
>                 TestMapRed.java
>
> Riccardo, the problem with your testcase was that the "readFields" method
> of the WritableWrapper class was not setting the "type" field, so "write"
> would always write the value '0' for 'type' in the map output file. Hence
> the reducers would never get the intended map outputs. I have attached the
> fixed TestMapRed.java.
> In your map method you instantiate a WritableWrapper object with the type
> field correctly set, and then call output.collect, which behaves differently
> in 0.9 and in 0.10+. In 0.9, output.collect writes the data directly to the
> final map output file. In 0.10+, the output is buffered and written later,
> and the record is deserialized and re-serialized when it is finally written
> to disk from the buffer. Since your deserialization code (readFields) was
> not setting the type field, the re-serialization (write) would always write
> 0 (type = INT_WRITABLE) for the type field, and the reducer would therefore
> never see UTF8.
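>
> To make this concrete, here is a minimal sketch of the pattern the testcase
> needs (this is not the attached TestMapRed.java; the class and field names
> below are made up for illustration). The key point is that readFields() must
> restore every field that write() emits, including the type tag:
>
>     import java.io.DataInput;
>     import java.io.DataOutput;
>     import java.io.IOException;
>     import org.apache.hadoop.io.IntWritable;
>     import org.apache.hadoop.io.UTF8;
>     import org.apache.hadoop.io.Writable;
>
>     public class TypedWrapper implements Writable {
>       public static final int INT_WRITABLE = 0;
>       public static final int UTF8_TYPE = 1;
>
>       private int type;        // which payload this wrapper carries
>       private Writable value;
>
>       public TypedWrapper() {} // used when the framework deserializes
>
>       public TypedWrapper(int type, Writable value) { // used in map()
>         this.type = type;
>         this.value = value;
>       }
>
>       public void write(DataOutput out) throws IOException {
>         out.writeInt(type);    // writes whatever 'type' currently holds
>         value.write(out);
>       }
>
>       public void readFields(DataInput in) throws IOException {
>         type = in.readInt();   // the step the original readFields skipped;
>                                // without it 'type' stays 0 (INT_WRITABLE)
>                                // and a later write() emits the wrong tag
>         if (type == UTF8_TYPE) {
>           value = new UTF8();
>         } else {
>           value = new IntWritable();
>         }
>         value.readFields(in);
>       }
>     }
>
> With the buffered collect in 0.10+, the framework may call readFields() and
> then write() on your object again before the data reaches the reducer, so
> any field lost in that round trip is silently corrupted.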
>
> I have also attached a patch that disables the in-memory merge (it basically
> sets the buffer size for the ramfs to 0 and adds some checks for that). This
> should remove the blocker.
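>
> If applying the patch is not convenient, the following sketch of a job-level
> workaround might behave the same way. It assumes the in-memory filesystem
> still sizes itself from the "fs.inmemory.size.mb" property in your build, so
> please check hadoop-default.xml before relying on it:
>
>     import java.io.IOException;
>     import org.apache.hadoop.mapred.JobClient;
>     import org.apache.hadoop.mapred.JobConf;
>
>     public class NoInMemMergeJob {
>       public static void main(String[] args) throws IOException {
>         // Sketch only: shrink the ramfs so nothing fits in memory and the
>         // merge falls back to disk. The mapper, reducer, input and output
>         // setup for your actual job still has to be filled in here.
>         JobConf conf = new JobConf(NoInMemMergeJob.class);
>         conf.setInt("fs.inmemory.size.mb", 0);
>         JobClient.runJob(conf);
>       }
>     }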
>
> Mike, Albert and Riccardo - please comment on whether this solves the issues
> you reported for the time being. I will continue to debug the in-memory
> merge; it must be some race condition somewhere, since the failures are not
> consistent. Thanks.
>
> > map/reduce is corrupting data between map and reduce
> > ----------------------------------------------------
> >
> >                 Key: HADOOP-1014
> >                 URL: https://issues.apache.org/jira/browse/HADOOP-1014
> >             Project: Hadoop
> >          Issue Type: Bug
> >          Components: mapred
> >    Affects Versions: 0.11.1
> >            Reporter: Owen O'Malley
> >         Assigned To: Devaraj Das
> >            Priority: Blocker
> >             Fix For: 0.11.2
> >
> >         Attachments: TestMapRed.java, TestMapRed.patch, TestMapRed2.patch,
> >                      zero-size-inmem-fs.patch
> >
> >
> > It appears that random data corruption is happening between the map and
> > the reduce. This looks to be a blocker until it is resolved. There were
> > two relevant messages on hadoop-dev:
> > from Mike Smith:
> > The map/reduce jobs are not consistent in both the hadoop 0.11 release and
> > trunk when you rerun the same job. I have observed this inconsistency of
> > the map output in different jobs. A simple test to double-check is to use
> > hadoop 0.11 with nutch trunk.
> > from Albert Chern:
> > I am having the same problem with my own map reduce jobs. I have a job
> > which requires two pieces of data per key, and just as a sanity check I
> > make sure that it gets both in the reducer, but sometimes it doesn't.
> > What's even stranger is, the same tasks that complain about missing
> > key/value pairs will maybe fail two or three times, but then succeed on a
> > subsequent try, which leads me to believe that the bug has to do with
> > randomization (I'm not sure, but I think the map outputs are shuffled?).
> > All of my code works perfectly with 0.9, so I went back and just compared
> > the sizes of the outputs. For some jobs, the outputs from 0.11 were
> > consistently 4 bytes larger, probably due to changes in SequenceFile. But
> > for others, the output sizes were all over the place. Some partitions
> > were empty, some were correct, and some were missing data. There seems to
> > be something seriously wrong with 0.11, so I suggest you use 0.9. I've
> > been trying to pinpoint the bug but its random nature is really annoying.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
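
In case it helps anyone reproduce this, below is a rough sketch of the kind of
per-key sanity check Albert describes, written against the old mapred API.
The class name is made up, and the "exactly two values per key" expectation is
specific to the shape of his job:

    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Fails the task whenever a key does not arrive with exactly two values,
    // which is what reportedly happens intermittently on 0.11 but not on 0.9.
    public class PairCheckReducer extends MapReduceBase implements Reducer {
      public void reduce(WritableComparable key, Iterator values,
                         OutputCollector output, Reporter reporter)
          throws IOException {
        int count = 0;
        Writable last = null;
        while (values.hasNext()) {
          last = (Writable) values.next();
          count++;
        }
        if (count != 2) {
          throw new IOException("key " + key + " arrived with " + count
              + " values instead of 2");
        }
        output.collect(key, last);
      }
    }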
