hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files
Date Sat, 23 May 2009 10:35:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712400#action_12712400 ]

Hadoop QA commented on HADOOP-5539:

-1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 777761.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/386/console

This message is automatically generated.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>         Attachments: 5539.patch, hadoop-5539-v1.patch, hadoop-5539.patch
> hadoop-site.xml:
> mapred.compress.map.output = true
> Map output files are compressed, but when the in-memory merger closes
> on the reduce side, the on-disk merger runs to reduce the number of input files to <= io.sort.factor.
> When this happens it writes files called intermediate.x, and these
> do not maintain the compression setting. The writer (o.a.h.mapred.Merger.class line 432)
> is passed the codec, but I added some logging and the codec is always null, whether map output
> compression is set to true or false.
> This causes tasks to fail if they cannot hold the uncompressed size of the data that the
> reduce is holding.
> I think this is just an oversight: the codec is not getting set correctly for the on-disk
> merges.
> {code}
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 3000
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 used codec:
> {code}
> I added 
> {code}
>           // added by me: log which codec the intermediate writer receives
>           if (codec != null) {
>             LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
>           } else {
>             LOG.info("intermediate." + passNo + " used codec: Null");
>           }
>           // end added by me
> {code}
> I added this just before the creation of the writer (o.a.h.mapred.Merger.class line 432),
> and it outputs the second log line above.
> I have confirmed this with the logging and by looking at the files on the tasktracker's disk.
> I can read the data in the intermediate files clearly, which tells me they are not compressed,
> but I cannot read the map.out files that come directly from the map output.
> That tells me compression is working on the map end but not in the on-disk merge that
> produces the intermediate files.
> I can see no benefit in these files not maintaining the compression setting, and it looks
> as though they were intended to maintain it.
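
The size effect described in the report can be illustrated with a minimal, self-contained sketch. It is not Hadoop code: `IntermediateMergeSketch`, `Codec`, and `writeIntermediate` are hypothetical names, and `java.util.zip` gzip stands in for whatever map output codec is configured. The only assumption carried over from the report is that a writer handed a null codec writes plain bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPOutputStream;

public class IntermediateMergeSketch {

    /** Hypothetical stand-in for a compression codec; not Hadoop's CompressionCodec. */
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** A gzip-backed codec built on java.util.zip. */
    static final Codec GZIP = GZIPOutputStream::new;

    /**
     * Writes all segments into one in-memory "intermediate file".
     * A null codec means the bytes are written uncompressed, which is
     * the situation the report describes for the on-disk merge.
     */
    static byte[] writeIntermediate(Iterable<String> segments, Codec codec) throws IOException {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        // Closing the wrapped stream flushes the compressed trailer into `file`.
        try (OutputStream out = (codec != null) ? codec.wrap(file) : file) {
            for (String segment : segments) {
                out.write(segment.getBytes(StandardCharsets.UTF_8));
            }
        }
        return file.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Highly repetitive records, chosen only to make the size gap obvious.
        List<String> segments = Collections.nCopies(1000, "key\tvalue\n");
        byte[] plain = writeIntermediate(segments, null);
        byte[] packed = writeIntermediate(segments, GZIP);
        System.out.println("codec == null: " + plain.length + " bytes");
        System.out.println("codec == gzip: " + packed.length + " bytes");
    }
}
```

With a null codec the output is readable text and as large as the raw records, which matches the observation that the intermediate files could be read directly while the map.out files could not.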

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
