hadoop-common-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining map out compression on intermediate files
Date Tue, 19 May 2009 20:51:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710881#action_12710881 ]

Hadoop QA commented on HADOOP-5539:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12408232/hadoop-5539.patch
  against trunk revision 776352.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/358/console

This message is automatically generated.

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Priority: Blocker
>             Fix For: 0.19.2
>
>         Attachments: 5539.patch, hadoop-5539.patch
>
>
> hadoop-site.xml:
> mapred.compress.map.output = true
> Map output files are compressed, but when the in-memory merger closes on the reduce side, the on-disk merger
> runs to reduce the number of input files to <= io.sort.factor if needed.
> When this happens it outputs files called intermediate.x, and these do not maintain the compression setting.
> The writer (o.a.h.mapred.Merger.class line 432) is passed the codec, but I added some logging and the codec is always null,
> whether map output compression is set to true or false.
> This causes tasks to fail if they cannot hold the uncompressed size of the reduce data they are holding.
> I think this is just an oversight: the codec is not getting set correctly for the on-disk merges.
> {code}
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 3000
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 used codec: null
> {code}
> I added 
> {code}
>           // added by me
>           if (codec != null) {
>             LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
>           } else {
>             LOG.info("intermediate." + passNo + " used codec: Null");
>           }
>           // end added by me
> {code}
> just before the creation of the writer (o.a.h.mapred.Merger.class line 432),
> and it outputs the second log line above.
> I have confirmed this with the logging, and I have looked at the files on the tasktracker's disk. I can read the data in
> the intermediate files, which clearly tells me they are not compressed, but I cannot read the map.out files directly from the map output,
> which tells me compression is working on the map side but not in the on-disk merge that produces the intermediate files.
> I can see no benefit in these files not maintaining the compression setting, and it looks like they were intended to maintain it.
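The mechanism described in the report can be modeled in plain Java. The sketch below is a simplified, hypothetical model, not Hadoop's Merger: the class and method names (`MergeSketch`, `passFactor`, `countIntermediateFiles`, `writeIntermediate`, `readIntermediate`) are illustrative, and java.util.zip GZIP stands in for the configured map-output codec. It assumes a first-pass sizing rule in which the first pass merges just enough segments that every later pass merges exactly io.sort.factor segments. Under that assumption, and if io.sort.factor were 100 (the report does not state its value), the first pass would merge 30 of 3000 segments, which is consistent with the log line in the report. The write path shows the behavior the reporter expected: intermediate files are compressed whenever map output is.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical, simplified model of a reduce-side on-disk merge.
// Not Hadoop code; GZIP stands in for the configured map-output codec.
class MergeSketch {

  // Number of segments merged in a given pass (assumed sizing rule):
  // the first pass is shrunk so every later pass merges exactly `factor`.
  static int passFactor(int factor, int passNo, int numSegments) {
    if (passNo > 1 || numSegments <= factor || factor == 1) {
      return factor;
    }
    int mod = (numSegments - 1) % (factor - 1);
    return mod == 0 ? factor : mod + 1;
  }

  // Counts intermediate files written before the final merge, which
  // reads at most `factor` segments directly instead of writing to disk.
  static int countIntermediateFiles(int factor, int numSegments) {
    if (factor < 2) {
      throw new IllegalArgumentException("factor must be >= 2");
    }
    int files = 0;
    for (int passNo = 1; ; passNo++) {
      int toMerge = passFactor(factor, passNo, numSegments);
      if (numSegments <= toMerge) {
        return files;
      }
      numSegments = numSegments - toMerge + 1; // merged inputs become one file
      files++;
    }
  }

  // Writes one intermediate file, honoring the same compression setting
  // as the map output (the behavior the report expected).
  static byte[] writeIntermediate(String records, boolean compressMapOutput) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (OutputStream out = compressMapOutput ? new GZIPOutputStream(bytes) : bytes) {
      out.write(records.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray(); // GZIP trailer is written on close
  }

  // Reads an intermediate file back with the setting it was written with.
  static String readIntermediate(byte[] data, boolean compressMapOutput) {
    try (InputStream in = compressMapOutput
        ? new GZIPInputStream(new ByteArrayInputStream(data))
        : new ByteArrayInputStream(data)) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(passFactor(100, 1, 3000));          // 30
    System.out.println(countIntermediateFiles(100, 3000)); // 30
    byte[] gz = writeIntermediate("key1\tvalue1\n", true);
    System.out.print(readIntermediate(gz, true));
  }
}
```

In this model, passing `false` to `writeIntermediate` while map output is compressed reproduces what the reporter saw on disk: readable, uncompressed intermediate files sitting next to unreadable, compressed map.out files.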

-- 
This message is automatically generated by JIRA.

