Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <206810738.1244750410009.JavaMail.jira@brutus>
Date: Thu, 11 Jun 2009 13:00:10 -0700 (PDT)
From: "Hudson (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5539) o.a.h.mapred.Merger not maintaining
 map out compression on intermediate files
In-Reply-To: <1362153722.1237531970668.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718616#action_12718616 ] 

Hudson commented on HADOOP-5539:
--------------------------------

Integrated in Hadoop-trunk #863 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/863/])
    

> o.a.h.mapred.Merger not maintaining map out compression on intermediate files
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5539
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5539
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.1
>         Environment: 0.19.2-dev, r753365 
>            Reporter: Billy Pearson
>            Assignee: Jothi Padmanabhan
>            Priority: Blocker
>             Fix For: 0.20.1
>
>         Attachments: 5539.patch, hadoop-5539-branch20.patch, hadoop-5539-v1.patch, hadoop-5539.patch
>
>
> hadoop-site.xml:
> mapred.compress.map.output = true
> Map output files are compressed, but when the in-memory merger closes
> on the reduce side, the on-disk merger runs to reduce the number of input files to <= io.sort.factor if needed.
> When this happens, it writes files called intermediate.x. These files
> do not maintain the compression setting: the writer (o.a.h.mapred.Merger.class line 432)
> is passed the codec, but I added some logging and it is always null, whether map output compression is set to true or false.
> This causes tasks to fail if they cannot hold the uncompressed size of the reduce data they are holding.
> I think this is just an oversight: the codec is not getting set correctly for the on-disk merges.
> {code}
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: Merging 30 intermediate segments out of a total of 3000
> 2009-03-20 01:30:30,005 INFO org.apache.hadoop.mapred.Merger: intermediate.1 used codec: null
> {code}
> I added
> {code}
>     // added by me
>     if (codec != null) {
>       LOG.info("intermediate." + passNo + " used codec: " + codec.toString());
>     } else {
>       LOG.info("intermediate." + passNo + " used codec: null");
>     }
>     // end added by me
> {code}
> just before the creation of the writer (o.a.h.mapred.Merger.class line 432),
> and it outputs the second log line above.
> I have confirmed this with the logging and by looking at the files on the tasktracker's disk. I can read the data in
> the intermediate files clearly, which tells me that they are not compressed, but I cannot read the map.out files directly from the map output,
> which tells me that compression is working on the map end but not in the on-disk merge that produces the intermediate files.
> I can see no benefit in these files not maintaining the compression setting, and it looks as though they were intended to maintain it.
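
For illustration only, here is a minimal, self-contained sketch of the effect described in the report. It is not the Hadoop code and not the attached patch: the Codec interface, the DEFLATE constant, and the writeIntermediate and sampleRecords helpers are all hypothetical stand-ins, and java.util.zip is used in place of a real Hadoop codec. It only shows that a writer handed a null codec stores the merged records uncompressed, while the same writer handed a codec stores them compressed.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;

class IntermediateMergeSketch {

    /** Hypothetical stand-in for a compression codec; NOT Hadoop's CompressionCodec. */
    interface Codec {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** Assumed stand-in codec built on java.util.zip's deflate stream. */
    static final Codec DEFLATE = DeflaterOutputStream::new;

    /**
     * Writes one intermediate segment into memory and returns the stored bytes.
     * A null codec means the records are stored as-is, which mirrors the
     * "used codec: null" case in the log lines quoted above.
     */
    static byte[] writeIntermediate(byte[] records, Codec codec) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            OutputStream out = (codec != null) ? codec.wrap(sink) : sink;
            out.write(records);
            out.close(); // closing finishes the deflate stream when a codec is used
            return sink.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds some repetitive, easily compressible sample records. */
    static byte[] sampleRecords() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("key\tvalue\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] records = sampleRecords();
        byte[] withoutCodec = writeIntermediate(records, null);
        byte[] withCodec = writeIntermediate(records, DEFLATE);
        System.out.println("codec == null: " + withoutCodec.length + " bytes stored");
        System.out.println("codec passed : " + withCodec.length + " bytes stored");
    }
}
```

In this sketch the null-codec path stores exactly as many bytes as it was given, which corresponds to the readable intermediate files seen on the tasktracker's disk. Passing the codec through to the writer is the behaviour the report expects from the on-disk merge.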

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.