hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2212) MapTask and ReduceTask should only compress/decompress the final map output file
Date Tue, 07 Dec 2010 22:33:02 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969049#action_12969049
] 

Mahadev konar commented on MAPREDUCE-2212:
------------------------------------------

joydeep,
 shouldnt that be the default to compress the on disk version if JobConf.setCompressMapOutput()
is set to true. The jobs that have been running with this property set should have the same
disk/network footprint that they used to have. no? 

> MapTask and ReduceTask should only compress/decompress the final map output file
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2212
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2212
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>
> Currently if we set mapred.map.output.compression.codec
> 1. MapTask will compress every spill, decompress every spill, merge and compress the
final map output file
> 2. ReduceTask will decompress, merge and compress every map output file. And repeat the
compression/decompression every pass.
> This causes all the data being compressed/decompressed many times.
> The reason we need mapred.map.output.compression.codec is for network traffic.
> We should not compress/decompress the data again and again during merge sort.
> We should only compress the final map output file that will be transmitted over the network.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message