hadoop-common-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13114) DistCp should have option to compress data on write
Date Fri, 18 Nov 2016 00:49:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HADOOP-13114:
----------------------------------
    Attachment: HADOOP-13114.05.patch

Hi Suraj!

Thanks a lot for all your efforts to improve DistCp. My sincere apologies for not paying attention
to this issue. I was a bit busy when you asked and then never got back to it.
Yongjun seems to want this in, so we'll make another push for it.
Here's a rebase onto the latest trunk. I'll try to review and test it in the coming days.

> DistCp should have option to compress data on write
> ---------------------------------------------------
>
>                 Key: HADOOP-13114
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.8.0, 2.7.3, 3.0.0-alpha1
>            Reporter: Suraj Nayak
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>         Attachments: HADOOP-13114-trunk_2016-05-07-1.patch, HADOOP-13114-trunk_2016-05-08-1.patch,
>                      HADOOP-13114-trunk_2016-05-10-1.patch, HADOOP-13114-trunk_2016-05-12-1.patch, HADOOP-13114.05.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should be able to store data in a user-specified compression format.
> This avoids a separate hop to compress the data after transfer. Backup strategies that copy to a
> different cluster also benefit by saving one I/O operation to and from HDFS, thus saving resources,
> time, and effort.
> * Create an option -compressOutput defaulting to {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change the codec with {{-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}.
> * If distcp compression is enabled, suffix the filenames with the codec's default extension
> to indicate that the file is compressed. Users can then tell which codec was used to compress
> the data.
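
The naming rule in the last bullet can be illustrated with a small sketch. This is not code from the attached patches. It is a standalone Python model of the proposed behaviour, under these assumptions: the codec is read from the configuration key quoted above, it falls back to BZip2Codec as the proposal suggests, and the suffix is the extension each codec reports through Hadoop's CompressionCodec.getDefaultExtension() (".bz2" for BZip2Codec, ".gz" for GzipCodec). The helper names resolve_codec and compressed_target_path are hypothetical.

```python
# Hypothetical model of the proposed -compressOutput naming rule.
# Nothing here comes from the HADOOP-13114 patches themselves.

CODEC_KEY = "mapreduce.output.fileoutputformat.compress.codec"

# Default proposed in the issue description.
PROPOSED_DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"

# Extensions that these Hadoop codecs report via getDefaultExtension().
DEFAULT_EXTENSIONS = {
    "org.apache.hadoop.io.compress.BZip2Codec": ".bz2",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
}


def resolve_codec(conf):
    """Return the codec class name from conf, or the proposed default."""
    return conf.get(CODEC_KEY, PROPOSED_DEFAULT_CODEC)


def compressed_target_path(path, conf):
    """Append the resolved codec's default extension to a target path."""
    codec = resolve_codec(conf)
    if codec not in DEFAULT_EXTENSIONS:
        raise ValueError("no known extension for codec: " + codec)
    return path + DEFAULT_EXTENSIONS[codec]


if __name__ == "__main__":
    # No codec configured: the proposed BZip2Codec default applies.
    print(compressed_target_path("/backup/logs/part-00000", {}))
    # Codec overridden, as with -D on the command line.
    gzip_conf = {CODEC_KEY: "org.apache.hadoop.io.compress.GzipCodec"}
    print(compressed_target_path("/backup/logs/part-00000", gzip_conf))
```

Keeping the suffix tied to the codec's own default extension means a reader of the target directory can infer the codec from the file name alone, which is the stated goal of the third bullet.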



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


