hadoop-common-issues mailing list archives

From "Suraj Nayak (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-13114) DistCp should have option to compress data on write
Date Sat, 07 May 2016 09:14:13 GMT
Suraj Nayak created HADOOP-13114:

             Summary: DistCp should have option to compress data on write
                 Key: HADOOP-13114
                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
             Project: Hadoop Common
          Issue Type: Improvement
            Reporter: Suraj Nayak
            Assignee: Suraj Nayak
            Priority: Minor
             Fix For: 3.0.0

The DistCp utility should be able to store data in a user-specified compressed format. This avoids a separate
pass to compress the data after the transfer. Backups to a different cluster benefit by saving one I/O operation,
along with the associated time and effort.
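To make the saved pass concrete, here is a minimal local-filesystem sketch in Python. It is not DistCp code. Python's {{bz2}} module stands in for Hadoop's BZip2 codec, and the function names {{copy_then_compress}} and {{copy_with_compression}} are hypothetical. The first function mirrors the copy-then-compress workflow this proposal wants to avoid. The second compresses while writing, so the uncompressed bytes never land on the target.

```python
import bz2
import shutil
import tempfile
from pathlib import Path


def copy_then_compress(src: Path, dst_dir: Path) -> Path:
    """Two passes: land the raw bytes on the target, then re-read them to compress."""
    raw = dst_dir / src.name
    shutil.copyfile(src, raw)  # pass 1: uncompressed copy lands on the target
    out = dst_dir / (src.name + ".bz2")
    with raw.open("rb") as fin, bz2.open(out, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # pass 2: read the copy back and compress it
    raw.unlink()  # remove the intermediate uncompressed copy
    return out


def copy_with_compression(src: Path, dst_dir: Path) -> Path:
    """One pass: bytes are compressed while they are written to the target."""
    out = dst_dir / (src.name + ".bz2")
    with src.open("rb") as fin, bz2.open(out, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src = root / "part-00000"
        src.write_bytes(b"backup record\n" * 10_000)
        (root / "two-pass").mkdir()
        (root / "one-pass").mkdir()
        a = copy_then_compress(src, root / "two-pass")
        b = copy_with_compression(src, root / "one-pass")
        # Both end with a .bz2 file; only the two-pass version wrote a raw copy first.
        print(a.name, a.stat().st_size, b.name, b.stat().st_size)
```

Both functions end with the same compressed file on the target; the one-pass version skips writing and re-reading the uncompressed copy, which is the I/O the proposal aims to save.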

* Create an option -compressOutput that defaults to {{org.apache.hadoop.io.compress.BZip2Codec}}.
* Users will be able to change the codec with {{-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec}}.
* If DistCp compression is enabled, suffix the file names with the codec's default extension to indicate
that the file is compressed. This way users can tell which codec was used to compress the data.
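The option handling and file-name suffixing above can be sketched as follows. This is an illustration in Python, not DistCp's argument parser: {{parse_args}}, {{target_name}} and the extension table are hypothetical, and the sketch assumes Hadoop's own codec classes under {{org.apache.hadoop.io.compress}}, since that is the kind of class {{mapreduce.output.fileoutputformat.compress.codec}} resolves to. In Hadoop the extension would come from the codec's {{getDefaultExtension()}}; here it is hard-coded for a few well-known codecs.

```python
from typing import Dict, List, Tuple

# Property named in the proposal for overriding the codec.
CODEC_KEY = "mapreduce.output.fileoutputformat.compress.codec"
# Assumed default: Hadoop's own BZip2 codec class.
DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec"

# Hard-coded stand-in for asking each Hadoop codec for its default extension.
DEFAULT_EXTENSIONS: Dict[str, str] = {
    "org.apache.hadoop.io.compress.BZip2Codec": ".bz2",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
    "org.apache.hadoop.io.compress.SnappyCodec": ".snappy",
    "org.apache.hadoop.io.compress.DefaultCodec": ".deflate",
}


def parse_args(args: List[str]) -> Tuple[bool, Dict[str, str], List[str]]:
    """Split a hypothetical command line into (-compressOutput set?, -D properties, paths)."""
    compress = False
    conf: Dict[str, str] = {}
    paths: List[str] = []
    it = iter(args)
    for arg in it:
        if arg == "-compressOutput":  # the option proposed in this JIRA
            compress = True
        elif arg == "-D":
            prop = next(it, None)  # the key=value pair follows -D
            if prop is None or "=" not in prop:
                raise ValueError("-D expects key=value")
            key, _, value = prop.partition("=")
            conf[key] = value
        else:
            paths.append(arg)
    return compress, conf, paths


def target_name(src_name: str, compress: bool, conf: Dict[str, str]) -> str:
    """Append the chosen codec's default extension when compression is enabled."""
    if not compress:
        return src_name
    codec = conf.get(CODEC_KEY, DEFAULT_CODEC)
    if codec not in DEFAULT_EXTENSIONS:
        raise ValueError(f"unknown codec: {codec}")
    return src_name + DEFAULT_EXTENSIONS[codec]
```

With only -compressOutput given, {{target_name("part-00000", True, {})}} yields {{part-00000.bz2}}; adding the -D override for SnappyCodec yields {{part-00000.snappy}}. Failing on an unknown codec, rather than silently writing an unsuffixed file, keeps the suffix a reliable signal of how the data was compressed.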

This JIRA is similar to [HADOOP-8065|https://issues.apache.org/jira/browse/HADOOP-8065]. HADOOP-8065
aims to compress data *during transit*, which is a much larger effort. This JIRA is scoped down to
let the user compress data when it lands on the target filesystem.

This message was sent by Atlassian JIRA

