Date: Sat, 7 May 2016 23:54:12 +0000 (UTC)
From: "Suraj Nayak (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-13114) DistCp should have option to compress data on write

[ https://issues.apache.org/jira/browse/HADOOP-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suraj Nayak updated HADOOP-13114:
---------------------------------
    Attachment: HADOOP-13114-trunk_2016-05-07-1.patch

* Added {{-compressoutput}} option.
* JUnit test cases for output compression and option parsing.
* Created helper method {{getCodec}}, which sets the codec only once -> Needs review.

> DistCp should have option to compress data on write
> ---------------------------------------------------
>
>                 Key: HADOOP-13114
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13114
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Suraj Nayak
>            Assignee: Suraj Nayak
>            Priority: Minor
>              Labels: distcp
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-13114-trunk_2016-05-07-1.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> The DistCp utility should be able to store data in a user-specified compression format. This avoids a separate step of compressing the data after transfer. Backup strategies that copy to a different cluster also benefit by saving one I/O operation to and from HDFS, thus saving resources, time, and effort.
> * Create an option -compressOutput defaulting to {{org.apache.hadoop.io.compress.BZip2Codec}}.
> * Users will be able to change the codec with {{-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec}}.
> * If DistCp compression is enabled, suffix the filenames with the codec's default extension to indicate that the file is compressed, so users know which codec was used to compress the data.
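
For readers following the issue, the three points in the description (a default codec, an override through the configuration property, and a filename suffix taken from the codec) can be illustrated with a rough, JDK-only sketch. This is not the code in the attached patch and it does not use the Hadoop API. The class `CompressOnWriteSketch`, its methods, and the extension lookup table are hypothetical names made up for illustration; in Hadoop the extension would presumably come from `CompressionCodec.getDefaultExtension()`. Because the JDK ships no bzip2 implementation, the compression step uses `java.util.zip` gzip as a stand-in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not the DistCp patch and not Hadoop's API.
class CompressOnWriteSketch {
    // Property name and default codec are taken from the issue description.
    static final String CODEC_KEY = "mapreduce.output.fileoutputformat.compress.codec";
    static final String DEFAULT_CODEC = "org.apache.hadoop.io.compress.BZip2Codec";

    // Use the user's override if one is set, otherwise the proposed default.
    static String resolveCodec(Map<String, String> conf) {
        return conf.getOrDefault(CODEC_KEY, DEFAULT_CODEC);
    }

    // Assumed stand-in for asking the codec for its default extension.
    static String extensionFor(String codecClass) {
        switch (codecClass) {
            case "org.apache.hadoop.io.compress.BZip2Codec": return ".bz2";
            case "org.apache.hadoop.io.compress.GzipCodec":  return ".gz";
            default: throw new IllegalArgumentException("Unknown codec: " + codecClass);
        }
    }

    // Suffix the target filename so the codec is visible from the name.
    static String targetName(String sourceName, Map<String, String> conf) {
        return sourceName + extensionFor(resolveCodec(conf));
    }

    // Compress while writing (gzip stands in for whichever codec is configured).
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(sink)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // Decompress, to check that the written bytes round-trip.
    static byte[] gunzip(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(targetName("part-00000", conf)); // part-00000.bz2

        conf.put(CODEC_KEY, "org.apache.hadoop.io.compress.GzipCodec");
        System.out.println(targetName("part-00000", conf)); // part-00000.gz

        byte[] raw = "hello distcp".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(gunzip(gzip(raw)), StandardCharsets.UTF_8)); // hello distcp
    }
}
```

Resolving the codec in one place and deriving the suffix from it keeps the filename and the actual on-disk format from drifting apart, which seems to be the intent of the third bullet.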