From: Hadoop User
To: user@hadoop.apache.org
Date: Wed, 29 Jul 2015 18:36:22 -0700
Subject: Re: Need command to compress the files

I already have the data in HDFS. I want to test compression ratio with gzip and snappy.

Thanks 
Sajid

On Jul 29, 2015, at 5:37 PM, Ron Gonzalez <zlgonzalez@yahoo.com> wrote:

I think you can pick the compression algorithm when using Sqoop, for example deflate or snappy, by passing --compression-codec along with the --compress option.
Is that what you were asking?

Thanks,
Ron

On 07/29/2015 03:40 PM, Ted Yu wrote:
You can use the following command to see options for gzip:
gzip -h
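
To get an actual ratio rather than just the option list, something like the following might work. It is only a rough sketch: `gzip_ratio` and `CountingSink` are made-up names, the default compression level of 6 is an assumption, and gzipping a whole Avro container file is not the same as Avro's own per-block codecs, so treat the number as an estimate.

```python
import gzip
import io
import sys

CHUNK = 1 << 20  # read 1 MiB at a time so multi-GB files never sit in memory


class CountingSink(io.RawIOBase):
    """Write-only sink that discards bytes and records how many it received."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def writable(self):
        return True

    def write(self, b):
        self.count += len(b)
        return len(b)


def gzip_ratio(stream, level=6):
    """Return (raw_bytes, gzip_bytes, raw/gzip ratio) for a binary stream."""
    sink = CountingSink()
    raw = 0
    with gzip.GzipFile(fileobj=sink, mode="wb", compresslevel=level) as gz:
        while True:
            chunk = stream.read(CHUNK)
            if not chunk:
                break
            raw += len(chunk)
            gz.write(chunk)
    # The gzip header and trailer are counted too, so sink.count is never zero.
    ratio = raw / sink.count
    return raw, sink.count, ratio


# Only runs when a path is given on the command line (hypothetical usage).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        raw, packed, ratio = gzip_ratio(f)
    print(f"raw={raw} gzip={packed} ratio={ratio:.2f}")
```

Assuming the script is saved as `gzip_ratio.py` (a hypothetical name), on Linux the HDFS data could probably be streamed through it without a local copy by piping `hdfs dfs -cat <file>` into `python gzip_ratio.py /dev/stdin`.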

For snappy, see:
https://github.com/kubo/snzip
https://code.google.com/p/snappy/issues/detail?id=34

FYI

On Wed, Jul 29, 2015 at 3:34 PM, SP <sajidmca@gmail.com> wrote:
Hi All,

I am working on comparing different compression ratios. 

I have these files in Avro format. How can I compress them using snappy or gzip?

-rw-r--r--   3 hdfs supergroup 3080866838 2015-07-29 18:16 /tmp/fact_splitby_date_id/part-m-00000.avro
-rw-r--r--   3 hdfs supergroup 3021258762 2015-07-29 18:15 /tmp/fact_splitby_date_id/part-m-00001.avro
-rw-r--r--   3 hdfs supergroup 3164101762 2015-07-29 18:17 /tmp/fact_splitby_date_id/part-m-00002.avro
-rw-r--r--   3 hdfs supergroup 3251578205 2015-07-29 18:16 /tmp/fact_splitby_date_id/part-m-00003.avro




Thanks
Sp

