Date: Wed, 5 Aug 2015 13:20:07 -0700
Subject: compress folder in hadoop
From: Kumar Jayapal
To: user@hadoop.apache.org, "cdh-user@cloudera.org"

Hi All,

How do I compress a folder in Hadoop?

I want to compress a folder that holds old, infrequently used data. How can I do that?
When I searched the web I found some ideas for compressing individual files. Can someone please help me understand why my output files are not in .lzo or .gz format?
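For a single file, the kind of approach I found looks roughly like this (my own sketch, using one of my files as an example; the .gz path is just a name I made up):

# hadoop fs -cat /tmp/hdfs/hdfsNID9801P.csv | gzip | hadoop fs -put - /tmp/hdfs/hdfsNID9801P.csv.gz

That streams the file through gzip on the client and writes it back, but it only handles one file at a time.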


As a test, I am executing the commands below for two types of compression, LZO and gzip. When I check the output files, the LZO and gzip runs produce files of exactly the same size. How do I check whether the compression was successful? When I cat the files I can still see the plain data.

The MR jobs were successful and created the files shown below.
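One check I know of is comparing hadoop fs -text (which, as far as I understand, decompresses files with a recognized codec) against hadoop fs -cat (which prints the raw bytes):

# hadoop fs -text /tmp/hdfs/hdfsgzip/part-00000 | head -n 5
# hadoop fs -cat /tmp/hdfs/hdfsgzip/part-00000 | head -n 5

If both print the same readable text, I assume the output was never actually compressed. Is that a valid check?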

# hadoop jar hadoop-streaming.jar \
    "-Dmapreduce.compress.map.output=true" \
    "-Dmapreduce.map.output.compression.codec=com.hadoop.compression.lzo.LzopCodec" \
    "-Dmapreduce.output.compress=true" \
    "-Dmapreduce.output.compression.codec=com.hadoop.compression.lzo.LzopCodec" \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfslzo


# hadoop jar hadoop-streaming.jar \
    "-Dmapreduce.compress.map.output=true" \
    "-Dmapreduce.map.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    "-Dmapreduce.output.compress=true" \
    "-Dmapreduce.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec" \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfsgzip
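Side question: am I even using the right property names? From what I have read, the Hadoop 2 names are mapreduce.map.output.compress and mapreduce.output.fileoutputformat.compress (plus the matching .codec properties), and I am not sure the mapreduce.output.compress / mapreduce.compress.map.output spellings above are ever read, which could explain the identical file sizes. My untested guess at a corrected gzip run (writing to a new scratch directory, /tmp/hdfs/hdfsgzip2, just for comparison):

# hadoop jar hadoop-streaming.jar \
    -Dmapreduce.map.output.compress=true \
    -Dmapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
    -Dmapreduce.output.fileoutputformat.compress=true \
    -Dmapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec \
    -input /tmp/hdfs/hdfsNID9801P.csv -output /tmp/hdfs/hdfsgzip2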



The output part files are listed below.
15/08/05 18:36:07 INFO streaming.StreamJob: Output directory: /tmp/hdfs/hdfsgzip

# hadoop fs -ls /tmp/hdfs/hdfsgzip
Found 5 items
-rw-r--r--   3 hdfs supergroup          0 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/_SUCCESS
-rw-r--r--   3 hdfs supergroup 6061954911 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/part-00000
-rw-r--r--   3 hdfs supergroup 6062727606 2015-08-05 18:35 /tmp/hdfs/hdfsgzip/part-00001
-rw-r--r--   3 hdfs supergroup 6064932250 2015-08-05 18:35 /tmp/hdfs/hdfsgzip/part-00002
-rw-r--r--   3 hdfs supergroup 6062737354 2015-08-05 18:36 /tmp/hdfs/hdfsgzip/part-00003
# hadoop fs -ls /tmp/hdfs/hdfslzo
Found 5 items
-rw-r--r--   3 hdfs supergroup          0 2015-08-05 18:28 /tmp/hdfs/hdfslzo/_SUCCESS
-rw-r--r--   3 hdfs supergroup 6061954911 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00000
-rw-r--r--   3 hdfs supergroup 6062727606 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00001
-rw-r--r--   3 hdfs supergroup 6064932250 2015-08-05 18:27 /tmp/hdfs/hdfslzo/part-00002
-rw-r--r--   3 hdfs supergroup 6062737354 2015-08-05 18:28 /tmp/hdfs/hdfslzo/part-00003
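Since the gzip and LZO listings are byte-for-byte identical, I suspect the -D options never took effect. Would checking the leading bytes be a valid test? My understanding is that a gzip stream starts with the bytes 1f 8b and an lzop file (as written by LzopCodec) starts with 89 4c 5a 4f, i.e. "\x89LZO":

# hadoop fs -cat /tmp/hdfs/hdfsgzip/part-00000 | head -c 4 | od -An -tx1
# hadoop fs -cat /tmp/hdfs/hdfslzo/part-00000 | head -c 4 | od -An -tx1

Please correct me if that is not a reliable check.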

It would be a great help if you could point me to any links about compression in Hadoop.

Thanks
Jay