Date: Mon, 14 Jan 2013 11:18:38 -0800
From: Sagar Mehta
To: user@flume.apache.org
Subject: Question about gzip compression when using Flume Ng

Hi Guys,

I'm using Flume NG and it works great for me. In essence, I'm using an exec source running tail -F on a log file, feeding two HDFS sinks through a File channel. So far, so good. Now I'm trying to enable gzip compression with the following config, as per the Flume NG User Guide at http://flume.apache.org/FlumeUserGuide.html.

#gzip compression related settings
collector102.sinks.sink1.hdfs.codeC = gzip
collector102.sinks.sink1.hdfs.fileType = CompressedStream
collector102.sinks.sink1.hdfs.fileSuffix = .gz
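For comparison only, and not as a description of how Flume or Hadoop's GzipCodec actually write files: a gzip file produced incrementally stays readable end to end as long as every write ends with a complete gzip member, because a series of concatenated members is itself a valid gzip file. The small Python sketch below builds such a file from batches of log lines and reads it back. The helper name write_batches_as_members is hypothetical.

```python
import gzip


def write_batches_as_members(batches: list[list[str]]) -> bytes:
    """Hypothetical helper: compress each batch of log lines as its own
    complete gzip member (header + deflate data + trailer) and concatenate
    the members into one byte string."""
    return b"".join(gzip.compress("".join(batch).encode("utf-8")) for batch in batches)


batches = [["line 1\n", "line 2\n"], ["line 3\n"]]
blob = write_batches_as_members(batches)

# gzip.decompress handles multi-member data, decoding every member in turn,
# just as gunzip does for a concatenated .gz file.
print(gzip.decompress(blob).decode("utf-8"), end="")
```

Appending further complete members keeps such a file valid, so writing in batches does not by itself make gunzip complain.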

However, this is what appears to be happening:

Flume seems to write gzip-compressed output (I see the .gz files in the output buckets). However, when I try to decompress a file, gunzip reports 'trailing garbage ignored', and the decompressed output is actually smaller than the compressed file.
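gunzip's message means it decoded what it could as complete gzip members and then reached bytes that do not begin a valid member, which it skipped. A minimal sketch of that scan in Python, using only the standard zlib module, can show how many complete members a file holds and how many bytes are left undecoded. The function name scan_gzip_members is made up for illustration, and the loop only approximates gunzip's behavior rather than reproducing it exactly.

```python
import sys
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes


def scan_gzip_members(data: bytes) -> tuple[list[int], int]:
    """Walk concatenated gzip members, roughly the way gunzip does.

    Returns the decompressed size of each complete member, plus the number
    of bytes left over that could not be decoded as a further member
    (what gunzip reports as "trailing garbage").
    """
    sizes = []
    rest = data
    while rest[:2] == GZIP_MAGIC:
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
        try:
            out = d.decompress(rest)
        except zlib.error:
            break  # corrupt member: count it and everything after as leftover
        if not d.eof:
            break  # truncated member: also counted as leftover
        sizes.append(len(out))
        rest = d.unused_data  # bytes that follow this member's trailer
    return sizes, len(rest)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        sizes, leftover = scan_gzip_members(f.read())
    print(f"{len(sizes)} complete member(s), {sum(sizes)} bytes decompressed, "
          f"{leftover} bytes not decoded")
```

Saved as, say, a hypothetical scan_gz.py and run with one of the sink's .gz files as its only argument, this would report where the valid gzip data stops and how much of the file follows it.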

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -ltr collector102.ngpipes.sac.ngmoco.com.1357936638713.gz
-rw-r--r-- 1 hadoop hadoop 5381235 2013-01-11 20:44 collector102.ngpipes.sac.ngmoco.com.1357936638713.gz

hadoop@jobtracker301:/home/hadoop/sagar/temp$ gunzip collector102.ngpipes.sac.ngmoco.com.1357936638713.gz
gzip: collector102.ngpipes.sac.ngmoco.com.1357936638713.gz: decompression OK, trailing garbage ignored

hadoop@jobtracker301:/home/hadoop/sagar/temp$ ls -l
-rw-r--r-- 1 hadoop hadoop 58898 2013-01-11 20:44 collector102.ngpipes.sac.ngmoco.com.1357936638713
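The two listings already say a lot. For gzip-compressed text logs, the decompressed file is normally several times larger than the .gz, yet here the output is only about 1% of the archive's size. A quick check of that arithmetic, with the byte counts copied from the listings above:

```python
compressed_bytes = 5381235   # size of the .gz file in the first ls
decompressed_bytes = 58898   # size of the file left after gunzip

# Expansion factor: greater than 1 whenever decompression yields more data
# than the archive occupies, as it should for compressible text.
expansion = decompressed_bytes / compressed_bytes
print(f"decompressed/compressed = {expansion:.4f}")
```

A factor this far below 1 is consistent with only the beginning of the file having been decoded, with the remainder skipped as trailing garbage.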

Below are some helpful details.

I'm using apache-flume-1.4.0-SNAPSHOT-bin:

smehta@collector102:/opt$ ls -l flume
lrwxrwxrwx 1 root root 31 2012-12-14 00:44 flume -> apache-flume-1.4.0-SNAPSHOT-bin

I also have the hadoop-core jar on Flume's classpath (in /opt/flume/lib):

smehta@collector102:/opt/flume/lib$ ls -l hadoop-core-0.20.2-cdh3u2.jar
-rw-r--r-- 1 hadoop hadoop 3534499 2012-12-01 01:53 hadoop-core-0.20.2-cdh3u2.jar

Everything is working well for me except compression, and I'm not quite sure what I'm missing. I'll keep debugging in the meantime; any ideas or help would be much appreciated.

Thanks in advance,
Sagar