From: Connor Woodson
To: "user@flume.apache.org"
Date: Mon, 14 Jan 2013 15:17:04 -0800
Subject: Re: Question about gzip compression when using Flume Ng

What if you switch to bz2 compression?


On Mon, Jan 14, 2013 at 3:12 PM, Sagar Mehta <sagarmehta@gmail.com> wrote:
Yeah I have tried the text write format in vain before, but nevertheless gave it a try again!! Below is the latest file - still the same thing.

hadoop@jobtracker301:/home/hadoop/sagar/debug$ date
Mon Jan 14 23:02:07 UTC 2013

hadoop@jobtracker301:/home/hadoop/sagar/debug$ hls /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz
Found 1 items
-rw-r--r--   3 hadoop supergroup    4798117 2013-01-14 22:55 /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz

hadoop@jobtracker301:/home/hadoop/sagar/debug$ hget /ngpipes-raw-logs/2013-01-14/2200/collector102.ngpipes.sac.ngmoco.com.1358204141600.gz .
hadoop@jobtracker301:/home/hadoop/sagar/debug$ gunzip collector102.ngpipes.sac.ngmoco.com.1358204141600.gz

gzip: collector102.ngpipes.sac.ngmoco.com.1358204141600.gz: decompression OK, trailing garbage ignored

Interestingly enough, the gzip page says it is a harmless warning - http://www.gzip.org/#faq8

However, I'm losing events on decompression, so I cannot afford to ignore this warning. The gzip page gives an example about magnetic tape - there is an analogy to an HDFS block here, since the file is initially stored in HDFS before I pull it onto the local filesystem.

Sagar
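
One way to see what gunzip is treating as "trailing garbage" is to walk the file one gzip member at a time. The gzip format allows several compressed members to be concatenated in a single file, so the sketch below decodes members in sequence and reports how many bytes are left once nothing further decodes as gzip. It is only a diagnostic sketch under some assumptions: the file has already been copied out of HDFS, it fits in memory, and `scan_gzip_members` and `report` are hypothetical helper names, not part of Flume, Hadoop, or gzip. It does not by itself explain why events go missing.

```python
import zlib


def scan_gzip_members(data: bytes):
    """Decode concatenated gzip members one at a time.

    Returns (member_sizes, trailing): the decompressed size of every
    complete member, and the bytes left over once no further complete
    gzip member can be decoded.
    """
    member_sizes = []
    remaining = data
    while remaining:
        # wbits=31 (16 + 15) makes zlib expect a gzip header and trailer.
        decomp = zlib.decompressobj(wbits=31)
        try:
            out = decomp.decompress(remaining) + decomp.flush()
        except zlib.error:
            # The next bytes are not a valid gzip member.
            break
        if not decomp.eof:
            # The member never reached its end-of-stream marker (truncated).
            break
        member_sizes.append(len(out))
        # Anything after the member's trailer is handed back untouched.
        remaining = decomp.unused_data
    return member_sizes, remaining


def report(path: str) -> str:
    """Summarise a local .gz file (hypothetical helper for illustration)."""
    with open(path, "rb") as f:
        sizes, trailing = scan_gzip_members(f.read())
    return (f"{len(sizes)} complete member(s), {sum(sizes)} decompressed "
            f"bytes, {len(trailing)} byte(s) not decodable as gzip")
```

Calling `report()` on the local copy of the file, before running gunzip on it, gives the number of complete members and the size of the undecodable tail. Comparing the decompressed byte total with what gunzip produced would show whether the missing events sit inside that tail or were never written to the file at all.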




On Mon, Jan 14, 2013 at 2:52 PM, Connor Woodson <cwoodson.dev@gmail.com> wrote:
collector102.sinks.sink1.hdfs.writeFormat = TEXT
collector102.sinks.sink2.hdfs.writeFormat = TEXT


