From: Mike Percy
Date: Wed, 12 Nov 2014 16:24:56 -0800
Subject: Re: How to convert *.bz2.tmp to *.bz2 file after restating the instance
To: "user@flume.apache.org", Arun Gujjar
Reply-To: user@flume.apache.org
In-Reply-To: <1175925603.9863.1415748919846.JavaMail.yahoo@jws100155.mail.ne1.yahoo.com>
References: <452727116.4652.1415745975662.JavaMail.yahoo@jws100150.mail.ne1.yahoo.com> <1175925603.9863.1415748919846.JavaMail.yahoo@jws100155.mail.ne1.yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a11349392191abb0507b28d3b

--001a11349392191abb0507b28d3b
Content-Type: text/plain; charset=UTF-8

Depending on your configuration, each batch is likely written as its own bzip2 stream, and these streams are effectively concatenated into a single file. So Hive should (hopefully) be able to read all of them except the last (partial) batch, which is OK to throw away because Flume will retry it when the agent comes back up. If Hive doesn't support that, maybe try writing in a format other than compressed text, such as compressed Avro or compressed SequenceFile (both formats support compression internally and are handled well by most tools).

Regarding the .tmp file: after a server crash or ungraceful shutdown, it has to be renamed manually to a non-.tmp name (or you can set up a cron job to look for old ones). Flume doesn't currently try to remember the .tmp files it previously wrote, so it will not rename or continue them.

Mike

On Tue, Nov 11, 2014 at 3:35 PM, Arun Gujjar wrote:

> Hi,
>
> Whenever we restart the Flume agent, it creates a new HDFS file and starts
> writing data into that file. The earlier file is left as *bz2.tmp, and
> from Hive queries we found that we were not able to read the data from
> this file.
>
> Here are the two questions I have:
> 1. Could you please suggest how we can convert this bz2.tmp file to a bz2
>    file? Today we lose the data that is present in the bz2.tmp file.
> 2. Is there a way to configure Flume to continue writing into the
>    existing bz2.tmp file instead of creating a new file?
>
> Can someone please answer this?
>
> Regards
> Arun

--001a11349392191abb0507b28d3b
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--001a11349392191abb0507b28d3b--
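
The cron-style cleanup suggested in the reply could be sketched roughly as below. This is only an illustration, not something Flume provides: the helper names (`parse_ls_line`, `plan_renames`, `sweep`), the one-hour age threshold, and the example directory are assumptions. It relies only on the standard `hdfs dfs -ls` and `hdfs dfs -mv` shell commands, and it assumes the sink uses the `.tmp` in-use suffix and that listing timestamps are in the same time zone as the machine running the script.

```python
import subprocess
from datetime import datetime, timedelta

TMP_SUFFIX = ".tmp"  # assumption: the sink's in-use suffix is ".tmp"


def parse_ls_line(line):
    """Parse one line of `hdfs dfs -ls` output into (path, mtime), or None."""
    # Expected columns: perms, replication, owner, group, size, date, time, path
    parts = line.split(None, 7)
    if len(parts) < 8 or parts[0].startswith("d"):
        return None  # "Found N items" header, blank line, or a directory
    mtime = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    return parts[7], mtime


def plan_renames(ls_output, now, min_age=timedelta(hours=1)):
    """Return (src, dst) pairs for .tmp files not modified for at least min_age.

    The age check is a guard against touching a file that a running agent
    may still be writing to; the threshold is an arbitrary assumption.
    """
    renames = []
    for line in ls_output.splitlines():
        parsed = parse_ls_line(line)
        if parsed is None:
            continue
        path, mtime = parsed
        if path.endswith(TMP_SUFFIX) and now - mtime >= min_age:
            renames.append((path, path[: -len(TMP_SUFFIX)]))
    return renames


def sweep(directory, now=None):
    """List `directory` in HDFS and rename stale .tmp files (hypothetical helper)."""
    listing = subprocess.run(
        ["hdfs", "dfs", "-ls", directory],
        check=True, capture_output=True, text=True,
    ).stdout
    for src, dst in plan_renames(listing, now or datetime.now()):
        subprocess.run(["hdfs", "dfs", "-mv", src, dst], check=True)
```

A cron entry could then call something like `sweep("/flume/events")` (a made-up path) at a regular interval. Keeping `plan_renames` separate from the commands that touch HDFS makes it easy to check the selection logic on a sample listing before renaming anything.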