Mailing-List: contact user-help@flume.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flume.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <1175925603.9863.1415748919846.JavaMail.yahoo@jws100155.mail.ne1.yahoo.com>
References: 
 <452727116.4652.1415745975662.JavaMail.yahoo@jws100150.mail.ne1.yahoo.com>
 <1175925603.9863.1415748919846.JavaMail.yahoo@jws100155.mail.ne1.yahoo.com>
From: Mike Percy <mpercy@apache.org>
Date: Wed, 12 Nov 2014 16:24:56 -0800
Message-ID: 
 <CAJLbxRa8UNkBKOrwJCvHyUXNYyKPx8-qf2hzwwGJnPcs8P68Mw@mail.gmail.com>
Subject: Re: How to convert *.bz2.tmp to *.bz2 file after restating the
 instance
To: "user@flume.apache.org" <user@flume.apache.org>,
 Arun Gujjar <arungujjartest@yahoo.com>
Content-Type: multipart/alternative; boundary=001a11349392191abb0507b28d3b

--001a11349392191abb0507b28d3b
Content-Type: text/plain; charset=UTF-8

Depending on your configuration, each batch is likely written as its own
bzip2 stream, and these streams are effectively concatenated into a single
file. So Hive should (hopefully) be able to read all of them except the
last (partial) batch, which is OK to throw away because Flume will retry it
when the agent comes back up. If Hive doesn't support that, you could try
writing in a format other than compressed text, such as compressed Avro
or compressed SequenceFile (both formats support compression
internally and are handled well by most tools).
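
To illustrate what "concatenated bzip2 streams" means, here is a minimal
Python sketch using only the standard bz2 module. It is not what Hive or
the Hadoop codec does internally; the function name and sample data are
made up for illustration. It decodes every complete stream in a byte
string and drops a truncated final stream, which is the part Flume would
retry anyway.

```python
import bz2


def read_complete_streams(data: bytes) -> bytes:
    """Decode every complete bzip2 stream in `data`, in order.

    A truncated final stream (for example, a batch cut off by a crash)
    is discarded instead of raising an error.
    """
    chunks = []
    while data:
        decompressor = bz2.BZ2Decompressor()
        try:
            out = decompressor.decompress(data)
        except OSError:
            break  # remaining bytes are not valid bzip2 data
        if not decompressor.eof:
            break  # stream ended early: partial batch, throw it away
        chunks.append(out)
        data = decompressor.unused_data  # bytes belonging to the next stream
    return b"".join(chunks)


# Hypothetical sample: two complete batches plus one cut-off batch.
batch1 = bz2.compress(b"event-1\nevent-2\n")
batch2 = bz2.compress(b"event-3\n")
partial = bz2.compress(b"event-4\n")[:-5]
recovered = read_complete_streams(batch1 + batch2 + partial)
```

Whether a given reader tolerates the truncated tail like this depends on
the tool, so it is worth testing against a copy of a real file.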

Regarding the .tmp file: after a server crash or ungraceful shutdown, you
need to rename it manually to a non-.tmp name (or set up a cron job to
look for old ones). Flume doesn't currently try to remember the .tmp files
it previously wrote to, so it will not rename them or continue writing to them.
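
A cron job along those lines could be sketched roughly as follows. This
is only an outline under assumptions: the function names, the one-hour
age threshold, and the example paths are hypothetical, and gathering the
(path, modification time) listing from HDFS is left to you. The only
external command used is the standard `hdfs dfs -mv`. The age check is a
heuristic to avoid touching a file that a live agent is still writing.

```python
import subprocess
from typing import Iterable, List, Tuple

TMP_SUFFIX = ".tmp"


def plan_renames(
    entries: Iterable[Tuple[str, float]],
    now: float,
    min_age_seconds: float = 3600.0,  # assumed threshold, tune for your setup
) -> List[Tuple[str, str]]:
    """Return (source, destination) pairs for stale .tmp files.

    `entries` holds (path, modification time in epoch seconds) pairs.
    """
    plan = []
    for path, mtime in entries:
        if path.endswith(TMP_SUFFIX) and now - mtime >= min_age_seconds:
            plan.append((path, path[: -len(TMP_SUFFIX)]))
    return plan


def apply_renames(plan: List[Tuple[str, str]], dry_run: bool = True) -> None:
    """Print the rename commands, or run them when dry_run is False."""
    for src, dst in plan:
        cmd = ["hdfs", "dfs", "-mv", src, dst]
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


# Hypothetical listing: one stale .tmp file, one fresh one, one finished file.
listing = [
    ("/flume/events.1.bz2.tmp", 0.0),
    ("/flume/events.2.bz2.tmp", 9000.0),
    ("/flume/events.0.bz2", 0.0),
]
plan = plan_renames(listing, now=10000.0)
apply_renames(plan)  # dry run: only prints the command
```

Run it in dry-run mode first and check the printed commands before
letting it rename anything.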

Mike

On Tue, Nov 11, 2014 at 3:35 PM, Arun Gujjar <arungujjartest@yahoo.com>
wrote:

> Hi,
>
>
> Whenever we restart the Flume agent, it creates a new HDFS file and starts
> writing data into that file. The earlier file is
> left as *bz2.tmp, and from Hive queries we found that we are not
> able to read the data in this file.
> Here are my two questions:
> 1. Could you please suggest how we can convert this bz2.tmp file to a bz2
> file? Today we lose the data that is present in the bz2.tmp file.
> 2. Is there a way to configure Flume to continue writing data into the
> existing bz2.tmp file instead of creating a new file?
>
> Can someone please answer this?
>
> Regards
> Arun
>
>
>


--001a11349392191abb0507b28d3b--