flume-user mailing list archives

From Jun MA <mj.saber1...@gmail.com>
Subject Re: Flafka: how to differentiate the unfinished .tmp file and the abandoned one
Date Fri, 10 Jul 2015 17:11:24 GMT
Thanks for explaining. Is there a way to make the Kafka channel commit the offset only after
the rename succeeds?

> On Jul 8, 2015, at 6:10 PM, Hari Shreedharan <hshreedharan@cloudera.com> wrote:
> 
> If you kill the agent (not with kill -9), the temp files will be renamed (we wait a while
> for the rename to complete), so this should not happen. But if you do a kill -9, there is not
> a whole lot we can do on the Flume side. If you notice a file not being written to for a while
> after a restart, just rename it via the hdfs command.
> 
> On Wednesday, July 8, 2015, Jun MA <mj.saber1990@gmail.com> wrote:
> Hello Community,
> 
> I’m using Flafka (Kafka channel and HDFS sink). I ran into an awkward problem: I don’t
> know how to determine whether a .tmp file is still being written or has been abandoned. While
> the sink is writing events to a file, the file has a .tmp suffix, but if the agent goes down
> (control + d) while writing to that file, the file is not renamed and keeps its .tmp suffix.
> When the agent restarts, it does nothing with that .tmp file. But the events in that
> .tmp file are not redundant, because on the Kafka channel side the offset has already been committed.
> So my question is whether there is a way to differentiate the in-progress .tmp file from the
> abandoned one.
> 
> Thanks,
> Jun
> 
> 
> -- 
> 
> Thanks,
> Hari
> 
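
The manual cleanup suggested above (rename a .tmp file that has not been written to for a while) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function names, the example paths, and the idea of using the file's modification time as the "not written to for a while" signal are illustrative and do not come from Flume itself. The idle threshold should be comfortably longer than the time the sink normally keeps a file open. The only external command assumed is `hdfs dfs -mv`, which the sketch prints rather than runs.

```python
import shlex
from typing import Iterable, List, Tuple

TMP_SUFFIX = ".tmp"  # in-use suffix described in this thread


def stale_tmp_renames(
    entries: Iterable[Tuple[str, float]],
    now: float,
    max_idle_seconds: float,
) -> List[Tuple[str, str]]:
    """Return (source, target) pairs for .tmp files idle longer than the threshold.

    entries: hypothetical (path, modification time in epoch seconds) pairs,
             obtained however you list the sink's output directory.
    """
    renames = []
    for path, mtime in entries:
        if path.endswith(TMP_SUFFIX) and now - mtime > max_idle_seconds:
            # Target name is the same path with the .tmp suffix stripped.
            renames.append((path, path[: -len(TMP_SUFFIX)]))
    return renames


def as_hdfs_commands(renames: Iterable[Tuple[str, str]]) -> List[str]:
    """Format each rename as an `hdfs dfs -mv` command line for manual review."""
    return [
        f"hdfs dfs -mv {shlex.quote(src)} {shlex.quote(dst)}"
        for src, dst in renames
    ]


if __name__ == "__main__":
    # Made-up listing: one old .tmp file, one recent .tmp file, one finished file.
    listing = [
        ("/flume/events.1.tmp", 1000.0),
        ("/flume/events.2.tmp", 9900.0),
        ("/flume/events.0", 500.0),
    ]
    for cmd in as_hdfs_commands(stale_tmp_renames(listing, now=10000.0, max_idle_seconds=3600.0)):
        print(cmd)
```

Printing the commands instead of executing them leaves room to check each candidate by hand before renaming, since an idle file is only a heuristic for an abandoned one.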

