flume-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: .tmp in hdfs sink
Date Fri, 16 Nov 2012 04:16:27 GMT
Another question I had was about rollover. What's the best way to roll over
files within a reasonable timeframe? For instance, our path is YY/MM/DD/HH, so
a new file starts every hour while the previous hour's file just sits there as
.tmp. Sometimes it takes as long as an hour before the .tmp file is closed and
renamed to .snappy. In this situation, is there a way to tell Flume to roll
over files sooner, based on some idle time limit?
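
For illustration only, here is a minimal sketch of the kind of external check
a cleanup job (as mentioned in the quoted reply below) might perform: given a
directory listing, it picks out .tmp files that have not been modified within
an idle limit. The function name, the 15-minute default, and the sample paths
are hypothetical, not part of Flume. The listing is assumed to come from
whatever HDFS client is in use, and the sketch only reports candidates; it
does not close or rename anything.

```python
from datetime import datetime, timedelta


def stale_tmp_files(entries, now, idle_limit=timedelta(minutes=15), suffix=".tmp"):
    """Return sorted paths of in-progress files idle for longer than idle_limit.

    entries: iterable of (path, last_modified) pairs, for example taken from
             an HDFS directory listing (how the listing is obtained is left
             to the caller).
    now:     the reference time to compare modification times against.
    """
    cutoff = now - idle_limit
    return sorted(
        path
        for path, last_modified in entries
        if path.endswith(suffix) and last_modified < cutoff
    )


if __name__ == "__main__":
    # Hypothetical listing using the YY/MM/DD/HH layout described above.
    now = datetime(2012, 11, 16, 4, 16)
    listing = [
        ("/logs/12/11/16/03/events.snappy.tmp", datetime(2012, 11, 16, 3, 10)),
        ("/logs/12/11/16/04/events.snappy.tmp", datetime(2012, 11, 16, 4, 10)),
        ("/logs/12/11/16/02/events.snappy", datetime(2012, 11, 16, 2, 59)),
    ]
    for path in stale_tmp_files(listing, now):
        print(path)
```

A file flagged this way may still be open in a running agent, so a real job
would need to confirm that before touching it.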

On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> Thanks Mike, it makes sense. Any way I can help?
> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy <mpercy@apache.org> wrote:
>> Hi Mohit, this is a complicated issue. I've filed
>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>> In short, it would require a non-trivial amount of work to implement
>> this, and it would need to be done carefully. I agree that it would be
>> better if Flume handled this case more gracefully than it does today.
>> Today, Flume assumes that you have some job that would go and clean up the
>> .tmp files as needed, and that you understand that they could be partially
>> written if a crash occurred.
>> Regards,
>> Mike
>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>>> What we are seeing is that if Flume gets killed, either because of a server
>>> failure or for other reasons, it leaves the .tmp file behind. Sometimes, for
>>> whatever reason, the .tmp file is not readable. Is there a way to roll over
>>> the .tmp file more gracefully?
