flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: HDFS Sink log rotation on the basis of time of writing
Date Mon, 05 Nov 2012 15:30:04 GMT
Hi,

If you just did not bucket the data at all, it would be organized by
the time it arrived at the sink.
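
For example, a minimal sketch of an un-bucketed HDFS sink (the agent,
channel, and path names below are just placeholders) would leave the
time escape sequences out of hdfs.path entirely:

  agent1.sinks.hdfsSink.type = hdfs
  agent1.sinks.hdfsSink.channel = ch1
  # no %Y/%m/%d escapes, so every event lands in one directory in the
  # order it reaches the sink
  agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
  agent1.sinks.hdfsSink.hdfs.filePrefix = events
  # roll purely on time; disable size- and count-based rolling
  agent1.sinks.hdfsSink.hdfs.rollInterval = 300
  agent1.sinks.hdfsSink.hdfs.rollSize = 0
  agent1.sinks.hdfsSink.hdfs.rollCount = 0

With a single sink configured this way there is only one file open at
a time, and the numeric suffix the sink appends to each file name
increases over time, so the files should sort in write order both
alphabetically and chronologically.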

Brock

On Fri, Nov 2, 2012 at 6:08 PM, Pankaj Gupta <pankaj@brightroll.com> wrote:
> Hi,
>
> Is it possible to organize files written to HDFS into buckets based on the
> time of writing rather than the timestamp in the header? Alternatively, is
> it possible to insert the timestamp injector just before the HDFS Sink?
>
> My use case is to have files organized chronologically as well as
> alphabetically by name, with only one file being written to at a time.
> This will make it easier to find newly available files so that
> MapReduce jobs can process them.
>
> Thanks in Advance,
> Pankaj
>
>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
