falcon-dev mailing list archives

From Ajay Yadav <ajayn...@gmail.com>
Subject Re: lifecycle - retention
Date Mon, 25 Jan 2016 05:03:35 GMT
Hi John,

To avoid reading each line/record of the input, we usually partition the data
by date, e.g. all data for a day in one file. This way you avoid scanning
the data for all dates during retention: expired days are removed by deleting
whole files. This sort of modelling is usually a good idea for general data
processing as well, since consumers typically consume data for a time range.
Sometimes it is not possible to *produce* data in such a fashion, and we have
to write aggregator processes to batch the data by date. If it is not possible
to divide the data by date for your use case, then there is no way to delete
data for a particular date without reading each line/record of the input file,
with or without Falcon.
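
A minimal Python sketch of the date-partitioned approach, using the local filesystem as a stand-in and assuming a hypothetical layout of one directory per day named YYYY-MM-DD. None of these helpers are Falcon APIs. write_by_day plays the role of the aggregator process that batches records by date, and apply_retention decides what to delete from directory names alone, without opening a single record.

```python
# Sketch only: a hypothetical local layout <base>/<YYYY-MM-DD>/data.txt,
# one directory per day. These helpers are illustrative, not Falcon APIs.
import shutil
import tempfile
from collections.abc import Iterable
from datetime import date, datetime, timedelta
from pathlib import Path


def write_by_day(base: Path, records: Iterable[tuple[date, str]]) -> None:
    """Aggregator step: append each record to the file for its day."""
    for day, line in records:
        part = base / day.isoformat()
        part.mkdir(parents=True, exist_ok=True)
        with open(part / "data.txt", "a", encoding="utf-8") as f:
            f.write(line + "\n")


def expired_partitions(base: Path, today: date, keep_days: int) -> list[Path]:
    """List day directories dated strictly before today - keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for child in sorted(base.iterdir()):
        if not child.is_dir():
            continue
        try:
            day = datetime.strptime(child.name, "%Y-%m-%d").date()
        except ValueError:
            continue  # not a day partition; leave it untouched
        if day < cutoff:
            expired.append(child)
    return expired


def apply_retention(base: Path, today: date, keep_days: int) -> list[Path]:
    """Delete expired partitions wholesale; no record is ever read."""
    removed = expired_partitions(base, today, keep_days)
    for path in removed:
        shutil.rmtree(path)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        write_by_day(base, [
            (date(2016, 1, 20), "old record"),
            (date(2016, 1, 24), "recent record"),
            (date(2016, 1, 25), "today's record"),
        ])
        removed = apply_retention(base, today=date(2016, 1, 25), keep_days=2)
        print([p.name for p in removed])  # ['2016-01-20']
```

The retention cost here depends on the number of day partitions, not on the number of records inside them.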

On Mon, Jan 25, 2016 at 5:03 AM, John Smith <lenovomi@gmail.com> wrote:

> Ok,
> but in general, to execute or process that kind of requirement there is
> no other way than to read each line/record of the input file.
> On Mon, Jan 25, 2016 at 12:23 AM, Venkat Ramachandran
> <me.venkatr@gmail.com> wrote:
> > It's a good idea to open a JIRA with your requirements.
> > You can either implement a custom Pig job that reads and removes the
> > expired rows, or you can leverage the new Lifecycle feature introduced in
> > Falcon 0.8, which allows you to provide your own plugin for the retention
> > implementation.
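
When the data cannot be split by date, every record has to be read, as discussed in the quoted messages. A minimal Python sketch of that record-level filtering, assuming a hypothetical tab-separated line format with a leading ISO date. It stands in for the custom job Venkat mentions; it is neither Pig code nor a Falcon Lifecycle plugin, and retain_records and rewrite_file are illustrative names.

```python
# Sketch only: assumes a hypothetical line format "<YYYY-MM-DD>\t<payload>".
# Plain Python standing in for a custom Pig job; not a Falcon Lifecycle plugin.
from collections.abc import Iterable, Iterator
from datetime import date, timedelta


def retain_records(lines: Iterable[str], today: date, keep_days: int) -> Iterator[str]:
    """Yield lines dated on or after today - keep_days; drop the rest."""
    cutoff = today - timedelta(days=keep_days)
    for line in lines:
        stamp, _, _payload = line.partition("\t")
        # Every record must be parsed to learn whether it has expired.
        if date.fromisoformat(stamp) >= cutoff:
            yield line


def rewrite_file(src: str, dst: str, today: date, keep_days: int) -> int:
    """Stream src into dst without expired rows; return the number kept."""
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in retain_records(fin, today, keep_days):
            fout.write(line)
            kept += 1
    return kept


if __name__ == "__main__":
    sample = ["2016-01-20\told row\n", "2016-01-24\trecent row\n"]
    print(list(retain_records(sample, date(2016, 1, 25), keep_days=2)))
    # ['2016-01-24\trecent row\n']
```

Unlike the partitioned case, this reads and rewrites the whole input on every retention run, so its cost grows with the total number of records.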
