flume-user mailing list archives

From Viral Bajaria <viral.baja...@gmail.com>
Subject Re: flume hdfs sink notify / callback to add partition
Date Fri, 01 Aug 2014 02:55:53 GMT
Any suggestions on this? I am still trying to figure out how to get a
notification when the HDFS sink creates a new partition, so that I can add
it with an ALTER TABLE statement on a separate thread.

Is adding a callback the right way to handle this?
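
To make the idea concrete, here is a minimal sketch of the bookkeeping such a callback would need. It is not Flume code and does not use any Flume API: the class PartitionTracker, the method on_file_closed, the table name, and the paths are all hypothetical. It also assumes the sink writes into Hive-style key=value directories (for example dt=2014-08-01/hr=02), which may not match every layout, and it does no escaping of partition values.

```python
import posixpath
import threading


class PartitionTracker:
    """Hypothetical helper: remembers which partition directories were seen."""

    def __init__(self, table):
        self.table = table
        self._seen = set()
        self._lock = threading.Lock()  # callbacks may arrive from several threads

    def on_file_closed(self, path):
        """Return an ALTER TABLE statement the first time a partition
        directory is seen, and None on every later call for it."""
        directory = posixpath.dirname(path)
        # Assumption: partitions are encoded as key=value path components.
        spec = [tuple(part.split("=", 1))
                for part in directory.split("/") if "=" in part]
        if not spec:
            return None  # file is not inside a partition directory
        with self._lock:
            if directory in self._seen:
                return None
            self._seen.add(directory)
        clause = ", ".join("{}='{}'".format(k, v) for k, v in spec)
        return ("ALTER TABLE {} ADD IF NOT EXISTS PARTITION ({}) "
                "LOCATION '{}'").format(self.table, clause, directory)


tracker = PartitionTracker("events")
print(tracker.on_file_closed(
    "s3://bucket/logs/dt=2014-08-01/hr=02/events.1.log"))
```

Only the first closed file in a directory produces a statement, so the separate thread would run one ALTER TABLE per new partition instead of a full recover partitions pass. The IF NOT EXISTS clause keeps the statement harmless if the partition was already added, for example after a restart that loses the in-memory set.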

Thanks,
Viral



On Mon, Jul 28, 2014 at 2:40 PM, Viral Bajaria <viral.bajaria@gmail.com>
wrote:

> Hi,
>
> Is there a way to get the HDFS sink to signal that a file was just closed,
> and then use that signal to add a partition to Hive if one does not
> already exist?
>
> Right now, what I do is:
>
> - move files to s3
> - run recover partitions <--- this step takes forever
>
> But given that I have so much historical data, it's not feasible to run
> recover partitions every single day since it takes forever.
>
> I would much rather add a partition the first time I see a file in that
> partition.
>
> I looked around the code base and it seems Flume OG had something like
> this, but I don't see the capability in Flume NG.
>
> I can see a way to add this: add a callback parameter to HDFSEventSink
> and create a custom wrapper around it.
>
> Any other suggestions?
>
> Thanks,
> Viral
>
>
