flume-user mailing list archives

From Ashish <paliwalash...@gmail.com>
Subject Re: Newbie - Sink question
Date Fri, 05 Sep 2014 06:08:50 GMT
I would recommend using an Interceptor for this, and possibly a modified
Flume topology. If the JSON files contain a large number of rows, or if there
is a very large number of files, go for a Collection tier and add a second
tier of agents that use interceptors for the DB lookup and CSV generation.
Something like:

Collection Agents -> Transformation Agents (writing to S3 Sinks)

You can scale out the Transformation/Collection tier agents based on the
traffic volume.
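
As a rough sketch of that two-tier layout, the agent configuration could look
something like the following. The agent names (collector, transformer), the
host name, port and directory, and both com.example classes are placeholders I
made up for illustration. It also assumes the folder is read with a spooling
directory source and that the tiers talk to each other over Avro; your custom
S3 sink would stay as the final sink, minus the lookup code.

```properties
# --- Collection agent: reads the JSON files and forwards them ---
collector.sources = jsonDir
collector.channels = fileCh
collector.sinks = toTransform

# Assumption: the JSON folder is read by a spooling directory source
collector.sources.jsonDir.type = spooldir
collector.sources.jsonDir.spoolDir = /data/json-in
collector.sources.jsonDir.channels = fileCh

# File channel, as in the existing setup, for durability
collector.channels.fileCh.type = file

# Avro sink pointing at the transformation agent (hypothetical host/port)
collector.sinks.toTransform.type = avro
collector.sinks.toTransform.hostname = transform-host
collector.sinks.toTransform.port = 4545
collector.sinks.toTransform.channel = fileCh

# --- Transformation agent: DB lookup + CSV generation, then S3 ---
transformer.sources = fromCollector
transformer.channels = fileCh
transformer.sinks = s3Out

transformer.sources.fromCollector.type = avro
transformer.sources.fromCollector.bind = 0.0.0.0
transformer.sources.fromCollector.port = 4545
transformer.sources.fromCollector.channels = fileCh

# Hypothetical custom interceptor doing the MySQL lookup and JSON -> CSV step
transformer.sources.fromCollector.interceptors = toCsv
transformer.sources.fromCollector.interceptors.toCsv.type = com.example.JsonToCsvInterceptor$Builder

transformer.channels.fileCh.type = file

# Hypothetical custom sink that only writes the finished CSV events to S3
transformer.sinks.s3Out.type = com.example.S3CsvSink
transformer.sinks.s3Out.channel = fileCh
```

With this split the interceptor runs on the source side of the transformation
agent, so the events stored in its file channel are already in CSV form and
the sink only has to deliver them.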


On Fri, Sep 5, 2014 at 8:23 AM, Kevin Warner <kevinwarner7965@gmail.com>
wrote:

> Hello All,
> We have the following configuration:
> Source->Channel->Sink
> Now, the source is pointing to a folder that has lots of json files. The
> channel is file based so that there is fault tolerance and the Sink is
> putting CSV files on S3.
> Now, there is code written in Sink that takes the JSON events and does
> some MySQL database lookup and generates CSV files to be put into S3.
> The question is, is it the right place for the code or should the code be
> running in channel as the ACID guarantees are present in Channel. Please
> advise.
> -Kev


Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
