flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Reading Flume spoolDir in parallel
Date Tue, 16 Sep 2014 18:30:06 GMT
Unfortunately, no. The spoolDir source was kept single-threaded so that
deserializer implementations can be kept simple. The approach with multiple
spoolDir sources is the correct one, though they can all write to the same
channel(s) - so you only need a larger number of sources; they can all
share the same channel(s), and you don't need more sinks unless you want to
pull data out faster.
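As a rough sketch, a multi-source/single-channel agent configuration might look like the following. The agent name, directory paths, and the Cassandra sink type are placeholders - substitute your own (the original setup uses a custom sink to write to Cassandra, whose class name is not given here):

```properties
# Hypothetical agent "a1": three spoolDir sources feeding one shared channel.
a1.sources = src1 src2 src3
a1.channels = ch1
a1.sinks = sink1

# Each source watches its own directory (split your log files across these).
a1.sources.src1.type = spooldir
a1.sources.src1.spoolDir = /data/flume/spool1
a1.sources.src1.channels = ch1

a1.sources.src2.type = spooldir
a1.sources.src2.spoolDir = /data/flume/spool2
a1.sources.src2.channels = ch1

a1.sources.src3.type = spooldir
a1.sources.src3.spoolDir = /data/flume/spool3
a1.sources.src3.channels = ch1

# One shared channel; all sources write into it.
a1.channels.ch1.type = file
a1.channels.ch1.capacity = 100000

# One sink draining the shared channel. Replace the type with your
# Cassandra sink's fully qualified class name.
a1.sinks.sink1.type = com.example.CassandraSink
a1.sinks.sink1.channel = ch1
```

Each source runs its own thread, so the three directories are drained concurrently while the downstream topology stays unchanged. Add more sinks on the same channel only if the sink becomes the bottleneck.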

On Tue, Sep 16, 2014 at 11:26 AM, Haidang N <haidang_99@hotmail.com> wrote:

> Since I'm not allowed to set up Flume on prod servers, I have to download
> the logs, put them in a Flume spoolDir and have a sink to consume from the
> channel and write to Cassandra. Everything is working fine.
> However, as I have a lot of log files in the spoolDir, and the current
> setup is only processing 1 file at a time, it's taking a while. I want to
> be able to process many files concurrently. One way I thought of is to use
> the spoolDir but distribute the files into 5-10 different directories, and
> define multiple sources/channels/sinks, but this is a bit clumsy. Is there
> a better way to achieve this?
> Thanks
