flume-user mailing list archives

From Andrew Ehrlich <and...@aehrlich.com>
Subject Re: Fastest way to get data into flume?
Date Thu, 27 Mar 2014 18:07:57 GMT
What about having more than one Flume agent?

You could have two agents that read the small messages and sink to HDFS, or
two agents that read the messages, serialize them, and send them to a third
agent which sinks them into HDFS.
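
As a starting point, each ingest agent in that topology could be configured
roughly like this (a sketch only, untested; the Thrift source arrived in
Flume 1.4, if I remember right, and the names, port, and paths below are
placeholders to adjust):

    # agent1; agent2 would be identical apart from names and port
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # Thrift source receiving the incoming messages
    agent1.sources.src1.type = thrift
    agent1.sources.src1.bind = 0.0.0.0
    agent1.sources.src1.port = 4141
    agent1.sources.src1.channels = ch1

    # file channel persists events to local disk, so they survive an
    # agent crash before the sink has drained them
    agent1.channels.ch1.type = file
    agent1.channels.ch1.checkpointDir = /var/flume/checkpoint
    agent1.channels.ch1.dataDirs = /var/flume/data

    # HDFS sink; a larger batch size generally helps throughput
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /flume/events
    agent1.sinks.sink1.hdfs.batchSize = 1000
    agent1.sinks.sink1.channel = ch1

The file channel costs some throughput compared to the memory channel, but
it matches your requirement that nothing be lost once Flume accepts it.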


On Thu, Mar 27, 2014 at 9:43 AM, Chris Schneider <
chris@christopher-schneider.com> wrote:

> I have a fair bit of data continually being created in the form of
> smallish messages (a few hundred bytes), which needs to enter Flume and
> eventually sink into HDFS.
>
> I need to be sure that the data lands in persistent storage and won't be
> lost, but otherwise throughput isn't important. It just needs to be fast
> enough to not back up.
>
> I'm running into a bottleneck in the initial ingestion of data.
>
> I've tried the netcat source and the Thrift source, but both have capped
> out at a thousand or so records per second.
>
> Batching up the Thrift API items into sets of 10 and using appendBatch
> gives a pretty large speedup, but still not enough. (A sketch of the
> batched call appears below this quote.)
>
> Here's a gist of my Ruby test script, with some example runs and my config.
>
> https://gist.github.com/cschneid/9792305
>
>
> 1. Are there any obvious performance changes I can make to speed up
> ingestion?
> 2. How fast can Flume reasonably go? Should I switch my source to
> something else that's faster? If so, what?
> 3. Is there a better tool for this kind of task (rapid, safe ingestion of
> small messages)?
>
> Thanks!
> Chris
>
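
For reference, the batched call mentioned above might look roughly like this
in Ruby (a sketch only, untested; it assumes client classes generated from
Flume's flume.thrift with "thrift --gen rb", and that your Flume version's
Thrift source speaks the compact protocol over a framed transport, which I
believe is the default in recent releases):

    require 'thrift'
    # generated by: thrift --gen rb flume.thrift
    require_relative 'gen-rb/flume_types'
    require_relative 'gen-rb/thrift_source_protocol'

    socket    = Thrift::Socket.new('localhost', 4141)
    transport = Thrift::FramedTransport.new(socket)
    protocol  = Thrift::CompactProtocol.new(transport)
    client    = ThriftSourceProtocol::Client.new(protocol)
    transport.open

    # stand-in for your real message feed
    messages = Array.new(10_000) { |i| "message #{i}" }

    # one appendBatch round trip per 100 messages instead of
    # one append call per message
    messages.each_slice(100) do |slice|
      events = slice.map { |m| ThriftFlumeEvent.new(headers: {}, body: m) }
      client.appendBatch(events)
    end

    transport.close

Pushing the batch size well past 10, and checking the Status each call
returns, is probably the cheapest thing to try before swapping sources.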
