flume-user mailing list archives

From Matthew Rathbone <matt...@foursquare.com>
Subject Re: Question on flume log4j appender and avrosource
Date Thu, 01 Sep 2011 22:16:49 GMT
A setup similar to what we have at foursquare would be:

Each of the 20 nodes behind a proxy runs a local flume-node. App code sends logs via thrift
to its local flume-node.
One machine acts as a collector. The 20 nodes send their data to the collector, and the
collector writes the data to hdfs/s3/whatever.
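
As a rough illustration of that topology, here is a toy Python model. It is not Flume's API: the class names (`LocalAgent`, `Collector`), the host names, and the in-memory queue are all made up for the sketch, and the local queuing behaviour is an assumption about why a per-host agent helps when the collector is unreachable.

```python
from collections import deque


class Collector:
    """Single collector tier: receives events from every agent and writes them out."""

    def __init__(self):
        self.written = []  # stand-in for the hdfs/s3 output
        self.up = True

    def receive(self, event):
        if not self.up:
            raise ConnectionError("collector unreachable")
        self.written.append(event)


class LocalAgent:
    """Per-host agent: accepts app logs locally and forwards them upstream."""

    def __init__(self, host, collector):
        self.host = host
        self.collector = collector
        self.pending = deque()  # events held locally until the collector accepts them

    def log(self, message):
        self.pending.append((self.host, message))
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.collector.receive(self.pending[0])
            except ConnectionError:
                return  # keep the events queued and retry on the next flush
            self.pending.popleft()


# 20 hosts, each with its own local agent, all feeding one collector.
collector = Collector()
agents = [LocalAgent(f"web{i:02d}", collector) for i in range(1, 21)]

for agent in agents:
    agent.log("request served")

# Simulate a collector outage: the event stays queued on the host.
collector.up = False
agents[0].log("logged during outage")
collector.up = True
agents[0].flush()
```

The app only ever talks to the agent on its own host, so an outage upstream shows up as a growing local queue rather than as lost messages in this model.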

This works pretty well, but I'd stress the following things if you plan on using RPCs:
1) Use version 0.9.3 or, better yet, wait until 0.9.5. There are a couple of critical rpc
bug fixes not in version 0.9.4 (we're about to deploy a version we built from the current
0.9.5 trunk).
2) Even version 0.9.3 has a bunch of rpc-based bugs, which mean you'll have to restart nodes
whenever you change their config, but this is manageable.

This setup works very well once it's up and running, and version 0.9.5 will make it much more
bulletproof.
Generally the local flume-nodes consume minimal resources. You can really hit them hard without
them causing an issue, so resource usage will not be a problem.

Hope that helps somewhat?
Matthew Rathbone
Foursquare | Software Engineer | Server Engineering Team
matthew@foursquare.com (mailto:matthew@foursquare.com) | @rathboma (http://twitter.com/rathboma)
| 4sq (http://foursquare.com/rathboma)

On Thursday, September 1, 2011 at 5:03 PM, Avinash Shahdadpuri wrote:

> Hi,
> We have recently started using flume.
> We have 20 servers behind a load balancer and want to see if we can avoid running flume
> node on all of them.
> We are looking at an option of using the flume log4j appender & avrosource &
> dedicated flume nodes (machines just running flume)
> 1. We can use flume log4j appender to stream logs to a dedicated flume node running flume
> agent/collector. In this case, if the flume node goes down, we would lose the messages.
> 2. The other option is to use flume log4j appender to stream logs on the same machine. In
> this case we would need an agent on flume node to read the remote server. The avrosource agent
> doesn't seem to be able to read from remote machine? Is there something else we can do here.
> Has anyone come across this and do you have any recommendations to handle this.
> Please let me know.
> Thanks,
>  Avinash
