From Mohit Durgapal <durgapalmo...@gmail.com>
Subject how to load balance flume
Date Thu, 14 Aug 2014 07:25:34 GMT
I have a requirement where I need to feed push traffic (comma-separated
logs) at a very high rate to Flume.
I have three concerns:

   1. I am using PHP to send events to Flume through rsyslog. The code I am
   using is:

   openlog("mylogs", LOG_NDELAY, LOG_LOCAL2);
   syslog(LOG_INFO, "aaid,bid,cid,info1,info2,....");

   I want to add some fields as headers to the above event log
   "aaid,bid,cid,info1,info2,....", but I don't see any function in PHP
   that lets me add headers for some fields, so that I can take some action
   based on just the headers without parsing the complete message.

   2. How to load balance the traffic. I want the logger to forward the
   logs to a load balancer, and the load balancer to choose a Flume
   node (based on factors like current load and CPU utilization) and also
   handle failures (divert traffic if a Flume node goes down).

   I looked at the Flume-based load balancer, but it provides just two
   options: round-robin and random load balancing. Any ideas as to how I could
   do this load balancing with failure detection and handling would be very
   helpful.

   3. I want to update a cache in real time from Flume (using an interceptor).
   I want a hash-based approach to divert certain traffic (based on a field
   or header in the log) to certain nodes, so that one node is responsible for
   updating rows with keys in the same hash bucket. This is to avoid row level

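To make points 1 and 3 more concrete, here is a rough sketch of what I have in mind on the PHP side. Since syslog() only takes a priority and a message, the sketch puts the routing fields at the front of the message in a fixed format, so they could be read without parsing the whole comma-separated payload. The helper names bucketFor() and buildLogLine(), the "bucket=... key=...|" prefix format, and the node count of 4 are all made up for illustration. Turning that prefix into real Flume headers and routing on it would still have to happen on the Flume side (perhaps with an interceptor plus a channel selector), which I am only assuming is possible.

```php
<?php
// Hypothetical helper: map a routing key (e.g. the first log field) to one
// of $nodeCount hash buckets. The same key always lands in the same bucket.
function bucketFor(string $key, int $nodeCount): int
{
    if ($nodeCount < 1) {
        throw new InvalidArgumentException('nodeCount must be >= 1');
    }
    // Mask to 31 bits so the value is non-negative on 32-bit PHP builds too.
    return (crc32($key) & 0x7fffffff) % $nodeCount;
}

// Hypothetical helper: prefix the payload with the routing fields.
// The prefix format is an assumption; the key must not contain ' ' or '|'.
function buildLogLine(string $routingKey, int $nodeCount, string $payload): string
{
    $bucket = bucketFor($routingKey, $nodeCount);
    return sprintf('bucket=%d key=%s|%s', $bucket, $routingKey, $payload);
}

// Usage: 4 is an assumed number of Flume nodes; the payload is a placeholder.
$line = buildLogLine('aaid', 4, 'aaid,bid,cid,info1,info2');

openlog('mylogs', LOG_NDELAY, LOG_LOCAL2);
syslog(LOG_INFO, $line);
closelog();
```

A plain modulo like this reshuffles most keys whenever the number of nodes changes, so something like consistent hashing might be needed if nodes are added or removed often.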
I hope I have explained my requirements well enough for everyone to
understand. But if it's not as clear as I think, please let me know.


