flume-user mailing list archives

From Sharninder <sharnin...@gmail.com>
Subject Re: how to load balance flume
Date Thu, 14 Aug 2014 10:50:36 GMT
I'm not sure without looking at the exact use case, but maybe you can use
something like HAProxy?


--
Sharninder



On Thu, Aug 14, 2014 at 4:08 PM, Mohit Durgapal <durgapalmohit@gmail.com>
wrote:

> Hi Sharninder,
>
> Thanks for the response. The load balancing is not based on headers. To
> simplify, let's say I have one web server generating logs and three Flume
> nodes receiving those logs. I want the load to be balanced across those
> three Flume nodes based on CPU utilization and load.
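One way to express "balance on CPU utilization and load, and divert traffic when a node goes down" is to send each batch to the healthy node with the lowest reported load. This is a minimal sketch in PHP that assumes each Flume node's health and load are already collected by a separate monitoring step, which is not shown. The node names, the `up`/`load` fields and `pickLeastLoaded` are all hypothetical.

```php
<?php
// Sketch: choose the healthy node with the lowest load.
// Assumption: $nodes is filled by a separate (hypothetical) monitoring step.
function pickLeastLoaded(array $nodes): ?string
{
    $best = null;
    $bestLoad = INF;
    foreach ($nodes as $name => $stats) {
        // Failure handling: nodes marked as down are never selected.
        if (!$stats['up']) {
            continue;
        }
        if ($stats['load'] < $bestLoad) {
            $best = $name;
            $bestLoad = $stats['load'];
        }
    }
    return $best; // null when every node is down
}

$nodes = [
    'flume1' => ['up' => true,  'load' => 0.72],
    'flume2' => ['up' => false, 'load' => 0.10], // down, so skipped
    'flume3' => ['up' => true,  'load' => 0.35],
];
echo pickLeastLoaded($nodes), PHP_EOL; // prints flume3
```

The sketch is only as good as the freshness of the load figures. A dedicated TCP balancer such as HAProxy runs its own health checks instead of relying on the sender to track node state.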
>
>
>
>
>
> On Thu, Aug 14, 2014 at 4:01 PM, Sharninder <sharninder@gmail.com> wrote:
>
>> To add headers to the events, you can either send properly Avro-formatted
>> packets (which have headers) to an Avro source, or implement a custom
>> interceptor to add headers after the events are received by the syslog
>> source. There is a static interceptor bundled with Flume that you can use.
>> The problem with it is that, as far as I know, it can only add a single
>> header (key -> value) at a time. But it's a good starting point for what
>> you want to do.
>>
>> I didn't really understand your load balancing requirement, but if it's
>> based on the headers, you'll have to write your own interceptors.
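The interceptor approach described above amounts to copying selected fields from the event body into the event headers. This is a minimal sketch of that logic, written in PHP to match the sender code in this thread. An actual Flume interceptor would be a Java class implementing `org.apache.flume.interceptor.Interceptor`. The header names `aaid`, `bid` and `cid` come from the sample log line, and `promoteFieldsToHeaders` is a hypothetical name. The returned structure also matches the event shape that Flume's HTTP source accepts through its default JSON handler: an array of objects with `headers` and `body` keys. Sending directly to an HTTP source is an alternative not discussed in this thread.

```php
<?php
// Sketch: copy the leading CSV fields of the body into headers, so that
// routing can inspect the headers without parsing the whole message.
// Function name is hypothetical.
function promoteFieldsToHeaders(string $body, array $headerNames): array
{
    $fields = explode(',', $body);
    $headers = [];
    foreach ($headerNames as $i => $name) {
        // Only set a header when the body actually has that many fields.
        if (isset($fields[$i])) {
            $headers[$name] = $fields[$i];
        }
    }
    // Event shape used by Flume's HTTP source JSON handler (assumption:
    // that source is in use); the body is kept intact.
    return ['headers' => $headers, 'body' => $body];
}

$event = promoteFieldsToHeaders('a1,b7,c3,info1,info2', ['aaid', 'bid', 'cid']);
echo json_encode([$event]), PHP_EOL;
// [{"headers":{"aaid":"a1","bid":"b7","cid":"c3"},"body":"a1,b7,c3,info1,info2"}]
```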
>>
>>
>>
>> On Thu, Aug 14, 2014 at 12:55 PM, Mohit Durgapal <durgapalmohit@gmail.com> wrote:
>>
>>> I have a requirement where I need to push traffic (comma-separated
>>> logs) to Flume at a very high rate.
>>> I have three concerns:
>>>
>>>
>>>    1. I am using PHP to send events to Flume through rsyslog. The code
>>>    I am using is:
>>>
>>>        openlog("mylogs", LOG_NDELAY, LOG_LOCAL2);
>>>        syslog(LOG_INFO, "aaid,bid,cid,info1,info2,....");
>>>        closelog();
>>>
>>>    I want to add some fields of the above event log,
>>>    "aaid,bid,cid,info1,info2,....", as headers, so that I can take action
>>>    based on the headers alone without parsing the complete message.
>>>    However, I don't see any PHP function that lets me add headers.
>>>
>>>    2. How do I load balance the traffic? I want the logger to forward
>>>    the logs to a load balancer, and the load balancer to choose a Flume
>>>    node (based on factors like current load and CPU utilization) and also
>>>    handle failures (divert traffic if a Flume node goes down).
>>>
>>>    I looked at Flume's load-balancing sink processor, but it provides just
>>>    two options: round robin and random load balancing. Any ideas on how I
>>>    could do this load balancing with failure detection and handling would
>>>    be very helpful.
>>>
>>>    3. I want to update a cache in real time from Flume (using an
>>>    interceptor). I want a hash-based approach that routes certain
>>>    traffic (based on a field or header in the log) to specific nodes, so
>>>    that a single node is responsible for updating all rows whose keys fall
>>>    in the same hash bucket. This is to avoid row-level locking.
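For the third point, a common way to send every event with the same key to the same node is to hash the key and take the result modulo the number of nodes. This is a minimal sketch in PHP. It assumes that the routing key is the first CSV field (`aaid`) and that there is a fixed list of three nodes. `nodeForKey` and the node names are hypothetical. With plain modulo hashing, most keys move to a different node when a node is added or removed, so this does not by itself cover the failover asked about in point 2. Consistent hashing is the usual refinement for that case.

```php
<?php
// Sketch: map a routing key to one node, so the same key always lands on
// the same node and one node owns each hash bucket.
function nodeForKey(string $key, array $nodes): string
{
    // crc32() returns a non-negative integer on 64-bit PHP builds.
    $bucket = crc32($key) % count($nodes);
    return $nodes[$bucket];
}

$nodes = ['flume1', 'flume2', 'flume3'];
$line = 'a1,b7,c3,info1,info2';
$key = explode(',', $line)[0]; // assumption: route on the first field (aaid)
echo nodeForKey($key, $nodes), PHP_EOL;
```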
>>>
>>>
>>> I hope I have explained my requirements well enough for everyone to
>>> understand. If not, please let me know.
>>>
>>>
>>> Regards
>>> Mohit
>>>
>>
>>
>
