flume-user mailing list archives

From Saurabh B <qna.list.141...@gmail.com>
Subject Re: seeking help on flume cluster deployment
Date Fri, 10 Jan 2014 03:18:33 GMT
Hi Chen,

I don't think Flume has a way to configure multiple sources pointing to the
same data source. Of course you can do that, but you will end up with
duplicate data. Flume offers failover at the sink level.
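
For reference, sink-level failover is set up with a sink group. This is only a minimal sketch: it assumes an agent named a1 with two sinks, k1 and k2, already defined elsewhere in the same file. Those names and the priority and penalty values are placeholders, not taken from your setup.

```
# Assumption: agent "a1" and sinks "k1", "k2" are illustrative names only.
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
# The failover processor sends events to the highest-priority live sink.
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
# Upper bound, in milliseconds, on the backoff applied to a failed sink.
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

With this, k2 only receives events while k1 is failing. It does not help if the agent process that hosts the source goes down.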

On Thu, Jan 9, 2014 at 6:56 PM, Chen Wang <chen.apache.solr@gmail.com> wrote:

> OK, so after more research :) it seems that what I need is failover
> for the agent source (not failover for the sink):
> if one agent dies, another agent of the same kind will start running.
> Does Flume support this scenario?
> Thanks,
> Chen
> On Thu, Jan 9, 2014 at 3:12 PM, Chen Wang <chen.apache.solr@gmail.com> wrote:
>> After reading more docs, it seems that to achieve my goal I
>> have to do the following:
>> 1. Have one agent with the custom source running on one node. This
>> agent reads from those 5 socket servers and writes to some kind of sink
>> (maybe another socket?).
>> 2. On one or more other machines, set up collectors that read from the
>> agent sink in 1 and write to HDFS.
>> 3. Have a master node managing the nodes in 1 and 2.
>> But this seems to be overkill in my case: in 1, I can already sink to
>> HDFS. However, data arrives at the socket servers much faster than the
>> translation step can process it, so I want to be able to add more nodes
>> later to do the translation job. So what is the correct setup?
>> Thanks,
>> Chen
>> On Thu, Jan 9, 2014 at 2:38 PM, Chen Wang <chen.apache.solr@gmail.com> wrote:
>>> Guys,
>>> In my environment, the clients are 5 socket servers. So I wrote a custom
>>> source that spawns 5 threads, each reading from one of them indefinitely,
>>> and the sink is HDFS (a Hive table). This works fine when running
>>> flume-ng agent.
>>> But how can I deploy this in distributed mode (cluster)? I am confused
>>> about the 3 tiers (agent, collector, storage) mentioned in the docs. Do
>>> they apply to my case? How can I separate my agent/collector/storage?
>>> Apparently I can only have one agent running: multiple agents will result
>>> in getting duplicates from the socket servers. But I want another agent
>>> to take over if one agent dies. I would also like to be able to scale
>>> writes to HDFS horizontally. How can I achieve all this?
>>> Thank you very much for your advice.
>>> Chen
