flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Architecting Flume for failover
Date Wed, 20 Feb 2013 00:30:26 GMT
Can you change the hdfs.path to hdfs:// and hdfs://
on hdfsSink-1 and hdfsSink-2 respectively (assuming those are your namenodes)? The "bind"
configuration param does not really exist for HDFS Sink (it is only for the IPC sources).
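
A minimal sketch of that change, assuming hypothetical namenode hosts namenode-1.example.com and namenode-2.example.com on port 8020 (the real hosts are not given in this message); the remaining hdfs.* settings would stay as they are:

```properties
# Hypothetical namenode hosts and port; substitute your own.
agent.sinks.hdfsSink-1.type = hdfs
agent.sinks.hdfsSink-1.channel = fileChannel-1
agent.sinks.hdfsSink-1.hdfs.path = hdfs://namenode-1.example.com:8020/flume/localbrain-events

agent.sinks.hdfsSink-2.type = hdfs
agent.sinks.hdfsSink-2.channel = fileChannel-1
agent.sinks.hdfsSink-2.hdfs.path = hdfs://namenode-2.example.com:8020/flume/localbrain-events

# No "bind" lines on the HDFS sinks. For contrast, "bind" belongs on an IPC
# source such as Avro (source name, address and port below are illustrative):
# agent.sources.avroSource-1.type = avro
# agent.sources.avroSource-1.bind = 0.0.0.0
# agent.sources.avroSource-1.port = 4141
```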


Hari Shreedharan

On Tuesday, February 19, 2013 at 4:05 PM, Noel Duffy wrote:

> If I disable the agent.sinks line, both my sinks are disabled and nothing gets written
> to HDFS. The status page no longer shows me any sinks.
> From: Yogi Nerella [mailto:ynerella999@gmail.com] 
> Sent: Wednesday, 20 February 2013 12:40 p.m.
> To: user@flume.apache.org
> Subject: Re: Architecting Flume for failover
> Hi Noel,
> Maybe you are specifying both sinkgroups and sinks.
> Can you try removing the sinks line?
> #agent.sinks = hdfsSink-1 hdfsSink-2
> Yogi
> On Tue, Feb 19, 2013 at 1:32 PM, Noel Duffy <noel.duffy@sli-systems.com> wrote:
> I have a Flume agent that pulls events from RabbitMQ and pushes them into HDFS. So far
> so good, but now I want to have a second Flume agent on a different host acting as a hot backup
> for the first agent such that the loss of the first host running Flume would not cause any
> events to be lost. In the testing I've done I've gotten two Flume agents on separate hosts
> to read the same events from the RabbitMQ queue, but it's not clear to me how to configure
> the sinks such that only one of the sinks actually does something and the other does nothing.
> From reading the documentation, I supposed that a sinkgroup configured for failover was
> what I needed, but the documentation examples only cover the case where the sinks in a failover
> group are all on the same agent on the same host. I've seen messages online which seem to
> say that sinks in a sinkgroup can be on different hosts, but I can find no clear explanation
> of how to configure such a sinkgroup. How would sinks on different hosts communicate with
> one another? Would the sinks in the sinkgroup have to use a JDBC channel? Would the sinks
> have to be non-terminal sinks, like Avro?
> In my testing I set up two agents on different hosts and configured a sinkgroup containing
> two sinks, both HDFS sinks.
> agent.sinkgroups = sinkgroup1
> agent.sinkgroups.sinkgroup1.sinks = hdfsSink-1 hdfsSink-2
> agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-1 = 5
> agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-2 = 10
> agent.sinkgroups.sinkgroup1.processor.type=failover
> agent.sinks = hdfsSink-1 hdfsSink-2
> agent.sinks.hdfsSink-1.type = hdfs
> agent.sinks.hdfsSink-1.bind =
> agent.sinks.hdfsSink-1.channel = fileChannel-1
> agent.sinks.hdfsSink-1.hdfs.path = /flume/localbrain-events
> agent.sinks.hdfsSink-1.hdfs.filePrefix = lb-events
> agent.sinks.hdfsSink-1.hdfs.round = false
> agent.sinks.hdfsSink-1.hdfs.rollCount=50
> agent.sinks.hdfsSink-1.hdfs.fileType=SequenceFile
> agent.sinks.hdfsSink-1.hdfs.writeFormat=Text
> agent.sinks.hdfsSink-1.hdfs.codeC = lzo
> agent.sinks.hdfsSink-1.hdfs.rollInterval=30
> agent.sinks.hdfsSink-1.hdfs.rollSize=0
> agent.sinks.hdfsSink-1.hdfs.batchSize=1
> agent.sinks.hdfsSink-2.bind =
> agent.sinks.hdfsSink-2.type = hdfs
> agent.sinks.hdfsSink-2.channel = fileChannel-1
> agent.sinks.hdfsSink-2.hdfs.path = /flume/localbrain-events
> agent.sinks.hdfsSink-2.hdfs.filePrefix = lb-events
> agent.sinks.hdfsSink-2.hdfs.round = false
> agent.sinks.hdfsSink-2.hdfs.rollCount=50
> agent.sinks.hdfsSink-2.hdfs.fileType=SequenceFile
> agent.sinks.hdfsSink-2.hdfs.writeFormat=Text
> agent.sinks.hdfsSink-2.hdfs.codeC = lzo
> agent.sinks.hdfsSink-2.hdfs.rollInterval=30
> agent.sinks.hdfsSink-2.hdfs.rollSize=0
> agent.sinks.hdfsSink-2.hdfs.batchSize=1
> However, this does not achieve the failover I hoped for. The sink hdfsSink-2 on both
> agents writes the events to HDFS. The agents are not communicating, so the binding of the
> sink to an IP address is not doing anything.
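
For reference, a commented reading of the sinkgroup settings quoted in this thread, offered as a sketch and not as a fix. In Flume's failover sink processor the sink with the larger priority value is tried first, and a sinkgroup is local to the agent that defines it, which is consistent with hdfsSink-2 being the active sink on both hosts. The maxpenalty line is an optional addition that was not in the original config.

```properties
agent.sinkgroups = sinkgroup1
agent.sinkgroups.sinkgroup1.sinks = hdfsSink-1 hdfsSink-2
agent.sinkgroups.sinkgroup1.processor.type = failover

# Larger value = higher priority, so hdfsSink-2 (10) is used first and
# hdfsSink-1 (5) only takes over if hdfsSink-2 fails on this same agent.
agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-1 = 5
agent.sinkgroups.sinkgroup1.processor.priority.hdfsSink-2 = 10

# Optional, not in the original config: upper bound in milliseconds on the
# backoff applied to a failed sink before it is retried.
agent.sinkgroups.sinkgroup1.processor.maxpenalty = 10000

# Sinks that belong to a sinkgroup still have to be declared here.
agent.sinks = hdfsSink-1 hdfsSink-2
```

Because both agents load the same file, each one runs this failover logic independently; nothing in the sinkgroup coordinates between hosts.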
