flume-user mailing list archives

From Guy Peleg <guy.pe...@gmail.com>
Subject HDFSChannel?
Date Thu, 13 Dec 2012 07:34:47 GMT
Say I have a multi-hop flow, and let's say the last agent stores its data in
HDFS using the HDFS sink.

In the last agent, as in every agent, there is the source-channel-sink
trio. My question is: why do we need that channel if the only thing that
agent does is store the events in HDFS (or another data store)?

Wouldn't it be more efficient to have an 'HDFSChannel' that is part of the
transaction, and no sink at all? Otherwise I might need to use a persistent
channel (JDBC, File) to make sure that data is not lost before it is moved
to the sink, which again seems redundant: ideally I would like the incoming
events on the 'last agent' to be stored in their destination as quickly as
possible, without paying the extra channel cost.