flume-user mailing list archives

From Juhani Connolly <juhani_conno...@cyberagent.co.jp>
Subject Re: Memory Channel
Date Thu, 17 Jan 2013 02:00:48 GMT
The channel is a temporary storage device that decouples the source from 
the sink.

Data is added to and removed from it through transactions that 
either put or take one or more events. Sources put data in and sinks 
take it out.
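
To make the put/take idea concrete, here is a rough toy model in Python. 
It is not Flume's API: ToyChannel, TakeTransaction, put_batch and 
take_batch are names made up for this sketch, and the real channel 
implementations are more involved. It only shows that a take is not 
final until it is committed, and that a rollback puts the events back.

```python
from collections import deque


class ToyChannel:
    """Hypothetical in-memory channel; not Flume's real MemoryChannel."""

    def __init__(self):
        self._events = deque()

    def put_batch(self, events):
        # A put transaction: stage the whole batch first, then store it,
        # so a failure while staging stores nothing.
        staged = list(events)
        self._events.extend(staged)

    def take_batch(self, max_events):
        # A take transaction: hand out up to max_events, pending
        # commit or rollback by the caller (the "sink").
        count = min(max_events, len(self._events))
        taken = [self._events.popleft() for _ in range(count)]
        return TakeTransaction(self, taken)

    def __len__(self):
        return len(self._events)


class TakeTransaction:
    """Hypothetical handle for one batch taken from a ToyChannel."""

    def __init__(self, channel, events):
        self.events = events
        self._channel = channel
        self._open = True

    def commit(self):
        # The sink wrote the batch; the events are gone from the channel.
        self._open = False

    def rollback(self):
        # The sink failed; return the events to the head, in order.
        if self._open:
            self._channel._events.extendleft(reversed(self.events))
            self._open = False


channel = ToyChannel()
channel.put_batch(["a", "b", "c"])   # source side
tx = channel.take_batch(2)           # sink side takes "a" and "b"
tx.rollback()                        # write failed, events are restored
```

After the rollback the channel holds all three events again, in their 
original order, so a later take sees them as if nothing had happened.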

When the source receives a batch, it stores it in the channel. 
With a memory channel, the only guarantee is that all of 
the events are now held in memory on this agent.

When a sink then processes a batch of data, that data is removed from 
the channel once the sink commits the transaction. If the sink is a 
RollingFileSink or a similar physical-media sink, at this point you 
can consider the data to have been synced.

The timing of the sink's process() calls, each of which handles a batch 
of events (what you are referring to as syncing), is governed by the 
sink runner, which has its own thread.
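
As a loose sketch of that arrangement, the Python below runs a 
hypothetical sink_runner loop on its own thread while the main thread 
plays the source. The names, the batch size of 2 and the 10 ms back-off 
are all assumptions made for the example, and a plain queue.Queue stands 
in for the channel; none of this is Flume's actual code.

```python
import queue
import threading


def sink_runner(channel, written, stop, batch_size=2):
    """Hypothetical runner loop: keep calling process() until stopped."""

    def process():
        # Take up to batch_size events; report whether anything was taken.
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(channel.get_nowait())
            except queue.Empty:
                break
        written.extend(batch)    # stand-in for writing to HDFS or a file
        return bool(batch)

    while not stop.is_set():
        if not process():
            stop.wait(0.01)      # back off briefly when the channel is empty
    while process():
        pass                     # final drain after stop was requested


channel = queue.Queue()
written = []
stop = threading.Event()
runner = threading.Thread(target=sink_runner, args=(channel, written, stop))
runner.start()

for i in range(5):
    channel.put(i)               # the "source" runs on the main thread

stop.set()
runner.join()
```

The source never calls the sink directly. It only puts events in the 
channel, and the runner thread decides when each batch is processed.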

If your source is generating data faster than your sink can process it, 
there can be an increasing delay between an event being put in the 
channel and getting "sync"ed to HDFS or wherever it is going. This can 
often be resolved by increasing thread counts or adding more sinks, but 
it may be caused by HDFS or your disk simply being too slow.
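
A back-of-the-envelope sketch of that delay, in Python. The rates and 
capacity are invented numbers, backlog_after and drain_delay are 
hypothetical helpers, and the bounded queue.Queue is only a stand-in for 
a full channel, with its timeout playing the role of the keep-alive 
mentioned further down in the quoted thread.

```python
import queue


def backlog_after(seconds, source_rate, sink_rate, capacity):
    """Events waiting in the channel after `seconds`, capped at capacity."""
    growth = max(0, source_rate - sink_rate) * seconds
    return min(capacity, growth)


def drain_delay(backlog, sink_rate):
    """Seconds a newly put event waits behind the existing backlog."""
    return backlog / sink_rate


# Assumed rates: the source produces 1000 events/s, the sink handles 800.
backlog = backlog_after(60, source_rate=1000, sink_rate=800, capacity=100000)
delay = drain_delay(backlog, sink_rate=800)
# backlog is 12000 events after one minute; delay is 15.0 seconds

# Once the channel is full, a put waits for a limited time and then fails.
full_channel = queue.Queue(maxsize=2)
full_channel.put("e1")
full_channel.put("e2")
try:
    full_channel.put("e3", timeout=0.05)   # nothing is draining the queue
    rejected = False
except queue.Full:
    rejected = True
```

With these numbers the backlog grows by 200 events every second, so the 
delay keeps rising until either the sink side is sped up or the channel 
fills and puts start being rejected.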

On 01/17/2013 04:03 AM, Mohit Anchlia wrote:
> Just one more question, when I write using memorychannel does that 
> write immediately gets written to the sink? It may not get sync on 
> HDFS but does it at least immediately gets written. I am trying to see 
> if the events are held in flume's memory or not.
>
> On Wed, Jan 16, 2013 at 11:00 AM, Brock Noland <brock@cloudera.com 
> <mailto:brock@cloudera.com>> wrote:
>
>     The HDFS Sink syncs at the end of each batch or when the file rolls.
>
>     On Wed, Jan 16, 2013 at 10:55 AM, Nitin Pawar
>     <nitinpawar432@gmail.com <mailto:nitinpawar432@gmail.com>> wrote:
>     > you can configure it as you nee
>     > number of events
>     > rollover by time
>     > and other ways as well
>     >
>     >
>     > On Thu, Jan 17, 2013 at 12:17 AM, Mohit Anchlia
>     <mohitanchlia@gmail.com <mailto:mohitanchlia@gmail.com>>
>     > wrote:
>     >>
>     >> Right. I was asking about sync to "sink". My sink is hdfs so
>     does flume
>     >> sync to hdfs on every write operation?
>     >>
>     >>
>     >> On Wed, Jan 16, 2013 at 10:26 AM, Brock Noland
>     <brock@cloudera.com <mailto:brock@cloudera.com>> wrote:
>     >>>
>     >>> Memory Channel does not write to disk and as such never syncs
>     to disk.
>     >>> File Channel does sync to disk for each batch put on or taken
>     off the
>     >>> channel.
>     >>>
>     >>> On Wed, Jan 16, 2013 at 10:21 AM, Mohit Anchlia
>     <mohitanchlia@gmail.com <mailto:mohitanchlia@gmail.com>>
>     >>> wrote:
>     >>> > Thanks! What I am really trying to understand is when does
>     flume sync
>     >>> > to the
>     >>> > sink. I am not using batch events.
>     >>> >
>     >>> >
>     >>> > On Wed, Jan 16, 2013 at 9:55 AM, Hari Shreedharan
>     >>> > <hshreedharan@cloudera.com
>     <mailto:hshreedharan@cloudera.com>> wrote:
>     >>> >>
>     >>> >> It means that the channel can store that many events. If it
>     is full,
>     >>> >> then
>     >>> >> the put() calls (on the source side) will start throwing
>     >>> >> ChannelException.
>     >>> >> The put call will block only for keep-alive number of
>     seconds, after
>     >>> >> which
>     >>> >> it will throw.
>     >>> >>
>     >>> >>
>     >>> >> Hari
>     >>> >>
>     >>> >> --
>     >>> >> Hari Shreedharan
>     >>> >>
>     >>> >> On Wednesday, January 16, 2013 at 9:46 AM, Mohit Anchlia wrote:
>     >>> >>
>     >>> >> Could someone help me understand capacity attribute of
>     memoryChannel?
>     >>> >> Does
>     >>> >> it mean that memoryChannel flushes to sink only when this
>     capacity is
>     >>> >> reached or does it mean that it's the max events stored in
>     memory and
>     >>> >> call
>     >>> >> blocks until everything else gets freed?
>     >>> >>
>     >>> >>
>     >>> >> http://flume.apache.org/FlumeUserGuide.html#memory-channel
>     >>> >>
>     >>> >>
>     >>> >>
>     >>> >
>     >>>
>     >>>
>     >>>
>     >>> --
>     >>> Apache MRUnit - Unit testing MapReduce -
>     >>> http://incubator.apache.org/mrunit/
>     >>
>     >>
>     >
>     >
>     >
>     > --
>     > Nitin Pawar
>
>
>
>     --
>     Apache MRUnit - Unit testing MapReduce -
>     http://incubator.apache.org/mrunit/
>
>

