flume-user mailing list archives

From Rahul Ravindran <rahu...@yahoo.com>
Subject Re: Guarantees of the memory channel for delivering to sink
Date Tue, 06 Nov 2012 22:53:47 GMT
We will update the checkpoint on each insertion into the memory channel (we may tune this
to be periodic), but the contents of the memory channel will also be present in the legacy
logs, which are currently being generated.
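
A minimal sketch of what that custom source could look like, against the Flume 1.x source
API. LegacyLogTailer and CheckpointStore are hypothetical helpers for tailing the legacy
log and persisting the last committed offset; they are not Flume classes.

import org.apache.flume.ChannelException;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.AbstractSource;

public class CheckpointingTailSource extends AbstractSource
    implements PollableSource, Configurable {

  private LegacyLogTailer tailer;        // hypothetical: reads the legacy log from an offset
  private CheckpointStore checkpoint;    // hypothetical: persists the last committed offset

  @Override
  public void configure(Context context) {
    tailer = new LegacyLogTailer(context.getString("logFile"));
    checkpoint = new CheckpointStore(context.getString("checkpointFile"));
  }

  @Override
  public Status process() {
    long offset = checkpoint.read();              // resume from last acknowledged position
    LegacyLogTailer.Record rec = tailer.next(offset);
    if (rec == null) {
      return Status.BACKOFF;                      // nothing new in the legacy log
    }
    Event event = EventBuilder.withBody(rec.bytes());
    try {
      // processEvent() throws ChannelException if the memory channel is full.
      getChannelProcessor().processEvent(event);
      checkpoint.write(rec.nextOffset());         // advance checkpoint only after a successful put
      return Status.READY;
    } catch (ChannelException full) {
      // Channel at capacity (lagging sink, network issues, ...): back off and retry the
      // same record later; the checkpoint is not advanced, so nothing is skipped.
      return Status.BACKOFF;
    }
  }
}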


Additionally, the memory channel will be drained by an Avro sink that delivers to an Avro
source on another machine.
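
For reference, a hypothetical agent configuration for that topology (component names,
hostname and port are placeholders): the custom source feeds a memory channel, which an
Avro sink drains towards the Avro source on the collector machine.

agent.sources = legacyTail
agent.channels = mem
agent.sinks = toCollector

agent.sources.legacyTail.type = com.example.CheckpointingTailSource
agent.sources.legacyTail.channels = mem

agent.channels.mem.type = memory
agent.channels.mem.capacity = 100000
agent.channels.mem.transactionCapacity = 100

agent.sinks.toCollector.type = avro
agent.sinks.toCollector.channel = mem
agent.sinks.toCollector.hostname = collector.example.com
agent.sinks.toCollector.port = 4141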

Does that clear things up?


________________________________
 From: Brock Noland <brock@cloudera.com>
To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com> 
Sent: Tuesday, November 6, 2012 1:44 PM
Subject: Re: Guarantees of the memory channel for delivering to sink
 
But in your architecture you are going to write the contents of the
memory channel out? Or did I miss something?

"The checkpoint will be updated each time we perform a successive
insertion into the memory channel."

On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
> We have a legacy system which writes events to a file (an existing log file).
> This will continue. If I used a file channel, I would double the number of
> IO operations (writes to the legacy log file, plus writes to the file channel's WAL).
>
> ________________________________
> From: Brock Noland <brock@cloudera.com>
> To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
> Sent: Tuesday, November 6, 2012 1:38 PM
> Subject: Re: Guarantees of the memory channel for delivering to sink
>
> You're still going to be writing out all events, no? So how would the file
> channel do more IO than that?
>
> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
>> Hi,
>>    I am very new to Flume and we are hoping to use it for our log
>> aggregation into HDFS. I have a few questions below:
>>
>> FileChannel will double our disk IO, which will affect IO performance on
>> certain performance-sensitive machines. Hence, I was hoping to write a
>> custom Flume source which will use a memory channel, and which will perform
>> checkpointing. The checkpoint will be updated after each successful
>> insertion into the memory channel. (I realize that this results in a risk
>> of data loss, the maximum size of which is the capacity of the memory
>> channel.)
>>
>>    As long as there is capacity in the memory channel's buffer, does the
>> memory channel guarantee delivery to a sink (does it wait for
>> acknowledgements and retry failed events)? This would mean that we need to
>> ensure that we do not exceed the channel capacity.
>>
>> I am writing a custom source which will use the memory channel, and which
>> will catch a ChannelException to identify any channel capacity issues (i.e.,
>> the memory channel's buffer is full because of lagging sinks, network
>> issues, etc.). Is that a reasonable assumption to make?
>>
>> Thanks,
>> ~Rahul.
>
>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
>
>



-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
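
On the delivery-guarantee question quoted above: the memory channel itself does not
acknowledge or retry anything; it holds events until a sink takes them and commits a
channel transaction. When delivery fails, the sink rolls the transaction back, so the
events stay in the channel (consuming capacity) and are retried on the next attempt.
Below is a minimal sketch of that sink-side loop using the public Channel/Transaction
API; RemoteClient is a hypothetical stand-in for the real Avro RPC client, not a Flume
class.

import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.Transaction;

public class DeliveryLoopSketch {

  public static void drainOnce(Channel channel, RemoteClient client) {
    Transaction tx = channel.getTransaction();
    tx.begin();
    try {
      Event event = channel.take();          // event is held by this transaction
      if (event != null) {
        client.send(event);                  // e.g. RPC to the downstream Avro source
      }
      tx.commit();                           // only now is the event removed for good
    } catch (Exception e) {
      tx.rollback();                         // event goes back into the channel and
                                             // will be retried on the next attempt
      throw new RuntimeException(e);
    } finally {
      tx.close();
    }
  }

  /** Hypothetical RPC client interface, standing in for Flume's Avro RPC client. */
  public interface RemoteClient {
    void send(Event event) throws Exception;
  }
}

As long as the agent stays up and the channel has free capacity, this take/commit/rollback
cycle is what provides the retry behaviour; a crash of the agent, of course, loses whatever
is sitting in the memory channel at that moment.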