From: Rahul Ravindran <rahulrv@yahoo.com>
Reply-To: Rahul Ravindran <rahulrv@yahoo.com>
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Guarantees of the memory channel for delivering to sink
Date: Wed, 7 Nov 2012 13:18:24 -0800 (PST)
Message-ID: <1352323104.55678.YahooMailNeo@web162402.mail.bf1.yahoo.com>
Mailing-List: contact user-help@flume.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="-853603208-1676298641-1352323104=:55678"

---853603208-1676298641-1352323104=:55678
Content-Type: text/plain; charset=iso-8859-1

Got it. Thanks

________________________________
 From: Brock Noland <brock@cloudera.com>
To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
Sent: Wednesday, November 7, 2012 12:14 PM
Subject: Re: Guarantees of the memory channel for delivering to sink

The memory channel doesn't know about networks. Sources and sinks like avrosource/avrosink do. They operate on TCP/IP, and when there is an error sending data downstream they roll the transaction back so that no data is lost. I believe the docs cover this here: http://flume.apache.org/FlumeUserGuide.html

Brock

On Wed, Nov 7, 2012 at 1:52 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:

> Hi,
>
> Thanks for the response.
>
> Does the memory channel provide transactional guarantees? In the event of a network packet loss, does it retry sending the packet? If we ensure that we do not exceed the capacity for the memory channel, does it continue retrying to send an event to the remote source on failure?
>
> Thanks,
> ~Rahul.
>
> ________________________________
>  From: Brock Noland <brock@cloudera.com>
> To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
> Sent: Wednesday, November 7, 2012 11:48 AM
> Subject: Re: Guarantees of the memory channel for delivering to sink
>
> Hi,
>
> Yes, if you use the memory channel, you can lose data. To not lose data, the file channel needs to write to disk...
>
> Brock
>
> On Wed, Nov 7, 2012 at 1:29 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
>
>> Ping on the below questions about the new Spool Directory source:
>>
>> If we choose to use the memory channel with this source, to an Avro sink on a remote box, do we risk data loss in the eventuality of a network partition/slow network or if the flume-agent on the source box dies?
>> If we choose to use the file channel with this source, we will end up with double writes to disk, correct? (one for the legacy log files which will be ingested by the Spool Directory source, and the other for the WAL)
>>
>> ________________________________
>>  From: Rahul Ravindran <rahulrv@yahoo.com>
>> To: "user@flume.apache.org" <user@flume.apache.org>
>> Sent: Tuesday, November 6, 2012 3:40 PM
>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>
>> This is awesome.
>> This may be perfect for our use case :)
>>
>> When is the 1.3 release expected?
>>
>> A couple of questions about the choice of channel for the new source:
>>
>> If we choose to use the memory channel with this source, to an Avro sink on a remote box, do we risk data loss in the eventuality of a network partition/slow network or if the flume-agent on the source box dies?
>> If we choose to use the file channel with this source, we will end up with double writes to disk, correct? (one for the legacy log files which will be ingested by the Spool Directory source, and the other for the WAL)
>>
>> Thanks,
>> ~Rahul.
>>
>> ________________________________
>>  From: Brock Noland <brock@cloudera.com>
>> To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
>> Sent: Tuesday, November 6, 2012 3:05 PM
>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>
>> This use case sounds like a perfect use of the Spool Directory source
>> which will be in the upcoming 1.3 release.
>>
>> Brock
>>
>> On Tue, Nov 6, 2012 at 4:53 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
>>> We will update the checkpoint each time (we may tune this to be periodic)
>>> but the contents of the memory channel will be in the legacy logs which are
>>> currently being generated.
>>>
>>> Additionally, the sink for the memory channel will send to an Avro source on
>>> another machine.
>>>
>>> Does that clear things up?
>>>
>>> ________________________________
>>> From: Brock Noland <brock@cloudera.com>
>>> To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
>>> Sent: Tuesday, November 6, 2012 1:44 PM
>>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>>
>>> But in your architecture you are going to write the contents of the
>>> memory channel out? Or did I miss something?
>>>
>>> "The checkpoint will be updated each time we perform a successive
>>> insertion into the memory channel."
>>>
>>> On Tue, Nov 6, 2012 at 3:43 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
>>>> We have a legacy system which writes events to a file (existing log file).
>>>> This will continue. If I used a file channel, I would double the number of
>>>> IO operations (writes to the legacy log file, and writes to the WAL).
>>>>
>>>> ________________________________
>>>> From: Brock Noland <brock@cloudera.com>
>>>> To: user@flume.apache.org; Rahul Ravindran <rahulrv@yahoo.com>
>>>> Sent: Tuesday, November 6, 2012 1:38 PM
>>>> Subject: Re: Guarantees of the memory channel for delivering to sink
>>>>
>>>> You're still going to be writing out all events, no? So how would the file
>>>> channel do more IO than that?
>>>>
>>>> On Tue, Nov 6, 2012 at 3:32 PM, Rahul Ravindran <rahulrv@yahoo.com> wrote:
>>>>> Hi,
>>>>>    I am very new to Flume and we are hoping to use it for our log
>>>>> aggregation into HDFS. I have a few questions below:
>>>>>
>>>>> FileChannel will double our disk IO, which will affect IO performance on
>>>>> certain performance-sensitive machines. Hence, I was hoping to write a
>>>>> custom Flume source which will use a memory channel, and which will
>>>>> perform checkpointing. The checkpoint will be updated each time we perform a
>>>>> successive insertion into the memory channel. (I realize that this results
>>>>> in a risk of data loss, the maximum size of which is the capacity of the
>>>>> memory channel.)
>>>>>
>>>>>    As long as there is capacity in the memory channel buffers, does the
>>>>> memory channel guarantee delivery to a sink (does it wait for
>>>>> acknowledgements, and retry failed packets)? This would mean that we need
>>>>> to ensure that we do not exceed the channel capacity.
>>>>>
>>>>> I am writing a custom source which will use the memory channel, and which
>>>>> will catch a ChannelException to identify any channel capacity issues (so,
>>>>> the buffer used in the memory channel is full because of lagging
>>>>> sinks/network issues etc). Is that a reasonable assumption to make?
>>>>>
>>>>> Thanks,
>>>>> ~Rahul.
>>>>
>>>> --
>>>> Apache MRUnit - Unit testing MapReduce -
>>>> http://incubator.apache.org/mrunit/
---853603208-1676298641-1352323104=:55678--
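
The rollback behaviour described in the thread (a sink that hits an error sending data downstream rolls its transaction back, so the events stay in the channel) can be illustrated with a toy model. This is a minimal sketch in Python, not Flume's API: ToyMemoryChannel, drain_once and broken_link are hypothetical names invented for illustration, and the capacity check is an assumed simplification of how a bounded in-memory channel might refuse new events.

```python
from collections import deque


class ToyMemoryChannel:
    """Bounded in-memory event queue with take/commit/rollback (a toy, not Flume's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.in_flight = []  # events taken by the current, uncommitted transaction

    def put(self, event):
        # Count in-flight events too, so a rollback always has room to restore them.
        if len(self.queue) + len(self.in_flight) >= self.capacity:
            raise OverflowError("channel full")
        self.queue.append(event)

    def take(self):
        event = self.queue.popleft()
        self.in_flight.append(event)
        return event

    def commit(self):
        # Delivery succeeded: forget the taken events for good.
        self.in_flight.clear()

    def rollback(self):
        # Delivery failed: put taken events back at the head, in their original order.
        self.queue.extendleft(reversed(self.in_flight))
        self.in_flight.clear()


def drain_once(channel, send, batch_size):
    """Take up to batch_size events, send them, commit on success, roll back on error."""
    batch = []
    try:
        while channel.queue and len(batch) < batch_size:
            batch.append(channel.take())
        send(batch)
        channel.commit()
        return True
    except ConnectionError:
        channel.rollback()
        return False


def broken_link(batch):
    # Hypothetical stand-in for a downstream hop that is unreachable.
    raise ConnectionError("downstream unreachable")


channel = ToyMemoryChannel(capacity=3)
for name in ("e1", "e2", "e3"):
    channel.put(name)

delivered = []
first = drain_once(channel, broken_link, batch_size=2)        # fails, rolled back
after_failure = list(channel.queue)                           # nothing was lost
second = drain_once(channel, delivered.extend, batch_size=2)  # retry succeeds
```

In this sketch a failed send leaves every event in the queue for the next attempt, which is the "no data is lost" case for network errors. The queue lives only in process memory, though, so any events still waiting in it are gone if the process dies, which matches the thread's point that a memory channel can lose data.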