Mailing-List: contact user-help@flume.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flume.apache.org
Received-SPF: pass (athena.apache.org: domain of konstt2000@gmail.com
 designates 74.125.82.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAEkjam3NV61F-CAqP1h6-GM0=RDzPoWLRt7BMTAViDQMADoDUw@mail.gmail.com>
References: 
 <CAE6kwsPTooWNeYaDqpoQUhi3y8sZgwB9it-LwUOfZ83Yi0CJiA@mail.gmail.com>
	<CAEkjam3NV61F-CAqP1h6-GM0=RDzPoWLRt7BMTAViDQMADoDUw@mail.gmail.com>
Date: Mon, 18 Aug 2014 22:35:03 +0200
Message-ID: 
 <CAE6kwsOwrvw68oG8BUC7OHku3x4KAXCtpiNKA=sSWvbteaXfmg@mail.gmail.com>
Subject: Re: Flow in Flume, could it make better?
From: Guillermo Ortiz <konstt2000@gmail.com>
To: user@flume.apache.org
Content-Type: multipart/alternative; boundary=001a11c261d04229330500ed4e99

--001a11c261d04229330500ed4e99
Content-Type: text/plain; charset=UTF-8

In my test, everything runs in the same VM. Later, I'll have another flow
that just spools or tails a file and sends the events through Avro to another
Source on my system.

Do I really need that replicating step? I think I have too many channels,
and that means too many resources and too much configuration.


2014-08-18 19:51 GMT+02:00 terrey shih <terreyshih@gmail.com>:

> Hi,
>
> Are your two sources (the spooling source and the Avro source fed by
> Sink2) in two different JVMs/machines?
>
> thx
>
>
> On Mon, Aug 18, 2014 at 9:53 AM, Guillermo Ortiz <konstt2000@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have built a flow with Flume, and I don't know whether this is the right
>> way to do it or whether there is something better. I am spooling a directory
>> and need the data in three different HDFS paths with different formats, so
>> I have created two interceptors.
>>
>> Source (Spooling) + Replication + Interceptor1 --> C1 and C2
>> C1 --> Sink1 to HDFS Path1 (a historical archive)
>> C2 --> Sink2 to Avro --> Source Avro + Multiplexing + Interceptor2 --> C3 and C4
>> C3 --> Sink3 to HDFS Path2
>> C4 --> Sink4 to HDFS Path3
>>
>> Interceptor1 doesn't do much with the data; the events are saved as they
>> are, to keep a history of the original data.
>>
>> Interceptor2 sets a header for the selector. It processes the data and
>> sets the header so that the selector routes each event to Sink3 or Sink4.
>> But this interceptor changes the original data.
>>
>> I tried to do the whole process without replicating the data, but I could
>> not. Now it seems like too many steps just because I want to store the
>> original data in HDFS as a historical archive.
>>
>
>
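
For reference, here is a minimal sketch of how the flow quoted above might
look as a single agent configuration. It is only an illustration under
assumptions: the agent name (a1), the component names, the spool directory,
the HDFS URLs, the port, the interceptor class names (com.example...) and the
header name "route" with its values are all hypothetical placeholders, not
taken from my real setup.

```properties
# Hypothetical agent "a1": two sources, four channels, four sinks.
a1.sources = spool avroIn
a1.channels = c1 c2 c3 c4
a1.sinks = k1 k2 k3 k4

# Source (Spooling) + Replication + Interceptor1 --> C1 and C2
a1.sources.spool.type = spooldir
a1.sources.spool.spoolDir = /var/spool/flume-in
a1.sources.spool.channels = c1 c2
a1.sources.spool.selector.type = replicating
a1.sources.spool.interceptors = i1
# Placeholder class name for the custom Interceptor1
a1.sources.spool.interceptors.i1.type = com.example.Interceptor1$Builder

# C1 --> Sink1 to HDFS Path1 (original data, kept as history)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/path1
a1.sinks.k1.hdfs.fileType = DataStream

# C2 --> Sink2 to Avro --> Source Avro
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c2
a1.sinks.k2.hostname = localhost
a1.sinks.k2.port = 4141

# Source Avro + Multiplexing + Interceptor2 --> C3 and C4
a1.sources.avroIn.type = avro
a1.sources.avroIn.bind = 0.0.0.0
a1.sources.avroIn.port = 4141
a1.sources.avroIn.channels = c3 c4
a1.sources.avroIn.interceptors = i2
# Placeholder class name for the custom Interceptor2, assumed to set "route"
a1.sources.avroIn.interceptors.i2.type = com.example.Interceptor2$Builder
a1.sources.avroIn.selector.type = multiplexing
a1.sources.avroIn.selector.header = route
a1.sources.avroIn.selector.mapping.path2 = c3
a1.sources.avroIn.selector.mapping.path3 = c4
a1.sources.avroIn.selector.default = c3

# C3 --> Sink3 to HDFS Path2
a1.sinks.k3.type = hdfs
a1.sinks.k3.channel = c3
a1.sinks.k3.hdfs.path = hdfs://namenode/path2

# C4 --> Sink4 to HDFS Path3
a1.sinks.k4.type = hdfs
a1.sinks.k4.channel = c4
a1.sinks.k4.hdfs.path = hdfs://namenode/path3

# Memory channels for the test; a file channel would be the durable choice.
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.channels.c3.type = memory
a1.channels.c4.type = memory
```

Written out this way, the Avro hop (k2 to avroIn) exists only to give the
second interceptor its own source, which is the step I would like to avoid.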

--001a11c261d04229330500ed4e99--