flume-user mailing list archives

From terrey shih <terreys...@gmail.com>
Subject Re: Flow in Flume, could it make better?
Date Mon, 18 Aug 2014 22:09:56 GMT
something like this

                        channel 1 -> sink 1 (raw event sink)
agent 1 src -> replicate
                        channel 2 -> sink 2 -> agent 2 src -> multiplexer -> sink 3
                                                                          -> sink 4

In fact, I tried doing without agent 2 and connecting sink 2 directly to
src 2, but I was not able to because of an RPCClient exception.

I am just going to try to have 2 agents.
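
For illustration, the two-agent layout in the diagram might be configured roughly as follows. This is only a sketch under assumptions: the agent and component names, the port, the spool directory, the HDFS paths, the `route` header with its values `a` and `b`, and the interceptor class `com.example.Interceptor2` are all hypothetical placeholders, and HDFS sinks stand in for whatever sinks are actually used.

```properties
# ---- agent 1: spooling source replicated into two channels ----
agent1.sources = src1
agent1.channels = ch1 ch2
agent1.sinks = sink1 sink2

agent1.sources.src1.type = spooldir
# hypothetical directory
agent1.sources.src1.spoolDir = /var/spool/flume/in
agent1.sources.src1.channels = ch1 ch2
# replicating is the default selector; stated here for clarity
agent1.sources.src1.selector.type = replicating

agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory

# sink 1: raw copy of every event (hypothetical path)
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /flume/raw
agent1.sinks.sink1.channel = ch1

# sink 2: forwards the second copy to agent 2 over Avro
agent1.sinks.sink2.type = avro
agent1.sinks.sink2.hostname = localhost
agent1.sinks.sink2.port = 4141
agent1.sinks.sink2.channel = ch2

# ---- agent 2: Avro source multiplexed on a header ----
agent2.sources = src2
agent2.channels = ch3 ch4
agent2.sinks = sink3 sink4

agent2.sources.src2.type = avro
agent2.sources.src2.bind = 0.0.0.0
agent2.sources.src2.port = 4141
agent2.sources.src2.channels = ch3 ch4

# hypothetical custom interceptor that sets the "route" header
agent2.sources.src2.interceptors = i2
agent2.sources.src2.interceptors.i2.type = com.example.Interceptor2$Builder

# route on the "route" header (header name and values are assumptions)
agent2.sources.src2.selector.type = multiplexing
agent2.sources.src2.selector.header = route
agent2.sources.src2.selector.mapping.a = ch3
agent2.sources.src2.selector.mapping.b = ch4
agent2.sources.src2.selector.default = ch3

agent2.channels.ch3.type = memory
agent2.channels.ch4.type = memory

agent2.sinks.sink3.type = hdfs
agent2.sinks.sink3.hdfs.path = /flume/path2
agent2.sinks.sink3.channel = ch3

agent2.sinks.sink4.type = hdfs
agent2.sinks.sink4.hdfs.path = /flume/path3
agent2.sinks.sink4.channel = ch4
```

Both agents can live in one properties file; each is then started as its own process with `flume-ng agent --conf-file <file> --name agent1` (and likewise `--name agent2`).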

terrey


On Mon, Aug 18, 2014 at 3:06 PM, terrey shih <terreyshih@gmail.com> wrote:

> Well, I am actually doing something similar to what you are doing. I also
> need to feed the data to different sinks: one takes just the raw data and
> the others are HBase sinks fed through the multiplexer.
>
>
>                         channel 1 -> sink 1 (raw event sink)
> agent 1 src -> replicate
>                         channel 2 -> sink 2 -> agent 2 src -> multiplexer
>
>
>
>
> On Mon, Aug 18, 2014 at 1:35 PM, Guillermo Ortiz <konstt2000@gmail.com>
> wrote:
>
>> In my test, everything is in the same VM. Later, I'll have another flow
>> that just spools or tails a file and sends it through Avro to another
>> source on my system.
>>
>> Do I really need that replicating step? I think I have too many
>> channels, which means too many resources and too much configuration.
>>
>>
>> 2014-08-18 19:51 GMT+02:00 terrey shih <terreyshih@gmail.com>:
>>
>>> Hi,
>>>
>>> Are your 2 sources, the spooling one and the Avro one (fed from sink 2),
>>> in two different JVMs/machines?
>>>
>>> thx
>>>
>>>
>>> On Mon, Aug 18, 2014 at 9:53 AM, Guillermo Ortiz <konstt2000@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have built a flow with Flume and I don't know whether it's the right
>>>> way to do it or whether there is something better. I am spooling a
>>>> directory and need the data in three different paths in HDFS with
>>>> different formats, so I have created two interceptors.
>>>>
>>>> Source (Spooling) + Replication + Interceptor1 --> C1 and C2
>>>> C1 --> Sink1 to HDFS Path1 (a history of the original data)
>>>> C2 --> Sink2 to Avro --> Source Avro + Multiplexing + Interceptor2 --> C3 and C4
>>>> C3 --> Sink3 to HDFS Path2
>>>> C4 --> Sink4 to HDFS Path3
>>>>
>>>> Interceptor1 doesn't do much with the data; it just saves the events as
>>>> they are, to keep a history of the original data.
>>>>
>>>> Interceptor2 processes the data and sets a header that the selector
>>>> uses to route each event to Sink3 or Sink4. But this interceptor
>>>> changes the original data.
>>>>
>>>> I tried to do the whole process without replicating the data, but I
>>>> could not. Now it seems like too many steps just because I want to store
>>>> the original data in HDFS as a history.
>>>>
>>>
>>>
>>
>
