flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Re: Flow in Flume, could it make better?
Date Tue, 19 Aug 2014 08:56:09 GMT
Would it be possible to link the interceptors to the channels? I didn't
find anything about it in the documentation, so I guess not.

I guess another possibility is to execute the interceptors in the
sink, which, if I'm right, would mean implementing specific sinks. Or is that already possible?


2014-08-19 9:11 GMT+02:00 Guillermo Ortiz <konstt2000@gmail.com>:

> Yeah, I think that's what I'm doing.
> How about:
>
> Agent 1 src --> replicate + Interceptor1
>     channel1 --> sink1 (hdfs raw data)
>     channel2 --> sink2 avro --> agent2 src Avro --> multiplexing + interceptor2
>                                                         --> sink3
>                                                         --> sink4
>
> Would it be possible to apply interceptor1 just to channel1? I know
> that interceptors apply at the source level. Interceptor1 doesn't modify
> the data much, so I could feed channel2 with those small transformations,
> but ideally I would like to apply it only to channel1. If I want to do
> that, it looks like I'd have to create another level with more channels,
> and so on. Something like this:
>
> Agent 1 src --> replicate
>     channel1 --> *sink1 avro --> src1 avro + interceptor1 --> channel --> sink1 (hdfs raw data)*
>     channel2 --> sink2 avro --> agent2 src Avro --> multiplexing + interceptor2
>                                                         --> sink3
>                                                         --> sink4
>
> The point is that my flow continues at sink4, where I have another
> structure similar to everything above, so that means 8 channels in
> total. I don't know if it's possible to simplify this.
>
>
> 2014-08-19 0:09 GMT+02:00 terrey shih <terreyshih@gmail.com>:
>
>> something like this
>>
>> agent 1 src -> replicate
>>     channel 1 -> sink 1 (raw event sink)
>>     channel 2 -> sink 2 -> agent 2 src -> multiplexer
>>                                               -> sink 3
>>                                               -> sink 4
>>
>>
>> In fact, I tried not having agent 2 and connecting sink 2 directly to
>> src 2, but I was not able to because of an RPCClient exception.
>>
>> I am just going to try to have 2 agents.
>>
>> terrey
>>
>>
>> On Mon, Aug 18, 2014 at 3:06 PM, terrey shih <terreyshih@gmail.com>
>> wrote:
>>
>>> Well, I am actually doing something similar to what you are doing. I also
>>> need to feed the data to different sinks: one gets just the raw data, and
>>> the others are HBase sinks using the multiplexer.
>>>
>>>
>>> agent 1 src -> replicate
>>>     channel 1 -> sink 1 (raw event sink)
>>>     channel 2 -> sink 2 -> agent 2 src -> multiplexer
>>>
>>>
>>>
>>>
>>> On Mon, Aug 18, 2014 at 1:35 PM, Guillermo Ortiz <konstt2000@gmail.com>
>>> wrote:
>>>
>>>> In my test, everything is in the same VM. Later, I'll have another flow
>>>> that just spools or tails a file and sends it through Avro to another
>>>> source on my system.
>>>>
>>>> Do I really need that replicating step? I think I have too many
>>>> channels, and that means too many resources and too much configuration.
>>>>
>>>>
>>>> 2014-08-18 19:51 GMT+02:00 terrey shih <terreyshih@gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> Are your two sources, spooling and Avro (fed from sink 2), in two
>>>>> different JVMs/machines?
>>>>>
>>>>> thx
>>>>>
>>>>>
>>>>> On Mon, Aug 18, 2014 at 9:53 AM, Guillermo Ortiz <konstt2000@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have built a flow with Flume and I don't know if this is the right
>>>>>> way to do it, or whether there is something better. I am spooling a
>>>>>> directory and need the data in three different paths in HDFS with
>>>>>> different formats, so I have created two interceptors.
>>>>>>
>>>>>> Source(Spooling) + Replication + Interceptor1 --> to C1 and C2
>>>>>> C1 --> Sink1 to HDFS Path1 (a kind of historical archive)
>>>>>> C2 --> Sink2 to Avro --> Source Avro + Multiplexing + Interceptor2
>>>>>> --> C3 and C4
>>>>>> C3 --> Sink3 to HDFS Path2
>>>>>> C4 --> Sink4 to HDFS Path3
>>>>>>
>>>>>> Interceptor1 doesn't do much with the data; it is just there so they
>>>>>> are saved as they are, to keep a history of the original data.
>>>>>>
>>>>>> Interceptor2 configures a selector and a header. It processes the
>>>>>> data and configures the selector to redirect events to Sink3 or Sink4.
>>>>>> However, this interceptor changes the original data.
>>>>>>
>>>>>> I tried to do the whole process without replicating data, but I could
>>>>>> not. Now it seems like too many steps just because I want to store
>>>>>> the original data in HDFS as a historical archive.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
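
For readers following the thread, the flow described in the first quoted message (spooling source with replication and Interceptor1, then an Avro hop into a second source with multiplexing and Interceptor2) could be written roughly as the Flume properties file below. This is only a sketch under stated assumptions: the agent names `a1` and `a2`, the component names, the spool directory, the HDFS paths, the port, the memory channels, the header name `route` with its values `path2` and `path3`, and the interceptor classes `com.example.Interceptor1$Builder` and `com.example.Interceptor2$Builder` are all hypothetical and do not come from the thread. Because interceptors are attached to the source, Interceptor1 runs before the events are copied to both c1 and c2, which is the limitation discussed above.

```properties
# Agent a1 (tier 1): spool a directory and replicate every event to c1 and c2.
a1.sources = spool
a1.channels = c1 c2
a1.sinks = sink1 sink2

a1.sources.spool.type = spooldir
# Hypothetical directory; not given in the thread.
a1.sources.spool.spoolDir = /var/flume/spool
a1.sources.spool.channels = c1 c2
a1.sources.spool.selector.type = replicating
# Interceptors belong to the source, so i1 affects both c1 and c2.
a1.sources.spool.interceptors = i1
# Hypothetical class name standing in for the poster's Interceptor1.
a1.sources.spool.interceptors.i1.type = com.example.Interceptor1$Builder

# Channel type is an assumption; the thread does not say which is used.
a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Sink1: raw/historical copy in HDFS (path is hypothetical).
a1.sinks.sink1.type = hdfs
a1.sinks.sink1.channel = c1
a1.sinks.sink1.hdfs.path = /flume/path1

# Sink2: forward to the Avro source of agent a2 (host and port are hypothetical).
a1.sinks.sink2.type = avro
a1.sinks.sink2.channel = c2
a1.sinks.sink2.hostname = localhost
a1.sinks.sink2.port = 4545

# Agent a2 (tier 2): receive over Avro, run Interceptor2, multiplex to c3 or c4.
a2.sources = avroIn
a2.channels = c3 c4
a2.sinks = sink3 sink4

a2.sources.avroIn.type = avro
a2.sources.avroIn.bind = 0.0.0.0
a2.sources.avroIn.port = 4545
a2.sources.avroIn.channels = c3 c4
a2.sources.avroIn.interceptors = i2
# Hypothetical class name standing in for the poster's Interceptor2,
# assumed here to set the "route" header on each event.
a2.sources.avroIn.interceptors.i2.type = com.example.Interceptor2$Builder
a2.sources.avroIn.selector.type = multiplexing
a2.sources.avroIn.selector.header = route
a2.sources.avroIn.selector.mapping.path2 = c3
a2.sources.avroIn.selector.mapping.path3 = c4
a2.sources.avroIn.selector.default = c3

a2.channels.c3.type = memory
a2.channels.c4.type = memory

a2.sinks.sink3.type = hdfs
a2.sinks.sink3.channel = c3
a2.sinks.sink3.hdfs.path = /flume/path2

a2.sinks.sink4.type = hdfs
a2.sinks.sink4.channel = c4
a2.sinks.sink4.hdfs.path = /flume/path3
```

Both agents can share one file and be started separately by name, for example with `flume-ng agent --conf-file <file> --name a1` and again with `--name a2`. Keeping the second tier in its own agent follows the two-agent layout suggested earlier in the thread.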
