flume-user mailing list archives

From Israel Ekpo <isr...@aicer.org>
Subject Re: "single source - multi channel" scenario and applying interceptor while writing to only one channel and not on others...possible approaches
Date Tue, 23 Apr 2013 14:15:31 GMT
Connor,

This is a great example.

Thank you for sharing this. It was an excellent tutorial.

I will create a JIRA issue to document this workaround in the user guide.




On 23 April 2013 02:52, Connor Woodson <cwoodson.dev@gmail.com> wrote:

> Some more thoughts on this:
>
> The way Interceptors currently work is that they apply to an
> event as it is received. There are good uses for this. For instance, it
> lets you configure a single Timestamp interceptor that stamps every
> event a source receives, so even if you have multiple
> sinks/channels responding to an event, you only need that one interceptor.
> Interceptors in this sense serve to add data to event headers, and as such
> it makes sense to have them applied only once by the source instead of
> letting the channels change header data.
>
> If you wish to use an interceptor in the above way, to modify header data,
> and still want that interceptor to apply for a single channel, then if you
> don't mind could you elaborate on what you are trying to do? I haven't been
> able to come up with a situation like that. The solution here would be to
> do as Jeff suggested and use a serializer; if you want more in-depth
> instructions on how to build it, please ask; I have a set of directions
> lying around somewhere that I'll find for you.
>
>
> However, given the way Interceptors work, I have myself faced a situation
> where I would like an interceptor to apply to a single channel only. This
> use case is when I want to use an Interceptor to filter events; I want to
> send an event to some subset of channels based on the contents of its data.
> Here is how you can do this in the current setup (where Interceptors are
> applied at the source instead of per-channel):
>
> Using the Multiplexing Channel Selector you are able to choose which
> channels an event is written to based on the value of a specified
> header (it is documented in the Flume User Guide). There are some more
> features to the selector that aren't documented, called Optional Channels
> or something, but I don't know very much about them - just figured I would
> point out that they exist; digging through the source should provide some
> more insight.
>
> So here is how you want to set your system up. Create an Interceptor that
> sets a certain header value based on the event's contents. For
> instance, if you want all events containing exactly 1 character to be sent
> to a particular channel, you could create an Interceptor that counts the
> characters in the event. That Interceptor then sets the header value to
> "SINGLE" if there is just one character, or "MULTIPLE" if there are more.
>
> Then you can create your channel selector like this (modified from the
> documentation example):
>
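> # Source, channels, and the wiring between them
> a1.sources = r1
> a1.channels = all_events single_events multiple_events
> a1.sources.r1.channels = all_events single_events multiple_events
> # Custom interceptor that sets the header. The type value below is a
> # placeholder for the fully qualified name of your interceptor's Builder class.
> a1.sources.r1.interceptors = your_interceptor
> a1.sources.r1.interceptors.your_interceptor.type = com.example.YourInterceptor$Builder
> a1.sources.r1.interceptors.your_interceptor.header = header
> # Route each event on the value of that header
> a1.sources.r1.selector.type = multiplexing
> a1.sources.r1.selector.header = header
> a1.sources.r1.selector.mapping.SINGLE = all_events single_events
> a1.sources.r1.selector.mapping.MULTIPLE = all_events multiple_events
> # Events without a matching header value go here
> a1.sources.r1.selector.default = all_events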
>
>
> The result is that you now have a way to filter which channels a
> certain event is sent to. Note that a channel can appear in more than one
> mapping - for instance, all_events will get all events. So the trick is
> just to define the right interceptor, which is much simpler to code than a
> serializer (which is itself fairly easy).
>
> Hopefully that was clear. Feel free to ask more questions,
>
> - Connor
>
>
>
> On Fri, Apr 19, 2013 at 11:14 AM, Jeff Lord <jlord@cloudera.com> wrote:
>
>> Jagadish,
>>
>> Here is an example of how to write a custom serializer.
>>
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/test/java/org/apache/flume/sink/hdfs/MyCustomSerializer.java
>>
>> -Jeff
>>
>>
>> On Fri, Apr 19, 2013 at 9:34 AM, Jeff Lord <jlord@cloudera.com> wrote:
>>
>>> Hi Jagadish,
>>>
>>> Have you considered using a custom event serializer to modify your event?
>>> It's possible to replicate your flow using two channels and then have one
>>> sink that implements a custom serializer to modify the event.
>>>
>>> -Jeff
>>>
>>>
>>> On Tue, Apr 16, 2013 at 11:12 PM, Jagadish Bihani <
>>> jagadish.bihani@pubmatic.com> wrote:
>>>
>>>> Hi
>>>>
>>>> If anybody has any input on this, it would surely help.
>>>>
>>>> Regards,
>>>> Jagadish
>>>>
>>>>
>>>> On 04/16/2013 12:06 PM, Jagadish Bihani wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> We have a use case in which
>>>>> 1. A spooling source reads data.
>>>>> 2. It needs to write events into multiple channels. It should apply
>>>>> an interceptor only when putting into one channel and should put
>>>>> the event as is into the other channel.
>>>>>
>>>>> A possible approach we have thought of:
>>>>>
>>>>> 1. Create 2 different sources, then apply the interceptor on one and
>>>>> don't apply it on the other. But that duplicates reads and increases I/O.
>>>>>
>>>>> Is there any better way of achieving this use case?
>>>>>
>>>>> Regards,
>>>>> Jagadish
>>>>>
>>>>>
>>>>
>>>
>>
>
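
A minimal sketch of the header-tagging logic Connor describes, written as plain Java with no Flume dependency so that it runs on its own. The class name LengthTagger and its methods are hypothetical. In a real interceptor this logic would presumably sit inside Interceptor.intercept(Event), reading event.getBody() and writing to event.getHeaders(). The sketch assumes UTF-8 bodies and leaves the header unset for an empty body, so such events would fall through to the selector's default channel.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Flume: computes the routing header value.
class LengthTagger {
    // Header key the selector reads; matches "selector.header = header" in the config.
    static final String HEADER_KEY = "header";

    // Returns "SINGLE" for exactly one character, "MULTIPLE" for more,
    // and null for an empty body (assumption: empty events stay untagged).
    static String classify(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8); // assumes UTF-8 bodies
        int chars = text.codePointCount(0, text.length());
        if (chars == 0) {
            return null;
        }
        return chars == 1 ? "SINGLE" : "MULTIPLE";
    }

    // Mirrors what an interceptor would do to one event: set the header
    // when a value applies, otherwise leave the headers unchanged.
    static Map<String, String> tag(Map<String, String> headers, byte[] body) {
        String value = classify(body);
        if (value != null) {
            headers.put(HEADER_KEY, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<String, String>();
        tag(headers, "x".getBytes(StandardCharsets.UTF_8));
        System.out.println(headers.get(HEADER_KEY));
    }
}
```

With the selector configuration from the thread, an event tagged SINGLE would be written to all_events and single_events, one tagged MULTIPLE to all_events and multiple_events, and an untagged event to all_events only.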
