flume-user mailing list archives

From Ashish <paliwalash...@gmail.com>
Subject Re: Avro source and sink
Date Sat, 06 Sep 2014 15:52:19 GMT
I am not sure I understand the question correctly, so let me try to answer
based on my understanding.

source A -> channel A -> sink A ---> source B -> channel B -> sink B

For this scenario, Sink A has to be an Avro sink and Source B has to be an
Avro source for the flow to work.
Flume uses Avro for RPC (look at
flume-ng-sdk/src/main/avro/flume.avdl), which defines how Flume sends
Event(s) across using Avro RPC.

1. Source A (spooled dir) reads files, creates Events from them, and
inserts them into the channel
2. Sink A (Avro sink) reads from the channel and translates each Event
into an AvroFlumeEvent for sending to Source B (Avro source)
3. Source B reads each AvroFlumeEvent, creates an Event from it, and inserts
it into its channel, from which it is processed by Sink B

It's not that events get wrapped inside other events as they traverse the
chain. The Avro encoding exists only between Sink A and Source B.
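
A minimal sketch of steps 1-3, to show that the wire form lives only on the
hop between the two agents. This is a plain Python model, not Flume code:
Event and WireEvent are stand-ins for Flume's Event and AvroFlumeEvent, the
function names are hypothetical, and the channels are modeled as lists.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Stand-in for a Flume Event: string headers plus an opaque byte body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


@dataclass
class WireEvent:
    """Hypothetical stand-in for AvroFlumeEvent, used only on the RPC hop."""
    headers: dict
    body: bytes


def sink_a_to_wire(event: Event) -> WireEvent:
    # Step 2: the sink copies headers and body into the wire record.
    return WireEvent(headers=dict(event.headers), body=event.body)


def source_b_from_wire(wire: WireEvent) -> Event:
    # Step 3: the source rebuilds a plain Event from the wire record.
    return Event(headers=dict(wire.headers), body=wire.body)


def run_hop(lines):
    # Step 1: source A turns each line into an Event (channel A is a list here).
    channel_a = [Event(body=line.encode("utf-8")) for line in lines]
    # The wire form exists only between sink A and source B.
    channel_b = [source_b_from_wire(sink_a_to_wire(e)) for e in channel_a]
    return channel_a, channel_b


if __name__ == "__main__":
    a, b = run_hop(["GET /index.html 200", "POST /login 302"])
    for before, after in zip(a, b):
        # Same body on both sides, and channel B holds plain Events again.
        print(before.body == after.body, type(after).__name__)
```

What lands in channel B is an ordinary Event with the same headers and body
as the one in channel A; nothing accumulates per hop.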

Based on my understanding, you are looking at encoding log file lines using
Avro. In that case, the Avro-encoded log file lines would be part of the Event
body; the rest would be the same as steps 1-3.
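
A rough sketch of that idea, assuming a custom Source A and Sink B. To keep
it runnable without third-party packages, encode_record uses JSON as a
placeholder where a real Avro encoder would go. The log format, the function
names, and the "schema.name" header are all hypothetical.

```python
import json


def encode_record(record: dict) -> bytes:
    # Placeholder for an Avro encoder; JSON keeps the sketch dependency-free.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def decode_record(payload: bytes) -> dict:
    # Placeholder for the matching decoder in Sink B.
    return json.loads(payload.decode("utf-8"))


def parse_line(line: str) -> dict:
    # Hypothetical log format: "<method> <path> <status>"
    method, path, status = line.split()
    return {"method": method, "path": path, "status": int(status)}


def make_event(line: str) -> dict:
    # Source A: the encoded record becomes the event body.
    # The header is a hypothetical hint telling Sink B which schema to use.
    return {
        "headers": {"schema.name": "access_log"},
        "body": encode_record(parse_line(line)),
    }


def sink_b_handle(event: dict) -> dict:
    # Sink B: decode the body; the transport in between treats it as bytes.
    return decode_record(event["body"])


if __name__ == "__main__":
    event = make_event("GET /index.html 200")
    print(sink_b_handle(event))
```

The Avro sink and source in the middle only carry the body bytes, so they
need no changes for this to work.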

HTH !


On Fri, Sep 5, 2014 at 12:58 AM, Ed Judge <ejudgie@gmail.com> wrote:

> Ok, I have looked over the source and it is making a little more sense.
>
> I think what I ultimately want to do is this:
>
> source A -> channel A -> sink A ---> source B -> channel B -> sink B
>
> source A will be looking at a log file.  Each line of the log file will
> have a certain format/schema. I would write Source A such that it could
> write the schema/line as an event into the channel and pass that through
> the system all the way ultimately to sink B so that it would know the
> schema also.
> I was thinking Avro would be a good format for source A to use when
> writing into its channel.  If Sink A is an existing Avro Sink and Source B
> is an existing Avro source, would this still work?  Does this mean I would
> have 2 Avro headers (one encapsulating the other), which is wasteful, or can
> the existing Avro source and sink deal with this unmodified?  Is there a
> better way to accomplish what I want to do?  Just looking for some guidance.
>
> Thanks,
> Ed
>
> On Sep 4, 2014, at 4:44 AM, Ashish <paliwalashish@gmail.com> wrote:
>
> Avro records should have the schema embedded with them. Have a look at the
> source; that should help a bit.
>
>
> On Wed, Sep 3, 2014 at 10:30 PM, Ed Judge <ejudgie@gmail.com> wrote:
>
>> That’s helpful but isn’t there some type of Avro schema negotiation that
>> occurs?
>>
>> -Ed
>>
>> On Sep 3, 2014, at 12:02 AM, Jeff Lord <jlord@cloudera.com> wrote:
>>
>> Ed,
>>
>> Did you take a look at the javadoc in the source?
>> Basically the source uses netty as a server and the sink is just an rpc
>> client.
>> If you read over the doc which is in the two links below and take a look
>> at the developer guide and still have questions just ask away and someone
>> will help to answer.
>>
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java
>>
>>
>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/AvroSink.java
>>
>> https://flume.apache.org/FlumeDeveloperGuide.html#transaction-interface
>>
>> -Jeff
>>
>>
>>
>>
>>
>>
>> On Tue, Sep 2, 2014 at 6:36 PM, Ed Judge <ejudgie@gmail.com> wrote:
>>
>>> Does anyone know of any good documentation that talks about the
>>> protocol/negotiation used between an Avro source and sink?
>>>
>>> Thanks,
>>> Ed
>>>
>>>
>>
>>
>
>
> --
> thanks
> ashish
>
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal
>
>
>


-- 
thanks
ashish

Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
