flume-user mailing list archives

From Ed Judge <ejud...@gmail.com>
Subject Re: Avro source and sink
Date Mon, 08 Sep 2014 19:00:29 GMT
Thanks for the reply.  My understanding of the current Avro sink/source is that the schema
is just a simple header/body schema.  What I ultimately want to do is read a log file and
write it into an Avro file on HDFS at sink B.  Before that, however, I would like to parse
the log file at source A and come up with a more detailed schema, which might include,
for example, date, priority, subsystem, and log message fields.  I am trying to "normalize"
the events so that we could also potentially filter on certain log file fields at the source.
Does this make sense?
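
A minimal sketch of the parsing step described above, assuming a hypothetical line layout of "<date> <time> <PRIORITY> <subsystem> <message>". This is plain Python for illustration, not Flume or Avro API code; the names LOG_LINE, LOG_SCHEMA, parse_line, and keep, as well as the priority ordering, are assumptions. The schema is written in Avro's JSON record style using the four fields named above.

```python
import re

# Hypothetical log layout: "<date> <time> <PRIORITY> <subsystem> <message>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<priority>[A-Z]+) "
    r"(?P<subsystem>\S+) "
    r"(?P<message>.*)$"
)

# Avro-style record schema (JSON form) with the fields from the message above.
LOG_SCHEMA = {
    "type": "record",
    "name": "LogRecord",
    "fields": [
        {"name": "date", "type": "string"},
        {"name": "priority", "type": "string"},
        {"name": "subsystem", "type": "string"},
        {"name": "message", "type": "string"},
    ],
}

# Assumed ordering of priorities, lowest to highest.
PRIORITY_RANK = {"DEBUG": 0, "INFO": 1, "WARN": 2, "ERROR": 3}


def parse_line(line):
    """Return a dict with the schema's fields, or None if the line does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def keep(record, min_priority="WARN"):
    """Filter at the source: keep records at or above min_priority."""
    rank = PRIORITY_RANK.get(record["priority"], -1)
    return rank >= PRIORITY_RANK[min_priority]


if __name__ == "__main__":
    record = parse_line("2014-09-08 19:00:29 ERROR storage Disk quota exceeded")
    if record is not None and keep(record):
        print(record)
```

In a real agent, a record like this would still have to be serialized (for example with an Avro library) before being placed in an event body; that step is left out here.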

On Sep 6, 2014, at 11:52 AM, Ashish <paliwalashish@gmail.com> wrote:

> I am not sure I understand the question correctly, so let me try to answer based on my understanding.
> 
> source A -> channel A -> sink A ---> source B -> channel B -> sink B
> 
> For this scenario, Sink A has to be an Avro sink and Source B has to be an Avro source for the flow to work.
> Flume uses Avro for RPC (look at flume-ng-sdk/src/main/avro/flume.avdl). That file defines how Flume sends Event(s) across using Avro RPC.
> 
> 1. Source A (spooled dir) reads files, creates Events from them, and inserts them into the channel
> 2. Sink A (Avro sink) reads from the channel and translates each Event into an AvroFlumeEvent for sending to Source B (Avro source)
> 3. Source B reads the AvroFlumeEvent, creates an Event from it, and inserts it into the channel, where it is processed by Sink B
> 
> Events are not wrapped inside other events as they traverse the chain. The Avro encoding exists only between Sink A and Source B.
> 
> Based on my understanding, you are looking at encoding log file lines using Avro. In that case, the Avro-encoded log file lines would be part of the Event body; the rest would be the same as steps 1-3.
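
Steps 1-3 can be pictured with a toy model in plain Python. It is only a sketch: Event, sink_a_encode, and source_b_decode are hypothetical stand-ins, not Flume classes, and the wire format is reduced to a dict with the header/body pair mentioned in this thread. The point it illustrates is that the event body is carried as opaque bytes across the hop and comes out unchanged, so nothing is nested inside anything else.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event: string headers plus a byte body."""
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def sink_a_encode(event):
    """Stand-in for Sink A: translate an Event into its on-the-wire form."""
    return {"headers": dict(event.headers), "body": event.body}


def source_b_decode(wire):
    """Stand-in for Source B: rebuild an Event from the on-the-wire form."""
    return Event(headers=dict(wire["headers"]), body=wire["body"])


if __name__ == "__main__":
    # The body may itself be application-encoded bytes; the hop does not care.
    original = Event(headers={"host": "agent-a"}, body=b"already-encoded log record")
    received = source_b_decode(sink_a_encode(original))
    print(received == original)
```

Whatever encoding is applied to the body before it enters channel A is what Sink B sees after the hop; the transport encoding is added by Sink A and removed again by Source B.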
> 
> HTH !
> 
> 
> On Fri, Sep 5, 2014 at 12:58 AM, Ed Judge <ejudgie@gmail.com> wrote:
> Ok, I have looked over the source and it is making a little more sense.
> 
> I think what I ultimately want to do is this:
> 
> source A -> channel A -> sink A ---> source B -> channel B -> sink B
> 
> Source A will be looking at a log file.  Each line of the log file will have a certain format/schema. I would write Source A so that it writes the schema/line as an event into the channel and passes that through the system all the way to sink B, so that sink B would know the schema as well.
> I was thinking Avro would be a good format for source A to use when writing into its channel.  If Sink A is an existing Avro sink and Source B is an existing Avro source, would this still work?  Does this mean I would have 2 Avro headers (one encapsulating the other), which is wasteful, or can the existing Avro source and sink deal with this unmodified?  Is there a better way to accomplish what I want to do?  Just looking for some guidance.
> 
> Thanks,
> Ed
> 
> On Sep 4, 2014, at 4:44 AM, Ashish <paliwalashish@gmail.com> wrote:
> 
>> Avro records shall have the schema embedded with them. Have a look at the source, that shall help a bit
>> 
>> 
>> On Wed, Sep 3, 2014 at 10:30 PM, Ed Judge <ejudgie@gmail.com> wrote:
>> That’s helpful but isn’t there some type of Avro schema negotiation that occurs?
>> 
>> -Ed
>> 
>> On Sep 3, 2014, at 12:02 AM, Jeff Lord <jlord@cloudera.com> wrote:
>> 
>>> Ed,
>>> 
>>> Did you take a look at the javadoc in the source?
>>> Basically the source uses netty as a server and the sink is just an rpc client.
>>> If you read over the doc which is in the two links below and take a look at the developer guide and still have questions just ask away and someone will help to answer.
>>> 
>>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/source/AvroSource.java
>>> 
>>> https://github.com/apache/flume/blob/trunk/flume-ng-core/src/main/java/org/apache/flume/sink/AvroSink.java
>>> 
>>> https://flume.apache.org/FlumeDeveloperGuide.html#transaction-interface
>>> 
>>> -Jeff
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Sep 2, 2014 at 6:36 PM, Ed Judge <ejudgie@gmail.com> wrote:
>>> Does anyone know of any good documentation that talks about the protocol/negotiation used between an Avro source and sink?
>>> 
>>> Thanks,
>>> Ed
>>> 
>>> 
>> 
>> 
>> 
>> 
>> -- 
>> thanks
>> ashish
>> 
>> Blog: http://www.ashishpaliwal.com/blog
>> My Photo Galleries: http://www.pbase.com/ashishpaliwal
> 
> 
> 
> 
> -- 
> thanks
> ashish
> 
> Blog: http://www.ashishpaliwal.com/blog
> My Photo Galleries: http://www.pbase.com/ashishpaliwal

