flume-user mailing list archives

From <dwight.marz...@here.com>
Subject RE: Why does a Flume source need to recognize the format of the message?
Date Tue, 22 Oct 2013 21:33:19 GMT
So, what I am gathering from the discussion below is that the Scribe source doesn't do
the parsing or splitting of data. It just takes in the data flow as is and passes it on to
the sink. The right sink then splits the Scribe data up based on the category. That is a good
clarification for me, as I saw it the other way around.

I have never worked with Thrift or Avro. Could you give me a sample entry for a Flume config
file for one of these that would parse data with a Scribe category coming in via the
Scribe source?
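
For reference, a minimal sketch of the kind of agent configuration being asked about. It assumes the Scribe source copies each entry's Scribe category into an event header named "category", and that the HDFS sink expands %{category} in hdfs.path from that header. The agent name a1, the port, the channel capacity and the HDFS path are illustrative placeholders, not values from this thread.

```properties
# Agent "a1": one Scribe source, one memory channel, one HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Scribe source: accepts Scribe log entries over Thrift.
# Assumption: each event carries its Scribe category in a "category" header.
a1.sources.r1.type = org.apache.flume.source.scribe.ScribeSource
a1.sources.r1.port = 1463
a1.sources.r1.workerThreads = 5
a1.sources.r1.channels = c1

# In-memory channel between source and sink (capacity is a placeholder).
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# HDFS sink: %{category} is replaced by the value of the "category" header,
# so events from different Scribe categories land in different directories.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = /flume/scribe/%{category}
a1.sinks.k1.hdfs.fileType = DataStream
```

In this sketch the source does no splitting by category itself. The split happens in the sink, through the header substitution in the path. A multiplexing channel selector keyed on the same header would be another way to route categories to separate channels.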


From: ext Roshan Naik [mailto:roshan@hortonworks.com]
Sent: Tuesday, October 22, 2013 5:21 PM
To: user@flume.apache.org
Subject: Re: Why does a Flume source need to recognize the format of the message?

I forgot to note that the syslog source also does some parsing.

On Tue, Oct 22, 2013 at 1:51 PM, Roshan Naik <roshan@hortonworks.com> wrote:
At a minimum it needs to know how to split incoming data into individual events. Typically
a newline is used as the separator.

Avro & Thrift are special-purpose sources/sinks which handle headers and body. Avro,
Thrift & HTTP sources will parse the incoming data into header + body. AFAICT most other
sources treat the whole thing as a body. They should not need any more info other than line/event

You can write a custom deserializer, which some sources support, to parse a custom incoming
data format.
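
As an illustration of wiring in a custom deserializer, here is a minimal sketch that assumes the spooling directory source, which takes a "deserializer" property naming an EventDeserializer.Builder implementation. The class com.example.flume.MyRecordDeserializer and the spool directory are hypothetical placeholders, and the jar containing the class would need to be on the agent's classpath.

```properties
# Agent "a1": spooling directory source with a custom deserializer.
a1.sources = r1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/incoming
a1.sources.r1.channels = c1

# Hypothetical class: replace with your own EventDeserializer.Builder.
# Without this property the source falls back to the LINE deserializer,
# which turns each newline-terminated line into one event.
a1.sources.r1.deserializer = com.example.flume.MyRecordDeserializer$Builder

a1.channels.c1.type = memory
```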


On Tue, Oct 22, 2013 at 11:07 AM, Jarek Jarcec Cecho <jarcec@apache.org> wrote:
Hi Praveen,
I think there is some confusion between message and payload. While Flume does not need to
understand the payload structure, it does need to understand the message in order to know what
events (what payloads) are there and with what headers. To put it differently, Flume does not need
to understand the structure of the data that you are sending (the payload is just a byte array for
Flume), but that unknown structure needs to be transferred via a known protocol (such as AVRO


On Tue, Oct 22, 2013 at 06:59:17PM +0100, Praveen Sripati wrote:
> According to the Flume documentation
> >>    A Flume source consumes events delivered to it by an external source
> like a web server. The external source sends events to Flume in a format
> that is recognized by the target Flume source. For example, an Avro Flume
> source can be used to receive Avro events from Avro clients or other Flume
> agents in the flow that send events from an Avro sink.
> Why does a Flume source need to recognize or understand the format of the
> message, when all it does is forward the message to one of the
> channels?
> Thanks,
> Praveen

