flume-user mailing list archives

From Joey Echeverria <j...@cloudera.com>
Subject Re: Flume log4j-appender
Date Mon, 22 Sep 2014 16:41:50 GMT
Hi Hanish,

The Log4jAppender is designed to connect to a Flume agent running an
AvroSource. So, you'd configure Flume similar to [1] and then point
the Log4jAppender to your agent using the log4j properties you linked.
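
As a rough sketch of the agent side, a minimal configuration with an
AvroSource might look like the following. The agent name (a1), the
component names (r1, c1, k1), the port, and the channel capacity are
all placeholders I picked for illustration. The logger sink is only
there so you can see events arrive while testing.

```properties
# Hypothetical agent "a1": all component names and the port are placeholders.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# AvroSource that the Log4jAppender connects to
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 41414
a1.sources.r1.channels = c1

# In-memory channel; fine for testing, not durable across restarts
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Logger sink prints received events to the agent's log
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

The Hostname and Port in the log4j properties would then need to
match the bind address and port of this source.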

The Log4jAppender will use Avro to inspect the object being logged to
determine its schema and to serialize it to bytes, which become the
body of the events sent to Flume. If you're logging Strings, which is
most common, then the Schema will just be a Schema.String. There are
two ways that schema information can be passed. You can configure the
Log4jAppender with a Schema URL that will be sent in the event headers,
or you can leave that out and a JSON representation of the Schema
will be sent as a header with each event. The URL is more efficient as
it avoids sending extra information with each record, but you can
leave it out to start your testing.
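
For illustration, the appender side could be configured roughly like
this. The property names follow the Log4J Appender section of the
Flume User Guide as I recall it, so please check them against your
Flume version. The hostname, port, and schema location are
placeholders.

```properties
# Send everything at INFO and above to the Flume appender
log4j.rootLogger = INFO, flume

log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
# Placeholder host and port; these must match the agent's AvroSource
log4j.appender.flume.Hostname = flume-agent.example.com
log4j.appender.flume.Port = 41414

# Optional: enable Avro reflection and pass the schema by URL instead of
# sending its JSON form with every event. The URL below is a placeholder.
# log4j.appender.flume.AvroReflectionEnabled = true
# log4j.appender.flume.AvroSchemaUrl = hdfs://namenode/path/to/schema.avsc
```

Leaving the last two lines commented out is the simpler starting
point for testing.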

With regard to your second question, the answer is no. Flume does not
attempt to re-order events, so your logs will appear in arrival order.
What I would do is write the data to a partitioned directory
structure and then have a Crunch job that sorts each partition as it

You might consider taking a look at the Kite SDK[2] as we have some
examples that show how to do the logging[3] and can also handle
getting the data properly partitioned on HDFS.
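
If you want to try the partitioned directory layout with plain Flume
first, one possible sketch uses the HDFS sink's time escape sequences
in the path. The agent and component names and the HDFS path are
placeholders. This also assumes that bucketing by the agent's local
clock is acceptable, which means partitions reflect arrival time
rather than the time the event was logged.

```properties
# Hypothetical HDFS sink "k1" on agent "a1", reading from channel "c1"
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1

# One directory per hour; %Y/%m/%d/%H are expanded from a timestamp
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/logs/%Y/%m/%d/%H

# Write plain event bodies rather than SequenceFiles
a1.sinks.k1.hdfs.fileType = DataStream

# Use the agent's clock for the escapes instead of a timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

Events inside each hourly directory are still in arrival order, so a
sort job per partition is still needed, and events logged close to an
hour boundary may land in the neighboring partition.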



[1] http://flume.apache.org/FlumeUserGuide.html#avro-source
[2] http://kitesdk.org/docs/current/
[3] https://github.com/kite-sdk/kite-examples/tree/snapshot/logging

On Mon, Sep 22, 2014 at 4:21 AM, Hanish Bansal
<hanish.bansal.agarwal@gmail.com> wrote:
> Hi All,
> I want to use flume log4j-appender for logging of a map-reduce application
> which is running on different nodes. My use case is have the logs from all
> nodes at centralized location(say HDFS) with time based synchronization.
> As described in below links Flume has its own appender which can be used to
> logging of an application in HDFS direct from log4j.
> http://www.addsimplicity.com/adding_simplicity_an_engi/2010/10/sending-logs-down-the-flume.html
> http://flume.apache.org/FlumeUserGuide.html#log4j-appender
> Could anyone please tell me if these logs are synchronized on time-basis or
> not ?
> What i mean by time based synchronization is: Having logs from different
> nodes in sorted order of time.
> Also could anyone provide a link that describes how flume log4j-appender
> internally works?
> --
> Thanks & Regards
> Hanish Bansal

Joey Echeverria
