chukwa-dev mailing list archives

From Eric Yang <>
Subject Re: Adaptor options for Hadoop log files (TaskTracker, DataNode)
Date Mon, 18 May 2009 05:50:04 GMT
Hi Jiaqi,

Chukwa has a special log4j appender which escapes newline characters. A
multi-line exception will therefore be stored as a single chunk, and processed
as a single Chukwa record after Demux.
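To illustrate the idea, here is a minimal Java sketch. It is not Chukwa's actual appender code. It assumes a simple backslash scheme: the writing side escapes backslashes and newlines inside each log event, so every event occupies exactly one physical line. The reading side splits on real newlines and unescapes, recovering each multi-line exception as one record. The class and method names (`NewlineEscapeSketch`, `escape`, `unescape`, `splitRecords`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of newline escaping; NOT Chukwa's real appender or adaptor code.
final class NewlineEscapeSketch {
    // Assumed scheme: '\' becomes "\\" and a newline becomes "\n" (two characters).
    static String escape(String event) {
        StringBuilder sb = new StringBuilder();
        for (char c : event.toCharArray()) {
            if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else sb.append(c);
        }
        return sb.toString();
    }

    // Reverses escape(): "\n" turns back into a newline, "\\" into a backslash.
    static String unescape(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                char next = line.charAt(++i);
                sb.append(next == 'n' ? '\n' : next);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Every real newline in the file ends one whole (escaped) event.
    static List<String> splitRecords(String fileContents) {
        List<String> records = new ArrayList<>();
        for (String line : fileContents.split("\n")) {
            if (!line.isEmpty()) records.add(unescape(line));
        }
        return records;
    }

    public static void main(String[] args) {
        String event1 = "ERROR Task failed\njava.io.IOException: boom\n\tat Foo.bar(Foo.java:42)";
        String event2 = "INFO Task done";
        String file = escape(event1) + "\n" + escape(event2) + "\n";
        List<String> records = splitRecords(file);
        System.out.println(records.size());              // prints 2
        System.out.println(records.get(0).equals(event1)); // prints true
    }
}
```

Because the escaped event never contains a raw newline, a reader that simply splits on newlines cannot cut an exception stack trace into several pieces.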

You are on the right track. For your purpose, you may want to create your
own custom datatype, with a matching record type in the Chukwa log4j
appender, to process the data you are looking for. To start, you may be
interested in modifying HadoopLogProcessor and enhancing it from there. Chukwa
currently streams all Hadoop logs under the same record type (HadoopLog),
and this part could use some help to carve out the definitions. See you
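As a rough illustration of carving HadoopLog into narrower types, the standalone Java sketch below parses one log record and picks a datatype from the logger name. It assumes Hadoop's common log4j layout `%d{ISO8601} %p %c: %m%n` (check your own log4j.properties). The datatype names `TaskTrackerLog` and `DataNodeLog`, and the class `HadoopLogLineSketch`, are hypothetical, and the code does not use Chukwa's processor API.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone parser; not Chukwa's HadoopLogProcessor.
final class HadoopLogLineSketch {
    // Assumed layout "%d{ISO8601} %p %c: %m%n", e.g.
    // "2009-05-17 18:42:01,123 INFO org.apache.hadoop.mapred.TaskTracker: message"
    // DOTALL lets the message span several lines (an unescaped stack trace).
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) (\\w+) ([\\w.$]+): (.*)$",
        Pattern.DOTALL);

    record Parsed(String timestamp, String level, String logger, String message, String dataType) {}

    // Hypothetical datatype names chosen from the logging class.
    static String dataTypeFor(String logger) {
        if (logger.endsWith(".TaskTracker")) return "TaskTrackerLog";
        if (logger.endsWith(".DataNode")) return "DataNodeLog";
        return "HadoopLog"; // everything else keeps the generic type
    }

    static Optional<Parsed> parse(String record) {
        Matcher m = LINE.matcher(record);
        if (!m.matches()) return Optional.empty();
        String logger = m.group(3);
        return Optional.of(new Parsed(m.group(1), m.group(2), logger, m.group(4), dataTypeFor(logger)));
    }

    public static void main(String[] args) {
        String line = "2009-05-17 18:42:01,123 INFO org.apache.hadoop.mapred.TaskTracker: Task done.";
        parse(line).ifPresent(p -> System.out.println(p.dataType() + " " + p.level()));
        // prints "TaskTrackerLog INFO"
    }
}
```

Keying on the logger name is only one possible design. A TaskTracker log also contains lines from other classes, so tagging each file with a datatype at the adaptor, as Ari describes below, may be simpler.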


On 5/17/09 6:42 PM, "Jiaqi Tan" <> wrote:

> Hi Ariel,
> So with the CharFileTailingAdaptorUTF8NewLineEscaped, if I have a
> multi-line log entry (e.g. a logged Java exception), would each line be
> separated into a different chunk? If that's the case, are there any
> adaptors that would coalesce multi-line log entries into a single chunk?
> Also, does Demux resolve the data type to one of the classes in
> org.apache.hadoop.chukwa.extraction.demux.processor.mapper? That is, if
> I wanted to implement my own custom datatype, should I create a Demux
> processor and add it as one of the classes in that package?
> Thanks,
> Jiaqi
> On Sun, May 17, 2009 at 6:19 PM, Ariel Rabkin <> wrote:
>> It's worth distinguishing two different things.
>> The adaptor (as in CharFileTailingAdaptorUTF8) is responsible for
>> deciding how to break the data into chunks, and how to tag the chunks.
>>  Probably CharFileTailingAdaptorUTF8NewLineEscaped is right for you.
>> (We should really rename that to something shorter!)
>> The type, like SysLog or NameNodeLog, is stored by the adaptor, and
>> passed through as Chunk metadata. It's used to tell the Demux how to
>> process that data.  The demux-conf has the mapping from datatype to
>> processor.  For logs, you should be fine just picking a datatype.  If
>> you aren't using Demux to process the logs, you don't even need to
>> write a processor.
>> --Ari
>> On Sun, May 17, 2009 at 6:15 PM, Jiaqi Tan <> wrote:
>>> Hi,
>>> Which adaptor should I use if I want to process log entries from the
>>> TaskTracker and DataNode logs? Should I just use one of the
>>> FileTailer adaptors already available (CharFileTailingAdaptorUTF8), or
>>> is there a custom type such as the one for SysLog or NameNodeLog when
>>> using the CharFileTailingAdaptorUTF8NewLineEscaped adaptor?
>>> Is there any documentation available on what the "type" (e.g. SysLog
>>> or NameNodeLog) means and how to use it/how it works?
>>> Thanks,
>>> Jiaqi
>> --
>> Ari Rabkin
>> UC Berkeley Computer Science Department
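The datatype-to-processor mapping Ari describes can be pictured as a lookup table keyed by the type string the adaptor attached to each chunk. The Java sketch below is hypothetical throughout: `Chunk`, `Processor`, `DemuxDispatchSketch`, and the fallback processor are stand-ins for illustration, not Chukwa's classes. In Chukwa the mapping lives in the demux-conf configuration rather than in code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of Demux dispatch; not Chukwa's real API.
final class DemuxDispatchSketch {
    // Stand-in for a chunk: the datatype tag set by the adaptor plus the raw data.
    record Chunk(String dataType, String data) {}

    // Stand-in for a processor; Chukwa's processor classes look different.
    interface Processor {
        String process(Chunk chunk);
    }

    private final Map<String, Processor> byDataType = new HashMap<>();
    private final Processor fallback; // assumed behavior for unmapped datatypes

    DemuxDispatchSketch(Processor fallback) {
        this.fallback = fallback;
    }

    // Plays the role of one "datatype -> processor" entry in demux-conf.
    void register(String dataType, Processor processor) {
        byDataType.put(dataType, processor);
    }

    // Route a chunk to the processor registered for its datatype tag.
    String dispatch(Chunk chunk) {
        return byDataType.getOrDefault(chunk.dataType(), fallback).process(chunk);
    }

    public static void main(String[] args) {
        DemuxDispatchSketch demux = new DemuxDispatchSketch(c -> "default:" + c.data());
        demux.register("SysLog", c -> "syslog:" + c.data());
        demux.register("NameNodeLog", c -> "namenode:" + c.data());
        System.out.println(demux.dispatch(new Chunk("SysLog", "kernel: eth0 up"))); // prints "syslog:kernel: eth0 up"
        System.out.println(demux.dispatch(new Chunk("MyCustomLog", "hello")));      // prints "default:hello"
    }
}
```

The adaptor only chooses the tag; adding a new datatype means adding one mapping entry and, if Demux processing is wanted, one processor.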
