chukwa-dev mailing list archives

From Jiaqi Tan <>
Subject Re: Adaptor options for Hadoop log files (TaskTracker, DataNode)
Date Mon, 18 May 2009 14:14:28 GMT
Hi Eric,

> Chukwa has a special log4j appender which escapes the return character.  A
> multi-line exception is stored as a single chunk and processed as a
> single Chukwa record after Demux.

In this case, I suppose I would need to configure the monitored Hadoop
cluster to actually use the Chukwa log4j appender? Would I also need
to recompile the Hadoop of the monitored cluster to include the Chukwa
code then?
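
To make the escaping idea concrete, here is a minimal sketch in Python. It is
not Chukwa's appender code: the escape sequence (backslash followed by "n"),
the function names, and the sample log lines are all assumptions made up for
illustration. It only shows why escaping newlines lets a line-oriented reader
keep a multi-line exception together as one record.

```python
def escape_entry(entry):
    # Escape backslashes first so the encoding stays unambiguous, then turn
    # each real newline into the two characters '\' and 'n' (assumed scheme).
    return entry.replace("\\", "\\\\").replace("\n", "\\n")


def split_records(stream):
    # After escaping, every remaining newline ends exactly one log entry.
    return [line for line in stream.split("\n") if line]


# Made-up sample entries; the second one carries a multi-line stack trace.
entries = [
    "2009-05-18 14:14:28 INFO Starting task",
    "2009-05-18 14:14:29 ERROR Task failed\n"
    "java.io.IOException: disk full\n"
    "\tat Example.run(Example.java:42)",
]

stream = "\n".join(escape_entry(e) for e in entries) + "\n"
records = split_records(stream)
print(len(records))
```

Without the escaping step, splitting the same text on newlines would break the
exception into three separate pieces.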

> You are on the right track.  For your purpose, you may want to create your
> own custom datatype and have a matching Chukwa log4j appender record type
> to process the data that you are looking for.  To start, you may be
> interested in modifying HadoopLogProcessor, and enhancing it from there.  Chukwa
> is currently streaming all Hadoop logs in the same record type (HadoopLog),
> and this part could use some help to carve out the definitions.

Where are these record types defined, and how do they map to the
processors? Is it a direct <record type name>Processor mapping that's
automatically done by the Demux?
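
To spell out the alternative to a naming convention, here is a minimal sketch
in Python of an explicit lookup from datatype to processor, which is how I read
Ari's remark that the demux-conf holds the mapping. The dictionary contents,
the fallback class name, and the function name are hypothetical; only the
package name and HadoopLogProcessor come from this thread.

```python
# Package named earlier in this thread.
PROCESSOR_PACKAGE = "org.apache.hadoop.chukwa.extraction.demux.processor.mapper"

# Hypothetical stand-in for the demux configuration: datatype -> processor class.
DEMUX_CONF = {
    "HadoopLog": PROCESSOR_PACKAGE + ".HadoopLogProcessor",
}

# Assumed fallback for datatypes with no explicit entry (name is an assumption).
DEFAULT_PROCESSOR = PROCESSOR_PACKAGE + ".DefaultProcessor"


def resolve_processor(data_type, conf=DEMUX_CONF, default=DEFAULT_PROCESSOR):
    # Look the datatype up explicitly instead of deriving the class name
    # from "<record type name>Processor".
    return conf.get(data_type, default)


print(resolve_processor("HadoopLog"))
print(resolve_processor("MyCustomLog"))
```

With an explicit table like this, a custom datatype would need its own entry
before its processor is picked up; with a pure naming convention it would not.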


> On 5/17/09 6:42 PM, "Jiaqi Tan" <> wrote:
>> Hi Ariel,
>> So with the CharFileTailingAdaptorUTF8NewLineEscaped, if I have a
>> multi-line log file entry, e.g. a logged Java exception, would each
>> line be separated into a different chunk? If that's the case, are
>> there any adaptors that would coalesce multi-line log entries into a
>> single chunk?
>> Also, does the data type get resolved by Demux to one of the classes
>> in org.apache.hadoop.chukwa.extraction.demux.processor.mapper? i.e. if
>> I wanted to implement my own custom datatype, I should create a Demux
>> processor and stick it in as one of the classes in that package?
>> Thanks,
>> Jiaqi
>> On Sun, May 17, 2009 at 6:19 PM, Ariel Rabkin <> wrote:
>>> It's worth distinguishing two different things.
>>> The adaptor (as in CharFileTailingAdaptorUTF8) is responsible for
>>> deciding how to break the data into chunks, and how to tag the chunks.
>>>  Probably CharFileTailingAdaptorUTF8NewLineEscaped is right for you.
>>> (We should really rename that to something shorter!)
>>> The type, like SysLog or NameNodeLog, is stored by the adaptor, and
>>> passed through as Chunk metadata. It's used to tell the Demux how to
>>> process that data.  The demux-conf has the mapping from datatype to
>>> processor.  For logs, you should be fine just picking a datatype.  If
>>> you aren't using Demux to process the logs, you don't even need to
>>> write a processor.
>>> --Ari
>>> On Sun, May 17, 2009 at 6:15 PM, Jiaqi Tan <> wrote:
>>>> Hi,
>>>> Which adaptor should I use if I want to process log entries from the
>>>> TaskTracker and DataNode logs? Should I just use one of the
>>>> FileTailer adaptors already available (CharFileTailingAdaptorUTF8), or
>>>> is there a custom type such as the one for SysLog or NameNodeLog when
>>>> using the CharFileTailingAdaptorUTF8NewLineEscaped adaptor?
>>>> Is there any documentation available on what the "type" (e.g. SysLog
>>>> or NameNodeLog) means and how to use it/how it works?
>>>> Thanks,
>>>> Jiaqi
>>> --
>>> Ari Rabkin
>>> UC Berkeley Computer Science Department
