chukwa-dev mailing list archives

From Eric Yang <>
Subject Re: Adaptor options for Hadoop log files (TaskTracker, DataNode)
Date Mon, 18 May 2009 16:38:41 GMT

On 5/18/09 7:14 AM, "Jiaqi Tan" <> wrote:

> Hi Eric,
>> Chukwa has a special log4j appender which escapes return characters.  A
>> multi-line exception will be stored as a single chunk, and processed as a
>> single Chukwa record after Demux.
> In this case, I suppose I would need to configure the monitored Hadoop
> cluster to actually use the Chukwa log4j appender? Would I also need
> to recompile the Hadoop of the monitored cluster to include the Chukwa
> code then?

There is no need to recompile Hadoop.  Chukwa ships two jar files,
chukwa-hadoop-*-client.jar and json.jar.  Drop both into the lib
directory of the Hadoop cluster, and configure in hadoop
conf directory.
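
To illustrate why escaping line breaks keeps a multi-line exception in one piece, here is a minimal Python sketch. It is not Chukwa's appender code, and the escape format shown is an assumption chosen for illustration; the function names and the sample log lines are hypothetical.

```python
# Illustrative sketch only: NOT Chukwa's actual appender or escape format.
# Idea: if every line break inside a log event is escaped before writing,
# a raw newline in the file can only mean "end of event".

def escape_newlines(message):
    # Escape backslashes first so the encoding stays unambiguous,
    # then replace each line break with the two characters '\' and 'n'.
    return message.replace("\\", "\\\\").replace("\n", "\\n")

def split_records(stream):
    # After escaping, each non-empty physical line is one complete event.
    return [line for line in stream.split("\n") if line]

# Hypothetical log events: one plain line and one multi-line stack trace.
stack_trace = (
    "java.io.IOException: disk full\n"
    "\tat Foo.write(Foo.java:42)\n"
    "\tat Bar.run(Bar.java:7)"
)
log_stream = (
    escape_newlines("INFO starting") + "\n"
    + escape_newlines("ERROR " + stack_trace) + "\n"
)

records = split_records(log_stream)
```

With this scheme the stack trace arrives as one record rather than three, which is the behaviour described for the appender in the quoted text above.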

> Where are these record types defined, and how do they map to the
> processors? Is it a direct <record type name>Processor mapping that's
> automatically done by the Demux?

Record types are defined in the  For example, Hadoop has an
appender called DRFA, and the Chukwa-enabled appender would look like:


The association of HadoopLog record type and the demux class is in


>> On 5/17/09 6:42 PM, "Jiaqi Tan" <> wrote:
>>> Hi Ariel,
>>> So with the CharFileTailingAdaptorUTF8NewLineEscaped, if I have a log
>>> file entry with a multi-line entry, e.g. if there was a Java exception
>>> logged, would each line be separated into a different chunk? If that's
>>> the case, are there any adaptors that would coalesce multi-line log
>>> entries into a single chunk?
>>> Also, does the data type get resolved by Demux to one of the classes
>>> in org.apache.hadoop.chukwa.extraction.demux.processor.mapper? i.e. if
>>> I wanted to implement my own custom datatype, I should create a Demux
>>> processor and stick it in as one of the classes in that package?
>>> Thanks,
>>> Jiaqi
>>> On Sun, May 17, 2009 at 6:19 PM, Ariel Rabkin <> wrote:
>>>> It's worth distinguishing two different things.
>>>> The adaptor (as in CharFileTailingAdaptorUTF8) is responsible for
>>>> deciding how to break the data into chunks, and how to tag the chunks.
>>>>  Probably CharFileTailingAdaptorUTF8NewLineEscaped is right for you.
>>>> (We should really rename that to something shorter!)
>>>> The type, like SysLog or NameNodeLog, is stored by the adaptor, and
>>>> passed through as Chunk metadata. It's used to tell the Demux how to
>>>> process that data.  The demux-conf has the mapping from datatype to
>>>> processor.  For logs, you should be fine just picking a datatype.  If
>>>> you aren't using Demux to process the logs, you don't even need to
>>>> write a processor.
>>>> --Ari
>>>> On Sun, May 17, 2009 at 6:15 PM, Jiaqi Tan <> wrote:
>>>>> Hi,
>>>>> Which adaptor should I use if I want to process log entries from the
>>>>> TaskTracker and DataNode logs? Should I just use one of the
>>>>> FileTailer adaptors already available (CharFileTailingAdaptorUTF8), or
>>>>> is there a custom type such as the one for SysLog or NameNodeLog when
>>>>> using the CharFileTailingAdaptorUTF8NewLineEscaped adaptor?
>>>>> Is there any documentation available on what the "type" (e.g. SysLog
>>>>> or NameNodeLog) means and how to use it/how it works?
>>>>> Thanks,
>>>>> Jiaqi
>>>> --
>>>> Ari Rabkin
>>>> UC Berkeley Computer Science Department
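
As a rough illustration of the datatype-to-processor lookup discussed in this thread, the Python sketch below models a chunk as a dictionary carrying a datatype tag and selects a processor from a table. Everything in it is hypothetical: the function names, the chunk layout, and in particular the fallback to a default processor for unmapped datatypes are assumptions, not Chukwa's actual Demux implementation or configuration format.

```python
# Illustrative sketch only: NOT Chukwa's Demux code or its demux-conf format.
# A chunk carries a datatype tag set by the adaptor; a lookup table maps
# that tag to the processor that should parse the chunk's data.

def hadoop_log_processor(chunk):
    # Hypothetical parser: treat the first token as the log level.
    level, _, message = chunk["data"].partition(" ")
    return {"record_type": chunk["datatype"], "level": level, "message": message}

def default_processor(chunk):
    # Assumed fallback: pass the data through without parsing it.
    return {"record_type": chunk["datatype"], "message": chunk["data"]}

# Stand-in for the datatype-to-processor mapping held in configuration.
PROCESSORS = {"HadoopLog": hadoop_log_processor}

def demux(chunk):
    # Pick the processor by the chunk's datatype tag, else use the fallback.
    processor = PROCESSORS.get(chunk["datatype"], default_processor)
    return processor(chunk)

mapped = demux({"datatype": "HadoopLog", "data": "ERROR disk full"})
unmapped = demux({"datatype": "MyCustomLog", "data": "anything goes"})
```

In this model, adding a custom datatype means adding one entry to the table, which mirrors the point above that the mapping lives in configuration rather than in the adaptor.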
