chukwa-dev mailing list archives

From Jiaqi Tan <>
Subject Re: Adaptor options for Hadoop log files (TaskTracker, DataNode)
Date Mon, 18 May 2009 01:42:08 GMT
Hi Ariel,

So with the CharFileTailingAdaptorUTF8NewLineEscaped, if a single log
entry spans multiple lines (e.g. a logged Java exception), would each
line be separated into a different chunk? If that's the case, are
there any adaptors that would coalesce multi-line log entries into a
single chunk?
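
To make the question concrete, here is a rough sketch of the kind of
coalescing I mean. It is plain Java and does not use any Chukwa API;
the class name MultiLineCoalescer and the assumption that every new
entry starts with a "yyyy-MM-dd HH:mm:ss" timestamp are mine, for
illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Chukwa: groups raw log lines into
// logical entries so that a stack trace stays with its log message.
public class MultiLineCoalescer {
    // Assumption: a new entry begins with a timestamp such as
    // "2009-05-18 01:42:08". Any other line is treated as a continuation.
    private static final Pattern ENTRY_START =
        Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    public static List<String> coalesce(List<String> lines) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            boolean startsEntry = ENTRY_START.matcher(line).find();
            if (startsEntry || current == null) {
                // Flush the previous entry (if any) and start a new one.
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder(line);
            } else {
                // Continuation line: append it to the entry in progress.
                current.append('\n').append(line);
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }
}
```

With this, an exception message and its "at ..." lines would end up in
the same record as the timestamped line that precedes them.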

Also, does the data type get resolved by Demux to one of the classes
in org.apache.hadoop.chukwa.extraction.demux.processor.mapper? That
is, if I wanted to implement my own custom data type, should I create
a Demux processor and add it as one of the classes in that package?


On Sun, May 17, 2009 at 6:19 PM, Ariel Rabkin <> wrote:
> It's worth distinguishing two different things.
> The adaptor (as in CharFileTailingAdaptorUTF8) is responsible for
> deciding how to break the data into chunks, and how to tag the chunks.
>  Probably CharFileTailingAdaptorUTF8NewLineEscaped is right for you.
> (We should really rename that to something shorter!)
> The type, like SysLog or NameNodeLog, is stored by the adaptor, and
> passed through as Chunk metadata. It's used to tell the Demux how to
> process that data.  The demux-conf has the mapping from datatype to
> processor.  For logs, you should be fine just picking a datatype.  If
> you aren't using Demux to process the logs, you don't even need to
> write a processor.
> --Ari
> On Sun, May 17, 2009 at 6:15 PM, Jiaqi Tan <> wrote:
>> Hi,
>> Which adaptor should I use if I want to process log entries from the
>> TaskTracker and DataNode logs? Should I just use one of the
>> FileTailer adaptors already available (CharFileTailingAdaptorUTF8), or
>> is there a custom type such as the one for SysLog or NameNodeLog when
>> using the CharFileTailingAdaptorUTF8NewLineEscaped adaptor?
>> Is there any documentation available on what the "type" (e.g. SysLog
>> or NameNodeLog) means and how to use it/how it works?
>> Thanks,
>> Jiaqi
> --
> Ari Rabkin
> UC Berkeley Computer Science Department
