chukwa-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Using Chuckwa for Nutch Log Analysis and Monitoring
Date Sat, 14 Feb 2015 20:13:13 GMT
Hi Eric,
Thank you for the feedback.
This is more than helpful.
I am going to write a Gora [0] module for Chukwa.
I am going to proceed on the basis of implementing a log monitor for Nutch.
Can Chukwa currently write to a file and send an email response?
Thanks
Lewis

[0] http://gora.apache.org
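
A rough sketch of the pipeline idea Eric describes in the quoted message below: each record passes through the stages in order, parse errors are diverted to an error sink with their cause, and everything else continues to the final writer. This is plain Python for illustration only, not the Chukwa API. The class names, the `run_pipeline` helper, and the log line pattern are all assumptions made up for this example, and real Nutch log lines may look different.

```python
import re
from typing import Callable, Iterable, List, Optional

# Hypothetical pattern for a parse-phase error line; adjust to the real log format.
PARSE_ERROR = re.compile(r"ERROR\s+parse\.\S+\s+-\s+(?P<cause>.+)")

# A stage takes a record and returns it to pass it on, or None to consume it.
Stage = Callable[[str], Optional[str]]


class ErrorRoutingStage:
    """Diverts parse-error records to a separate list, keeping the cause."""

    def __init__(self) -> None:
        self.errors: List[str] = []

    def __call__(self, line: str) -> Optional[str]:
        match = PARSE_ERROR.search(line)
        if match:
            self.errors.append(match.group("cause"))
            return None  # consumed: do not pass on to later stages
        return line


class CollectingSink:
    """Terminal stage standing in for the final writer; must come last."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def __call__(self, line: str) -> Optional[str]:
        self.records.append(line)
        return None


def run_pipeline(lines: Iterable[str], stages: List[Stage]) -> None:
    """Feed every line through the stages in order until one consumes it."""
    for line in lines:
        record: Optional[str] = line
        for stage in stages:
            record = stage(record)
            if record is None:
                break
```

With this layout, the error list could later be written to a file or mailed out, while the sink plays the role of whichever writer ends the pipeline.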

On Sat, Feb 14, 2015 at 9:30 AM, Eric Yang <eric818@gmail.com> wrote:

> Hi Lewis,
>
> Parse errors can be captured and stored in a separate HDFS location.  In
> Chukwa 0.4 and earlier, we have a demux MapReduce job which does the
> extraction and stores structured data in HDFS; errors are channeled to
> another HDFS folder called InError, with the cause of the parsing error.
> This is still a batch-oriented operation.  In Chukwa 0.6, we can set up
> multiple pipeline writers.  The pipeline writers can be configured to
> parse the data and channel errors somewhere else; if the data parses
> properly, it is written to HBase or HDFS.  However, you will need to write
> your own pipeline writer class to add this functionality.  We currently
> only have a few pipeline writers: LocalWriter, HBaseWriter, and
> SeqFileWriter.  SeqFileWriter needs to be the last one in the pipeline if
> you choose to write data to HDFS.  See this page for how to configure
> pipeline writers to achieve part of what you are looking for:
>
> http://chukwa.apache.org/docs/r0.6.0/pipeline.html
>
> Hope this helps.
>
> regards,
> Eric
>
> On Thu, Feb 12, 2015 at 11:12 PM, Lewis John Mcgibbney <
> lewis.mcgibbney@gmail.com> wrote:
>
>> Hi Folks,
>> For some time I have been meaning to get in touch to get advice on
>> developing a tool for log analysis of Apache Nutch [0] logs.
>> What I am referring to particularly is monitoring of logs in a bid to
>> identify particular errors which we may anticipate.
>> Nutch jobs are batch oriented, an architecture inherited from Hadoop.
>> We typically see errors in the parse phase of a crawl, so it is
>> events like this that I would like to anticipate, monitor and report on,
>> possibly through email.
>> I am therefore thinking about building a Chukwa-powered tool for
>> Nutch which would become part of our codebase.
>> Is Chukwa the right tool for this? Any information about similar efforts
>> would be very much appreciated.
>> best
>> Lewis
>>
>> [0] http://nutch.apache.org
>>
>> --
>> *Lewis*
>>
>
>


-- 
*Lewis*
