chukwa-user mailing list archives

From Lewis John Mcgibbney <lewis.mcgibb...@gmail.com>
Subject Re: Using Chukwa for Nutch Log Analysis and Monitoring
Date Sat, 14 Feb 2015 21:31:58 GMT
https://issues.apache.org/jira/browse/CHUKWA-734

On Sat, Feb 14, 2015 at 12:13 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi Eric,
> Thank you for the feedback.
> This is more than helpful.
> I am going to write a Gora [0] module for Chukwa.
> I am going to proceed on the basis of implementing a log monitor for Nutch.
> Can Chukwa currently write to a file and send an email response?
> Thanks
> Lewis
>
> [0] http://gora.apache.org
>
> On Sat, Feb 14, 2015 at 9:30 AM, Eric Yang <eric818@gmail.com> wrote:
>
>> Hi Lewis,
>>
>> Parse errors can be captured, and the errors stored in another HDFS location.
>> In Chukwa 0.4 and earlier, a demux MapReduce job performs the extraction
>> and stores structured data in HDFS, while errors are channeled to another
>> HDFS folder called InError, along with the cause of the parsing error.
>> This is still a batch-oriented operation.  In Chukwa 0.6, we can set up
>> multiple pipeline writers.  The pipeline writers can be configured to
>> perform parsing and channel errors somewhere else; if the data parses
>> properly, it is then written to HBase or HDFS.  However, you will need to
>> write a pipeline writer class to extend this functionality.  We currently
>> have only a few pipeline writers: LocalWriter, HBaseWriter, and
>> SeqFileWriter.  SeqFileWriter needs to be the last one in the pipeline if
>> you choose to write data to HDFS.  See this page for how to configure
>> pipeline writers to achieve part of what you are looking for:
>>
>> http://chukwa.apache.org/docs/r0.6.0/pipeline.html
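To make the routing described above concrete, here is a minimal, self-contained Java sketch of the pattern: a parsing stage forwards records that parse to the next stage and channels failures, tagged with a cause, to a separate error sink, with a terminal writer at the end of the chain (the role SeqFileWriter plays when writing to HDFS). The `Stage`, `ParseRoutingStage` and `CollectingSink` names and the "key=value" parse rule are hypothetical illustrations only; Chukwa's actual writer classes and interfaces differ, so a real implementation should follow the pipeline documentation linked above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: this is NOT Chukwa's writer API. It only illustrates
// a parsing stage that routes bad records to an error sink and forwards good
// ones to the next (terminal) stage.
public class PipelineSketch {

    /** One stage in a writer pipeline (hypothetical interface). */
    interface Stage {
        void add(List<String> records);
    }

    /** Terminal stage standing in for an HDFS/HBase writer; it just collects records. */
    static final class CollectingSink implements Stage {
        final List<String> written = new ArrayList<>();

        @Override
        public void add(List<String> records) {
            written.addAll(records);
        }
    }

    /**
     * Parsing stage: records that parse go to the next stage, failures go
     * to a separate error sink (analogous to the InError folder).
     */
    static final class ParseRoutingStage implements Stage {
        private final Stage next;
        private final Stage errors;

        ParseRoutingStage(Stage next, Stage errors) {
            this.next = next;
            this.errors = errors;
        }

        @Override
        public void add(List<String> records) {
            List<String> good = new ArrayList<>();
            List<String> bad = new ArrayList<>();
            for (String r : records) {
                if (parses(r)) {
                    good.add(r);
                } else {
                    bad.add("parse error: " + r); // keep the cause with the record
                }
            }
            if (!bad.isEmpty()) errors.add(bad);
            if (!good.isEmpty()) next.add(good);
        }

        // Hypothetical parse rule: valid records have a non-empty "key=value" form.
        private static boolean parses(String record) {
            int eq = record.indexOf('=');
            return eq > 0 && eq < record.length() - 1;
        }
    }

    public static void main(String[] args) {
        CollectingSink store = new CollectingSink();   // last stage in the pipeline
        CollectingSink inError = new CollectingSink(); // where failures are channeled
        Stage pipeline = new ParseRoutingStage(store, inError);

        pipeline.add(List.of("url=http://example.org", "garbage", "status=200"));
        System.out.println("stored:   " + store.written);
        System.out.println("in error: " + inError.written);
    }
}
```

Keeping the terminal writer as the final stage mirrors the constraint that SeqFileWriter must come last: every earlier stage gets a chance to filter or divert records before anything is persisted.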
>>
>> Hope this helps.
>>
>> regards,
>> Eric
>>
>> On Thu, Feb 12, 2015 at 11:12 PM, Lewis John Mcgibbney <
>> lewis.mcgibbney@gmail.com> wrote:
>>
>>> Hi Folks,
>>> For some time I have been meaning to get in touch to seek advice on
>>> developing a tool for log analysis of Apache Nutch [0] logs.
>>> Specifically, I am referring to monitoring logs in order to identify
>>> particular errors which we may anticipate.
>>> Nutch jobs have a batch-oriented architecture inherited from Hadoop.
>>> We typically see errors in the parse phase of a crawl, so it is events
>>> like this that I would like to anticipate, monitor and report on,
>>> possibly through email.
>>> I am therefore thinking about building a Chukwa-powered tool for
>>> Nutch which would become part of our codebase.
>>> Is Chukwa the right tool for this? Any information about similar efforts
>>> would be very much appreciated.
>>> best
>>> Lewis
>>>
>>> [0] http://nutch.apache.org
>>>
>>> --
>>> *Lewis*
>>>
>>
>>
>
>
> --
> *Lewis*
>



-- 
*Lewis*
