From: Eric Yang
To: user@chukwa.apache.org
Reply-To: user@chukwa.apache.org
Date: Sat, 14 Feb 2015 09:30:14 -0800
Subject: Re: Using Chuckwa for Nutch Log Analysis and Monitoring

Hi Lewis,

Parse errors can be captured and stored in a separate HDFS location.
In Chukwa 0.4 and earlier, we have a demux MapReduce job that does the
extraction and stores structured data in HDFS. Errors are channeled to
another HDFS folder called InError, along with the cause of the parsing
error. This is still a batch-oriented operation.

In Chukwa 0.6, we can set up multiple pipeline writers. The pipeline
writers can be configured to do the parsing and channel errors somewhere
else; if the data parses properly, it is written to HBase or HDFS.
However, you will need to write your own pipeline writer class to add
this functionality. We currently have only a few pipeline writers:
LocalWriter, HBaseWriter, and SeqFileWriter. SeqFileWriter needs to be
the last one in the pipeline if you choose to write data to HDFS.

See this page for how to configure pipeline writers to achieve part of
what you are looking for:

http://chukwa.apache.org/docs/r0.6.0/pipeline.html

Hope this helps.

regards,
Eric

On Thu, Feb 12, 2015 at 11:12 PM, Lewis John Mcgibbney
<lewis.mcgibbney@gmail.com> wrote:

> Hi Folks,
> For some time I have been meaning to get in touch to get advice on
> developing a tool for log analysis of Apache Nutch [0] logs.
> What I am referring to particularly is monitoring of logs in a bid to
> identify particular errors which we may anticipate.
> Nutch jobs are batch oriented, an architecture inherited from Hadoop.
> We typically see errors in the parse phase of a crawl, so it is events
> like this that I would like to anticipate, monitor and report on,
> possibly through email.
> So I am therefore thinking about building a Chukwa-powered tool for
> Nutch which would become part of our codebase.
> Is Chukwa the right tool for this? Any information about similar
> efforts would be very much appreciated.
> best
> Lewis
>
> [0] http://nutch.apache.org
>
> --
> *Lewis*
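
For readers who want to picture the parse-and-divert stage described in the reply, here is a minimal sketch of the idea. It is not Chukwa code: it is written in Python purely as an illustration, and every name in it (ParseStage, parse_nutch_line, the "LEVEL message" log layout, the two in-memory sinks) is a hypothetical stand-in. A real stage would be a Java class built against the writer API covered on the pipeline page linked in the reply.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ParseStage:
    """Hypothetical stage: parse each record, divert failures, pass the rest on."""
    parse: Callable[[str], dict]
    next_stage: Callable[[List[dict]], None]   # stands in for the next writer
    error_sink: Callable[[str, str], None]     # stands in for an error location

    def add(self, records: List[str]) -> None:
        parsed = []
        for raw in records:
            try:
                parsed.append(self.parse(raw))
            except ValueError as exc:
                # Keep the raw record together with the cause of the failure.
                self.error_sink(raw, str(exc))
        if parsed:
            self.next_stage(parsed)


def parse_nutch_line(line: str) -> dict:
    """Toy parser for an assumed 'LEVEL message' log layout."""
    level, sep, message = line.partition(" ")
    if not sep or level not in {"INFO", "WARN", "ERROR"}:
        raise ValueError(f"unrecognized log level: {level!r}")
    return {"level": level, "message": message}


# In-memory stand-ins for the downstream store and the error location.
stored: List[dict] = []
errors: List[Tuple[str, str]] = []

stage = ParseStage(
    parse=parse_nutch_line,
    next_stage=stored.extend,
    error_sink=lambda raw, cause: errors.append((raw, cause)),
)
stage.add(["ERROR ParseSegment failed for url", "garbage-without-level"])
```

After the call, the well-formed line ends up in `stored` and the malformed one in `errors` with its cause attached. That is the same split the reply describes between the HBase/HDFS output and the error location.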