Message-ID: <32791489.1175122345302.JavaMail.jira@brutus>
Date: Wed, 28 Mar 2007 15:52:25 -0700 (PDT)
From: "stack@archive.org (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-1181) userlogs reader
In-Reply-To: <5503064.1175121685138.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack@archive.org updated HADOOP-1181:
--------------------------------------

    Attachment: hadoop1181.patch

Attached is a patch that
changes TaskLog$Reader so it uses URLs instead of the file system. It also:

+ Adds a constructor that takes a userlog subdirectory URL.
+ Adds a public getInputStream method that streams over all userlog parts.
+ Makes TaskLog and TaskLog$Reader public rather than default access.
+ Adds a main that takes a URL and prints the concatenated logs to stdout.

I'll not mark this issue as 'patch ready' until others have had a gander. It would be great if Arun C Murthy could review, since he wrote the original. In particular, it would be good to know whether the calculation of totalLogSize in the TaskLog$Reader#fetchAll method -- around line 384 in r523437 -- is important. If it is not, some near-duplicate code could be replaced with a call to the new getInputStream in a second version of this patch.

> userlogs reader
> ---------------
>
>                 Key: HADOOP-1181
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1181
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: stack@archive.org
>         Attachments: hadoop1181.patch
>
>
> My jobs output lots of logging. I want to be able to quickly parse the logs across the cluster for anomalies. org.apache.hadoop.tool.Logalyzer looks promising at first, but it does not know how to deal with the userlog format and it wants to first copy all logs to the local machine. After some digging, there does not currently seem to be a reader for the hadoop userlog format. TaskLog$Reader is not generally accessible, and it too expects logs to be on the local filesystem (which is of little use if I want to run the analysis as a mapreduce job).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
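
For readers of the archive, here is a rough sketch of what "a stream over all userlog parts, read from URLs, with a main that prints the concatenated logs" could look like. It is not the code in hadoop1181.patch. The class name UserLogConcat, the method signature, and the assumption that the caller already has the list of part URLs are all hypothetical; the real patch takes a userlog subdirectory URL and works out the parts itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Hadoop. */
public class UserLogConcat {

  /**
   * Returns a single stream that reads each part URL in the given order.
   * All parts are opened up front, which keeps the sketch simple but may
   * not suit a directory with very many parts.
   */
  public static InputStream getInputStream(List<URL> parts) throws IOException {
    List<InputStream> streams = new ArrayList<InputStream>();
    try {
      for (URL part : parts) {
        streams.add(part.openStream());
      }
    } catch (IOException e) {
      // Do not leak the streams that were opened before the failure.
      for (InputStream s : streams) {
        try {
          s.close();
        } catch (IOException ignored) {
          // Best-effort cleanup; the original exception is rethrown below.
        }
      }
      throw e;
    }
    // SequenceInputStream reads the streams one after another and
    // closes each one as it is exhausted.
    return new SequenceInputStream(Collections.enumeration(streams));
  }

  /** Takes part URLs as arguments and prints the concatenated logs to stdout. */
  public static void main(String[] args) throws IOException {
    List<URL> parts = new ArrayList<URL>();
    for (String arg : args) {
      parts.add(URI.create(arg).toURL());
    }
    try (InputStream in = getInputStream(parts)) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        System.out.write(buf, 0, n);
      }
      System.out.flush();
    }
  }
}
```

Because the parts are plain URLs, the same method works for file: and http: locations, which is what makes reading logs from inside a mapreduce task plausible.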