Message-ID: <1837627121.1240880250854.JavaMail.jira@brutus>
Date: Mon, 27 Apr 2009 17:57:30 -0700 (PDT)
From: "Jakob Homan (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-5752) Provide examples of using offline image viewer (oiv) to analyze hadoop file systems
In-Reply-To: <458471969.1240879770453.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated
HADOOP-5752:
--------------------------------

    Attachment: HADOOP-5752.patch

The OIV's output data are ripe for analysis. The attached patch:
* Creates a new image processor, Delimited, that writes a (by default) tab-delimited file of the namespace, suitable for analysis by other tools.
* Updates the oiv documentation with examples of how to analyze these files using Pig to find probable duplicate files, files that have never been accessed, and the total number of files per user in the namespace. These are meant as examples to help ops and others build further useful scripts.
* Provides a unit test for the new DelimitedImageVisitor.

Right now the script files themselves are not included in the patch because I couldn't figure out a good place to stash them in the file structure. Konstantin suggested adding them to the wiki, which would be nice since other users could add their own scripts as they are created, but I don't see where the wiki hosts files like these. If it can, could someone please point me to where they would go?

Santhosh from the Pig team kindly reviewed and blessed the Pig scripts.

> Provide examples of using offline image viewer (oiv) to analyze hadoop file systems
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-5752
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5752
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: HADOOP-5752.patch
>
>
> The offline image viewer provides the ability to generate large amounts of data about an hdfs namespace. It would be good to provide tools, examples, etc. on how to analyze this data to find useful information.
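
For readers without Pig at hand, two of the analyses named above (files per user and probable duplicates) can be roughed out over a tab-delimited namespace dump in plain Python. This is only a sketch under stated assumptions, not the Pig scripts from the patch: the column positions (`PATH`, `FILE_SIZE`, `USER`), the sample rows, and the "same base name and same size" duplicate heuristic are all hypothetical and should be adjusted to whatever the Delimited processor actually emits.

```python
from collections import Counter, defaultdict

# Hypothetical column positions in the tab-delimited output (an assumption,
# not taken from the patch): path first, file size seventh, owner eleventh.
PATH, FILE_SIZE, USER = 0, 6, 10

# Made-up sample rows, purely for illustration.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ("/a/report.csv", "3", "2009-04-01 10:00", "2009-04-02 11:00",
     "67108864", "1", "1024", "-1", "-1", "rw-r--r--", "alice", "users"),
    ("/b/report.csv", "3", "2009-04-03 09:00", "2009-04-03 09:00",
     "67108864", "1", "1024", "-1", "-1", "rw-r--r--", "bob", "users"),
    ("/a/notes.txt", "3", "2009-04-05 12:00", "2009-04-06 08:00",
     "67108864", "1", "2048", "-1", "-1", "rw-r--r--", "alice", "users"),
])


def read_rows(lines, delimiter="\t"):
    """Split each non-empty line into its fields."""
    return [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]


def files_per_user(rows):
    """Count namespace entries owned by each user."""
    return Counter(row[USER] for row in rows)


def probable_duplicates(rows):
    """Group paths that share a base name and size; keep groups of 2 or more.

    Directories are not filtered out here; a real script would skip them.
    """
    groups = defaultdict(list)
    for row in rows:
        base_name = row[PATH].rsplit("/", 1)[-1]
        groups[(base_name, row[FILE_SIZE])].append(row[PATH])
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


rows = read_rows(SAMPLE.splitlines())
print(files_per_user(rows))
print(probable_duplicates(rows))
```

On a real dump, `read_rows` would be fed an open file handle instead of the sample string. Matching on name and size only flags candidates; confirming a duplicate would still require comparing file contents or checksums.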