Date: Thu, 09 Jul 2009 00:26:44 +0200
From: "Usman Waheed"
Organization: Opera Software
To: "core-user@hadoop.apache.org"
Subject: Extracting data from HDFS and displaying stats to a webpage

Hi All,

Is there a recommended way to extract data from HDFS and run some computations on it so that the results can be displayed on a web page?

One option that comes to mind is to write simple Perl CGI scripts that pull the data from HDFS and do the computation before sending the results to the browser.

Another option is to run scripts in the background that summarize the data in HDFS and insert the summaries into a DB table. We could then write a web GUI that queries the DB table and displays the desired stats with graphs using ploticus.

Our data set in HDFS will eventually grow, so speed will be important.

Thanks,
Usman
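
To make the second option above (background scripts that summarize HDFS data into a DB table) more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a recommended design: the input is assumed to be tab-separated text with the item of interest in the first column, SQLite stands in for whatever database is actually used, and the function names and the `stats` table are made up for this example. The only Hadoop-specific piece is the `hadoop fs -cat` shell command.

```python
import sqlite3
import subprocess
from collections import Counter
from typing import Iterable, Iterator


def read_hdfs_lines(path: str) -> Iterator[str]:
    """Stream a text file out of HDFS using the `hadoop fs -cat` shell command."""
    proc = subprocess.Popen(
        ["hadoop", "fs", "-cat", path],
        stdout=subprocess.PIPE,
        text=True,
    )
    assert proc.stdout is not None
    for line in proc.stdout:
        yield line.rstrip("\n")
    if proc.wait() != 0:
        raise RuntimeError(f"hadoop fs -cat failed for {path}")


def summarize(lines: Iterable[str]) -> Counter:
    """Count records per item.

    Assumption: each non-empty line is tab-separated and the item
    to count is in the first column.
    """
    counts: Counter = Counter()
    for line in lines:
        if line:
            counts[line.split("\t", 1)[0]] += 1
    return counts


def store_summary(conn: sqlite3.Connection, counts: Counter) -> None:
    """Write the latest counts into a small summary table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stats ("
        "item TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO stats (item, hits) VALUES (?, ?)",
        counts.items(),
    )
    conn.commit()
```

A cron job could then call something like `store_summary(sqlite3.connect("stats.db"), summarize(read_hdfs_lines("/logs/part-00000")))`, where both the database file and the HDFS path are placeholders. The web page only ever queries the small `stats` table, so page load time does not depend on how much data sits in HDFS. As the data set grows, the `summarize` step would probably have to become a MapReduce job, with only its small output read back and loaded into the table.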