From: Jonathan Gray
To: "hbase-user@hadoop.apache.org"
Date: Tue, 6 Apr 2010 15:15:06 -0700
Subject: RE: how can I check the I/O influence HBase to HDFS
Message-ID: <8D66B74984F9564BBB25C3C67D630F2D6689CF3E@SC-MBXC1.TheFacebook.com>

Can you explain more about what information you are trying to find out?
You had an existing HDFS and you want to measure the additional impact of
adding HBase? Is that in terms of reads/writes/IOPS or data size?

If you have a steady-state set of metrics for HDFS without HBase, can you
not just monitor those metrics with HBase running and calculate the deltas?

Also, to what end are you trying to figure this out? I'm very much
interested in what courses of action you might take given the different
information you could find out about HBase's influence on your cluster.

JG

> -----Original Message-----
> From: steven zhuang [mailto:steven.zhuang.1984@gmail.com]
> Sent: Tuesday, April 06, 2010 8:34 AM
> To: hbase-user@hadoop.apache.org
> Subject: how can I check the I/O influence HBase to HDFS
>
> Hi there,
>
> I need to measure the influence HBase has on HDFS.
>
> I have a Hadoop cluster with 30+ datanodes, and an HBase cluster built
> on it, with 18 regionservers residing on 18 datanodes.
>
> We have observed that HDFS I/O increases a lot when we run imports or
> queries on HBase tables, but we don't know how much HBase impacts
> HDFS, so now I have to dig into this.
>
> My idea is as follows:
>
> 1. grep the regionserver logs for file information on the HBase
>    tables, which should mainly be the store files' names and sizes,
>    and sum up the sizes.
> 2. grep the datanode logs for the HDFS_READ/HDFS_WRITE entries and
>    calculate the total I/O bytes.
> 3. Compute the ratio of HBase I/O to HDFS I/O.
>
> My concern is whether the above idea is right. Is anything missing,
> or is there a better way to do this?
>
> To make the result more convincing, I also want the block info for
> each HTable: not just for the files under each table's directory, but
> also for the store files that were later removed by major compaction,
> since all I can see in the datanode log is the block ID. Any pointer
> or hint is really appreciated.
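
Steps 2 and 3 of the quoted plan could be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes each datanode clienttrace line contains a fragment of the form `bytes: <n>, op: HDFS_READ` (or `HDFS_WRITE`), which should be checked against the actual logs on your cluster. The function names `sum_hdfs_io` and `hbase_share` are hypothetical, and the HBase byte count from step 1 is taken as an input rather than derived here.

```python
import re
from collections import Counter

# ASSUMPTION: a datanode clienttrace line contains "bytes: <n>, op: <OP>".
# Verify this against your own datanode logs before relying on the totals.
TRACE_RE = re.compile(r"bytes: (\d+), op: (HDFS_READ|HDFS_WRITE)")


def sum_hdfs_io(lines):
    """Step 2: total bytes per operation across the given log lines."""
    totals = Counter()
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return totals


def hbase_share(hbase_bytes, hdfs_bytes):
    """Step 3: fraction of HDFS I/O attributed to HBase (0.0 if no I/O)."""
    return hbase_bytes / hdfs_bytes if hdfs_bytes else 0.0


if __name__ == "__main__":
    # Made-up sample lines in the assumed format, for illustration only.
    sample = [
        "INFO clienttrace: src: /10.0.0.1:50010, bytes: 4096, op: HDFS_READ, blockid: blk_1",
        "INFO clienttrace: src: /10.0.0.2:50010, bytes: 1024, op: HDFS_WRITE, blockid: blk_2",
        "INFO some unrelated datanode message",
    ]
    totals = sum_hdfs_io(sample)
    hdfs_total = totals["HDFS_READ"] + totals["HDFS_WRITE"]
    # 2048 stands in for the store file byte sum obtained in step 1.
    print(dict(totals), hbase_share(2048, hdfs_total))
```

To run it over real logs, pass an open file object (or a chain of several) to `sum_hdfs_io` in place of the sample list. Note that this ratio compares store file sizes with bytes transferred, so it is at best a rough indicator; comparing the same HDFS metrics with and without HBase running, as suggested in the reply, avoids that mismatch.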