Date: Mon, 20 Feb 2012 18:33:01 +0530
Subject: Re: HBase and Data Locality
From: Praveen Sripati
To: user@hbase.apache.org, dev@hbase.apache.org
Content-Type: multipart/alternative; boundary=90e6ba10ad810e89b104b964ea97

--90e6ba10ad810e89b104b964ea97
Content-Type: text/plain; charset=ISO-8859-1

Looking at DefaultLoadBalancer.balance(), the balancing is based purely on
the number of regions hosted per region server, not on resource usage.
HBASE-57 suggests taking data locality into consideration when regions are
assigned to region servers. It would be nice to consider both the resource
usage of each region and data locality, rather than only the number of
regions per region server as currently implemented.

The file-to-block mapping can be obtained from the HDFS NameNode, but how
can I find out which regions are heavily loaded (in terms of number of
requests, CPU and memory) and which are not? I could not see any resource
utilization figures on the region server pages.

Also, I am curious whether HBASE-57 makes sense, since major compaction runs
every 24 hrs and the HFiles are all local to the regions after a major
compaction. I think the HDFS balancer has to be run manually, so there will
be a window of at most 24 hrs between an HDFS balancer run and the next
major compaction during which data locality might be lost.

I am interested in working on this JIRA, but need some help from the HBase
community.

Regards,
Praveen

On Tue, Feb 14, 2012 at 7:34 PM, Mikael Sitruk wrote:

> Region allocation is retained across a restart (
> https://issues.apache.org/jira/browse/HBASE-2896 ). This is also present in
> the CDH3 code.
> Nevertheless, if you have a server that did not start correctly, regions
> will move away from it and locality will not be preserved (even after
> you start the problematic node, since it will get random regions).
> The best solution would effectively be
> https://issues.apache.org/jira/browse/HBASE-57
>
> Mikael.S
>
> On Tue, Feb 14, 2012 at 3:19 PM, Brock Noland wrote:
>
> > Hi,
> >
> > On Tue, Feb 14, 2012 at 7:13 AM, Praveen Sripati wrote:
> > > Lars' blog (1) mentions that data locality for the region servers is
> > > lost when the HBase cluster is restarted. It also mentions at the end
> > > that work is under way in HBase to assign regions to RS taking data
> > > locality into consideration. The blog entry is 18 months old, so I
> > > would like to know whether this has been incorporated into the latest
> > > HBase release or data locality is still lost until a compaction
> > > completes.
> >
> > JIRA is down for me, but here is the JIRA:
> >
> > https://issues.apache.org/jira/browse/HBASE-2896
> >
> > I am pretty sure it's been included in the latest HBase release as it's
> > in CDH3.
> >
> > Brock
> >
> > --
> > Apache MRUnit - Unit testing MapReduce -
> > http://incubator.apache.org/mrunit/
>
> --
> Mikael.S

--90e6ba10ad810e89b104b964ea97--
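
To make the suggestion at the top of this message concrete, here is a minimal
sketch of how a balancer might blend data locality with region count when
choosing a region server. It is not HBase code and does not reflect how
DefaultLoadBalancer or HBASE-57 work: ServerStats, locality_index,
pick_server, max_regions and locality_weight are all hypothetical names
introduced for illustration, and the per-server local byte counts are assumed
to have been derived already from the NameNode's file-to-block mapping.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    """Hypothetical per-server snapshot; not an HBase type."""
    name: str
    region_count: int  # regions currently hosted on this server
    local_bytes: int   # bytes of the candidate region's HFiles stored locally


def locality_index(local_bytes: int, total_bytes: int) -> float:
    """Fraction of the region's HFile bytes that are local to a server."""
    if total_bytes <= 0:
        return 0.0
    return local_bytes / total_bytes


def pick_server(servers, region_bytes, max_regions, locality_weight=0.5):
    """Return the name of the server with the best blended score.

    score = w * locality + (1 - w) * spare_capacity, both terms in [0, 1].
    With w = 0 this degenerates to a purely count-based choice.
    """
    if not servers:
        raise ValueError("no candidate servers")
    if max_regions <= 0:
        raise ValueError("max_regions must be positive")

    def score(s):
        locality = locality_index(s.local_bytes, region_bytes)
        spare = max(0.0, 1.0 - s.region_count / max_regions)
        return locality_weight * locality + (1.0 - locality_weight) * spare

    return max(servers, key=score).name
```

With locality_weight set to 0 the choice depends only on region count, which
is the behaviour described for the current balancer; raising the weight
trades evenness of region counts for locality. A real implementation would
also need the per-region request, CPU and memory figures asked about above,
which this sketch leaves out.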