From: Praveen Sripati <praveensripati@gmail.com>
To: user@hbase.apache.org
Subject: Re: HBase and Data Locality
Date: Tue, 21 Feb 2012 15:02:19 +0530

Stack,

> Its recommended that you run major compactions yourself at down times.

Can we change the `hbase.hregion.majorcompaction` value from 86400000 to -1,
along with the required code changes, and make a note of it in
hbase-default.xml? Also, hbase.master.loadbalancer.class is not specified in
hbase-default.xml. Should I open a JIRA and make those two changes?

> In 0.92 there is hits per region and this gets reported to the master
> as part of ClusterStatus as does memory usage. This could be factored
> into a new balance algorithm.

I just looked at the code for ClusterStatus, HServerLoad and RegionLoad. For
a first cut I was thinking we could use only the resource usage (memory, cpu
and # of hits, not block location): sort the region servers in decreasing
order of resource usage, shed regions from the top until a region server with
average usage is reached, and then assign the shed regions to the region
servers at the bottom of the list. This is more or less similar to the
DefaultLoadBalancer, but taking the resource usage into consideration rather
than the # of regions.
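The shed-from-the-top, assign-to-the-bottom idea above could be sketched roughly as below. This is a hypothetical illustration, not HBase 0.92 code: the `ResourceUsageBalancer` and `ServerLoad` names are made up, and the "equal load share per region" assumption is a deliberate simplification.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only -- class, field and method names are not real
// HBase 0.92 APIs.
class ResourceUsageBalancer {

    static class ServerLoad {
        final String name;
        double usage;                // combined memory/cpu/hits load figure
        final Deque<String> regions; // regions hosted on this server

        ServerLoad(String name, double usage, List<String> regions) {
            this.name = name;
            this.usage = usage;
            this.regions = new ArrayDeque<>(regions);
        }
    }

    /** Returns a region -> destination-server move plan. */
    static Map<String, String> balance(List<ServerLoad> servers) {
        Map<String, String> moves = new LinkedHashMap<>();
        if (servers.size() < 2) return moves;

        // Sort region servers in decreasing order of resource usage.
        servers.sort((a, b) -> Double.compare(b.usage, a.usage));
        double avg = servers.stream().mapToDouble(s -> s.usage).average().orElse(0);

        int i = 0;                  // most loaded server still above average
        int j = servers.size() - 1; // least loaded server still below average
        while (i < j) {
            ServerLoad heavy = servers.get(i);
            if (heavy.usage <= avg) break;
            if (heavy.regions.isEmpty()) { i++; continue; }
            ServerLoad light = servers.get(j);
            // Crude assumption: each region carries an equal share of its
            // server's load.
            double share = heavy.usage / heavy.regions.size();
            moves.put(heavy.regions.poll(), light.name);
            heavy.usage -= share;
            light.usage += share;
            if (light.usage >= avg) j--; // this server has taken enough
            if (heavy.usage <= avg) i++; // this server has shed enough
        }
        return moves;
    }
}
```

A real implementation would of course move region state through the master rather than just compute a plan, but the two-pointer shape above captures the proposal.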
The question is: should memory, cpu and # of hits be given equal weightage,
or should the weights be configurable?

> On locality, the fb lads are working on a primitive that makes it so
> the hbase dfsclient will tell hdfs where to place blocks.

Is there any JIRA for it? Using this functionality, HBase could specify the
location of the 2nd and 3rd replicas (the 1st is local) and later use those
locations during a region server crash or a region movement. But what happens
if the HDFS balancer is run and the blocks are moved again?

> This primitive that the lads are working on needs to be done I believe
> before hbase-57 can be done (properly).

Again the question is: should we mix memory, cpu and # of hits with block
location and give them configurable weightage, or keep data locality out of
the first cut, since major compaction (although manual, per the
recommendation) will pull all the blocks together anyway?

Thanks,
Praveen

On Tue, Feb 21, 2012 at 2:39 AM, Stack wrote:
> On Mon, Feb 20, 2012 at 5:03 AM, Praveen Sripati wrote:
> > It would be nice to consider both the resource usage of the region
> > and the data locality into consideration, not just purely based on
> > the number of regions in the region server as implemented currently.
>
> Yes.
>
> > The file to block mapping can be found from the HDFS NameNode, but
> > how to find out which regions are loaded (# of requests, cpu and
> > memory perspective) and which are not? I could not see any resource
> > utilization in the region server pages.
>
> In 0.92 there is hits per region and this gets reported to the master
> as part of ClusterStatus as does memory usage. This could be factored
> into a new balance algorithm.
> Could also send over cpu and hardware profile for factoring (though
> much of this is available via JMX -- either we get these into
> clusterstatus or the master does a poll on jmx after it sees a new
> server, to get the server profile).
>
> > Also, curious if HBASE-57 makes sense, since the major compaction
> > runs every 24 hrs.
>
> Its recommended that you run major compactions yourself at down times.
>
> > I think that the balancer has to be run manually in HDFS and there
> > will be a maximum of 24 hrs window between a HDFS balancer execution
> > and a major compaction during which data locality might be lost.
>
> Yes, the hdfs balancer needs to be run manually, and yes, it knows
> nothing of how hbase has ordered the blocks and will not respect
> region locality when it goes about its business.
>
> I'm not sure though I follow the rest of what you are saying above.
>
> On locality, the fb lads are working on a primitive that makes it so
> the hbase dfsclient will tell hdfs where to place blocks. The favored
> replica locations will be kept up in .META. in a new column. When a
> regionserver crashes, or if we want to move a region, we'll move it or
> reopen it on one of the locations that has had region blocks
> replicated to it. This should help improve the locality story on
> failover/move.
>
> Without this functionality, we're left with the current behavior where
> blocks for regions are scattered, and it's only by chance you'd have
> good locality opening a region in any location other than the current
> deploy, where the gentle waves of compaction have been nudging data
> local.
>
> I don't believe there is an issue for the above yet. Let me chase the
> lads to file one.
>
> This primitive that the lads are working on needs to be done, I
> believe, before hbase-57 can be done (properly). What do you reckon,
> Praveen?
>
> St.Ack
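The configurable-weightage question raised in the thread could be sketched as a simple weighted score, with each metric normalized against its cluster-wide maximum so the weights stay comparable. This is an illustration only; the weight values and names are not HBase configuration keys.

```java
// Hypothetical weighted load score combining memory, cpu and request
// hits; all names and weights here are illustrative, not HBase APIs.
class LoadScore {
    final double memWeight, cpuWeight, hitsWeight;

    LoadScore(double memWeight, double cpuWeight, double hitsWeight) {
        this.memWeight = memWeight;
        this.cpuWeight = cpuWeight;
        this.hitsWeight = hitsWeight;
    }

    /** Each metric is scaled to [0, 1] before weighting, so no single
     *  unit (bytes vs. request counts) dominates the score. */
    double score(double memUsed, double memMax,
                 double cpuUsed, double cpuMax,
                 long hits, long hitsMax) {
        return memWeight * ratio(memUsed, memMax)
             + cpuWeight * ratio(cpuUsed, cpuMax)
             + hitsWeight * ratio(hits, hitsMax);
    }

    private static double ratio(double value, double max) {
        return max <= 0 ? 0 : value / max;
    }
}
```

With weights (1, 1, 1) the metrics are equally weighted; exposing the three weights as configuration would answer the "equal weightage or configurable" question by making equal weightage merely the default.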