From: Nicolas Spiegelberg
To: "user@hbase.apache.org", lars hofhansl
Subject: Re: region size/count per regionserver
Date: Thu, 3 Nov 2011 02:55:45 +0000

Region scalability is definitely an investigation item that has not been
covered yet. We solved the problem with horizontal sharding into multiple
clusters instead of tackling that subject within the timeframe we had. I'm
guessing the 2-level ROOT/META was a response to that problem.

On the actual region count / data size, that all depends on how high you
want to scale your StoreFile size. 10GB StoreFiles are currently normal /
reasonable.

On 11/2/11 7:10 PM, "lars hofhansl" wrote:

>Do we know what would need to change in HBase in order to be able to
>manage more regions per regionserver?
>With 20 regions per server, one would need 300G regions to just utilize
>6T of drive space.
>
>
>To utilize a regionserver/datanode with 24T drive space the region size
>would be an insane 1T.
>
>-- Lars
>
>________________________________
>From: Nicolas Spiegelberg
>To: "user@hbase.apache.org"
>Cc: Karthik Ranganathan; Kannan Muthukkaruppan
>Sent: Tuesday, November 1, 2011 3:57 PM
>Subject: Re: region size/count per regionserver
>
>Simple answer
>-------------
>20 regions/server & <2000 regions/cluster is a good rule of thumb if you
>can't profile your workload yet. Keep two things in mind:
>
>1) You need to limit the regions/cluster so the master can have a
>reasonable startup time & can handle all the region state transitions via
>ZK. Most bigger companies are running 2,000 in production and achieve
>reasonable startup times (< 2 minutes for region assignment on cold
>start). If you want to test the scalability of that algorithm beyond what
>other companies need, admin beware.
>2) The more regions/server you have, the faster recovery can happen
>after RS death, because you can currently parallelize recovery at
>region granularity. Too many regions/server and #1 starts to be a
>problem.
>
>
>
>Complicated answer
>------------------
>More information is needed to optimize this formula. Additional
>considerations:
>
>1) Are you IO-bound or CPU-bound
>2) What is your grid topology like
>3) What is your network hardware like
>4) How many disks (not just size)
>5) What is the data locality between RegionServer & DataNode
>
>In the Facebook case, we have 5 racks with 20 nodes each. Servers in the
>rack are connected by 1G Eth to a switch with a 10G uplink. We are
>network bound. Our saturation point is most commonly on the top-of-rack
>switch. With 20 regions/server, we can roughly parallelize our
>distributed log splitting within a single rack on RS death (although 2
>regions do split off-rack). This minimizes top-of-rack traffic and
>optimizes our recovery time. Even if you are CPU-bound, log splitting
>(hence recovery time) is an IO-bound operation.
>A lot of our work on region assignment is about maximizing data locality,
>even on RS death, so we avoid top-of-rack saturation.
>
>
>On 11/1/11 10:54 AM, "Sujee Maniyam" wrote:
>
>>Hi all,
>>My HBase cluster is 10 nodes, each node has 12 cores, 48G RAM, 24TB
>>disk, 10G Ethernet.
>>My region size is 1GB.
>>
>>Any guidelines on how many regions an RS can handle comfortably?
>>I vaguely remember reading somewhere to have no more than 1000 regions /
>>server; that comes to 1TB / server. Seems pretty low for the current
>>hardware config.
>>
>>Any rules of thumb? experiences?
>>
>>thanks
>>Sujee
>>
>>http://sujee.net
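
For reference, the sizing arithmetic quoted in this thread can be sketched as a few lines of Python. This is only a rough illustration, not anything from HBase itself: the function names are hypothetical, 1T is taken as 1000G, and drive space is assumed to be split evenly across regions with HDFS replication, compression, and free-space headroom all ignored.

```python
# Back-of-the-envelope sizing using only figures quoted in this thread.
# Assumptions (not from HBase): 1T = 1000G, drive space is divided evenly
# across regions, and replication/compression/headroom are ignored.


def region_size_gb(disk_gb: float, regions_per_server: int) -> float:
    """Region size (in G) needed to fill disk_gb with the given region count."""
    if regions_per_server <= 0:
        raise ValueError("regions_per_server must be positive")
    return disk_gb / regions_per_server


def server_capacity_gb(region_gb: float, regions_per_server: int) -> float:
    """Data (in G) one server holds at a given region size and region count."""
    return region_gb * regions_per_server


if __name__ == "__main__":
    # The 20 regions/server rule of thumb against 6T and 24T of drive space.
    for disk_tb in (6, 24):
        size = region_size_gb(disk_tb * 1000, 20)
        print(f"{disk_tb}T disk, 20 regions/server: {size:.0f}G per region")
    # The 1000 regions/server at 1GB per region case from the first mail.
    print(f"1000 regions x 1G: {server_capacity_gb(1, 1000):.0f}G per server")
```

Under these assumptions the 6T case gives the 300G regions mentioned above, and the 24T case gives 1200G, which is the roughly 1T region size described as insane.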