From: Mikael Sitruk
To: dev@hbase.apache.org
Date: Tue, 21 Feb 2012 23:57:52 +0200
Subject: Re: Scan performance on a big table as combination of multiple logic tables

See inline

On Feb 21, 2012 11:40 PM, "Jean-Daniel Cryans" wrote:
>
> On Tue, Feb 21, 2012 at 1:17 PM, Mikael Sitruk wrote:
> > This is interesting J.D. so, is there a limitation on the region size or
> > not?
>
> Your imagination? Like I said nothing blocks you in the code.
>
> > Can it be really any number?
>
> That's what it implies.
>
> > If so beside the collection time is there
> > any impact (perhaps the documentation should be updated too)?
>
> Collection time? You mean GC? Sorry I don't get what you mean.
>

*Sorry, typo mistake (from mobile) I meant compaction not collection

> > Regarding the number of regions you have (14,398) is it for a single RS?
> > What is your number of RS?
>
> Currently 91 in that cluster. It varies :)
>
> We have >200 tables coming all in different sizes.

*Not clear, 91 rs, and 14398 regions in total? Or per RS?

Mikael.S

> J-D
>
> >
> > Mikael.S
> > On Feb 21, 2012 10:09 PM, "Jean-Daniel Cryans" wrote:
> >
> >> On Sun, Feb 19, 2012 at 1:45 PM, Mikael Sitruk wrote:
> >> > During compaction the region is not out of service.
> >> > According to documentation the max region size for V2 format is 20G
> >> > And now the question: Assuming that 20G is the limit and the number of
> >> > regions in a single RS should stay low < 500 it means that there is no mean
> >> > having RS with more than 10TB of storage to use by HBase (otherwise
> >> > locality will not be achieve for some servers, i also assume that
> >> > compression is used and therefore it compensate the need for additional
> >> > space for replication)?
> >> > If the max number of region per RS is smaller then the storage size is even
> >> > smaller. Is it correct?
> >>
> >> In the documentation 20GB is given as an example of a larger size that
> >> can be supported, but nothing blocks you from going way higher than
> >> that. I've done some import tests and had 100GB regions. It just takes
> >> a while to compact the bigger files.
> >>
> >> Also you can go over 500 regions, in fact one of our clusters has
> >> 14,398 regions right now. It's just a pain to reassign everything when
> >> HBase boots but this is an offline cluster.
> >>
> >> J-D
> >>
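
For reference, the sizing arithmetic discussed in this thread can be sketched as below. This is only a back-of-the-envelope illustration, not anything from HBase itself: the function names are hypothetical, 1 TB is taken as 1000 GB, and the per-RS average assumes the 14,398 regions are the total for the 91-RS cluster, which is exactly the point still being asked above. As in the original question, replication overhead and compression are assumed to roughly cancel out and are ignored.

```python
def storage_per_rs_tb(region_size_gb: float, regions_per_rs: int) -> float:
    """Rough upper bound on HBase data held by one region server, in TB.

    Assumes 1 TB = 1000 GB and ignores replication and compression
    (hypothetical helper, not an HBase API).
    """
    return region_size_gb * regions_per_rs / 1000.0


def avg_regions_per_rs(total_regions: int, region_servers: int) -> float:
    """Average regions per region server, assuming an even spread
    (hypothetical helper, not an HBase API)."""
    return total_regions / region_servers


if __name__ == "__main__":
    # 20G regions and at most 500 regions per RS, as in the original question.
    print(storage_per_rs_tb(20, 500), "TB per RS with 20G regions")
    # 100GB regions, as in the import tests mentioned in the reply.
    print(storage_per_rs_tb(100, 500), "TB per RS with 100GB regions")
    # Assumption: 14,398 is the cluster-wide total across 91 RS.
    print(round(avg_regions_per_rs(14398, 91)), "regions per RS on average")
```

With these assumptions the 20G x 500 case gives the 10TB figure from the question, and larger regions raise that ceiling proportionally.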