Subject: Re: Region is out of bounds
From: "Kevin O'dell"
To: dev@hbase.apache.org
Date: Wed, 3 Dec 2014 09:38:06 -0500

Vladimir,

I know you said, "do not ask me why", but I am going to have to ask you why. The fact that you are doing this (raising the blocking store files threshold to 200) tells me something, or multiple somethings, is wrong with your cluster setup. A couple of things come to mind:

* During this heavy write period, could you use bulk loads? If so, that should solve almost all of your problems.
* A 1 GB region size is WAY too small. If you are pushing the volume of data you describe, I would recommend 10 - 20 GB region sizes. This should also keep your region count lower, which will result in more optimal writes.
* Your cluster may be undersized. If you need to set the blocking threshold that high, you may be pushing too much data for your cluster overall.

Would you be so kind as to pass me a few pieces of information?

1.) Cluster size
2.) Average region count per RS
3.) Heap size, Memstore global settings, and block cache settings
4.) A RS log (pastebin) and a time frame of "high writes"

I can probably make some solid suggestions for you based on the above data.
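[Editor's note: for readers following along, the two settings discussed in this thread correspond to the hbase-site.xml properties below. This is an illustrative sketch only - the 200 blocking-store-files value is Vladimir's workaround, and the 20 GB region size is Kevin's suggestion; tune both to your own cluster.]

```xml
<!-- hbase-site.xml (illustrative values, not recommendations as-is) -->

<!-- Max region size before a split is requested; 20 GB here per the
     suggestion above (the thread's original setup used 1 GB). -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>21474836480</value>
</property>

<!-- Number of store files in a store that blocks further memstore
     flushes/writes until compaction catches up. The default is much
     lower (around 7-10 depending on release); raising it to 200 masks
     compaction falling behind rather than fixing it. -->
<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>200</value>
</property>
```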
On Wed, Dec 3, 2014 at 1:04 AM, Vladimir Rodionov wrote:

> This is what we observed in our environment(s).
>
> The issue exists in CDH 4.5, 5.1, HDP 2.1, and MapR 4.
>
> If someone sets the number of blocking store files way above the default
> value, say to 200, to avoid write stalls during intensive data loading (do
> not ask me why we do this), then one of the regions grows indefinitely and
> ends up holding more than 99% of the overall table.
>
> It can't be split because it still has orphaned reference files. Some of
> the reference files are apparently able to avoid compaction for a long
> time.
>
> The split policy is IncreasingToUpperBound, and the max region size is
> 1 GB. I do my tests on CDH 4.5 mostly, but all the other distros seem to
> have the same issue.
>
> My attempt to forcefully add reference files to the compaction list in
> Store.requestCompaction() when a region exceeds the recommended maximum
> size did not work out well - some weird results in our test cases (though
> the HBase tests are OK: small, medium, and large).
>
> What is so special about these reference files? Any ideas what can be done
> here to fix the issue?
>
> -Vladimir Rodionov

-- 
Kevin O'Dell
Systems Engineer, Cloudera