From: Dejan Menges
Date: Mon, 04 May 2015 08:31:40 +0000
Subject: Re: Right value for hbase.rpc.timeout
To: "user@hbase.apache.org"

Hi Ted,

Max filesize for a region is set to 75G in our case. Regarding the split policy, we most likely use ConstantSizeRegionSplitPolicy (it's 0.98.0 with a bunch of patches, and that should be the default one).

Also, regarding the link you sent me: in 98.3 I cannot find the default value for hbase.regionserver.lease.period anywhere. Is this parameter still called that?

On Thu, Apr 30, 2015 at 11:27 PM Ted Yu wrote:

> Please take a look at 98.3 under
> http://hbase.apache.org/book.html#trouble.client
>
> BTW what's the value for hbase.hregion.max.filesize?
> Which split policy do you use?
>
> Cheers
>
> On Thu, Apr 30, 2015 at 6:59 AM, Dejan Menges wrote:
>
> > Basically, this is how I came to this question: this happened super rarely, and we narrowed it down to hotspotting. Map tasks were timing out on three regions which were 4-5 times bigger than the other regions of the same table, and splitting those regions fixed it.
> >
> > However, I was just wondering whether there are any recommendations for this, as it's also super hard to reproduce the same situation to retest it.
> >
> > On Thu, Apr 30, 2015 at 3:56 PM Michael Segel wrote:
> >
> > > There is no single ‘right’ value.
> > >
> > > As you pointed out… some of your Mapper.map() iterations are taking longer than 60 seconds.
> > >
> > > The first thing is to determine why that happens. (It could be normal, or it could be bad code on your developers' part. We don’t know.)
> > >
> > > The other thing is that if you determine that your code is perfect and does what you want it to do… and it's a major part of your use case… then you increase your timeouts to 120 seconds.
> > >
> > > The reason why it's a tough issue is that we don’t know what hardware you are using, how many nodes, code quality, etc. Too many factors.
> > >
> > > > On Apr 30, 2015, at 6:51 AM, Dejan Menges wrote:
> > > >
> > > > Hi,
> > > >
> > > > What's the best practice to calculate this value for your cluster, if there is one?
> > > >
> > > > In some situations we saw that some maps take more than the default 60 seconds, which made that specific map job fail (once it failed, it failed again on every retry, up to the configured number of retries).
> > > >
> > > > I would like to tune the RPC parameters a bit, but googling and looking into the HBase Book doesn't tell me how to calculate the right values, or what else to look at besides hbase.rpc.timeout.
> > > >
> > > > Thanks a lot,
> > > > Dejan
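One likely mechanism behind map tasks failing after about 60 seconds (our assumption; the thread does not confirm it) is scanner lease expiry. When a MapReduce job reads a table through TableInputFormat, the scanner fetches rows in batches of the scanner caching size and only makes its next RPC after map() has consumed the whole batch. The gap between RPCs is therefore roughly caching × per-row map() time, and if that exceeds the lease the scanner is dropped. The plain-Java sketch below (no HBase dependency; the class and method names are hypothetical) checks that gap against an assumed 60 s lease and derives a caching value that fits with a 2× safety margin. As far as we know, from 0.96 on the lease is configured through hbase.client.scanner.timeout.period, which superseded hbase.regionserver.lease.period; verify this against the hbase-default.xml shipped with your build.

```java
/** Hypothetical helper: does the map() work done between two scanner RPCs fit inside the lease? */
public final class ScannerLeaseBudget {

    /** Approximate time between two scanner RPCs: rows per batch times worst per-row map() cost. */
    static long msBetweenRpcs(int scannerCaching, long worstMsPerRow) {
        return (long) scannerCaching * worstMsPerRow;
    }

    /** Largest caching value keeping one batch under leaseMs / safetyFactor (never below 1). */
    static int maxSafeCaching(long leaseMs, long worstMsPerRow, double safetyFactor) {
        if (worstMsPerRow <= 0) {
            throw new IllegalArgumentException("per-row time must be positive");
        }
        long budgetMs = (long) (leaseMs / safetyFactor);
        return (int) Math.max(1, budgetMs / worstMsPerRow);
    }

    public static void main(String[] args) {
        long leaseMs = 60_000;    // assumed 60 s lease, the default discussed in the thread
        int caching = 100;        // hypothetical Scan caching value
        long worstMsPerRow = 800; // hypothetical slowest observed map() time per row

        long gapMs = msBetweenRpcs(caching, worstMsPerRow);
        System.out.println("ms between RPCs: " + gapMs + " (lease " + leaseMs + " ms)");
        System.out.println(gapMs < leaseMs ? "OK" : "lease would expire");
        System.out.println("max safe caching: " + maxSafeCaching(leaseMs, worstMsPerRow, 2.0));
    }
}
```

With these hypothetical numbers one batch takes 80 s, so a 60 s lease would expire. The two levers are lowering the caching (here to 37 rows) or, as Michael suggests, raising the timeout once the per-row cost is understood; lower caching costs more round trips, so it trades throughput for headroom.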
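Dejan traced the failures to three regions 4-5 times larger than the others in the same table. The sketch below (plain Java; the class name, region names and sizes are all hypothetical, and sizes are assumed to be collected however you usually measure them) flags regions at least a given factor above the median region size. It also compares them with the 75G hbase.hregion.max.filesize from the thread, read here as 75 GiB (an assumption). As we understand ConstantSizeRegionSplitPolicy, a region only becomes a split candidate once one of its stores grows past that limit, so with 75G a region can be several times larger than its neighbours and still never split automatically.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flags regions that are much larger than the table's median region. */
public final class RegionHotspotCheck {

    static final long GIB = 1024L * 1024 * 1024;

    /** Median of the values (mean of the two middle values for an even count, integer division). */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    /** Returns the regions whose size is at least `factor` times the median region size. */
    static List<String> oversized(Map<String, Long> sizes, double factor) {
        long[] values = sizes.values().stream().mapToLong(Long::longValue).toArray();
        long med = median(values);
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            if (e.getValue() >= factor * med) {
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        long maxFileSize = 75 * GIB; // hbase.hregion.max.filesize from the thread, assumed GiB
        Map<String, Long> sizes = new LinkedHashMap<>(); // hypothetical region sizes in bytes
        sizes.put("region-a", 10 * GIB);
        sizes.put("region-b", 12 * GIB);
        sizes.put("region-c", 11 * GIB);
        sizes.put("region-d", 50 * GIB);
        for (String region : oversized(sizes, 4.0)) {
            boolean belowLimit = sizes.get(region) < maxFileSize;
            System.out.println(region + " is a hotspot candidate"
                    + (belowLimit ? " (below max.filesize, so no automatic split)" : ""));
        }
    }
}
```

Flagged regions can then be split by hand (the HBase shell has a split command), which is what resolved the problem in this thread.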