Subject: Re: How to manage retry failures in the HBase client
From: Tom Brown
To: "user@hbase.apache.org"
Date: Tue, 17 Sep 2013 11:48:31 -0600

I had read that section for those values, but it was unclear: the hbase.client.retries.number description subtly switches to describing hbase.client.pause, and I missed that context switch.

If I could recommend a change to those items' descriptions, I would rearrange them like so:

hbase.client.pause
    General client pause value. Used mostly as the value to wait before running a retry of a failed get, region lookup, etc. The actual retry interval is a rough function based on this setting. At first we retry at this interval, but then with backoff we pretty quickly reach retrying every ten seconds. See HConstants#RETRY_BACKOFF for how the backoff ramps up.
    Default: 100

hbase.client.retries.number
    Maximum retries. Used as the maximum for all retryable operations such as getting a cell's value, starting a row update, etc. Change this setting and hbase.client.pause to suit your workload.
    Default: 35

What is the formal way to request a specific documentation change? Do I need to sign a contributor agreement?

--Tom

On Tue, Sep 17, 2013 at 11:40 AM, Ted Yu wrote:

> Have you looked at
> http://hbase.apache.org/book.html#hbase_default_configurations where
> hbase.client.retries.number and hbase.client.pause are explained?
>
> Cheers
>
>
> On Tue, Sep 17, 2013 at 10:34 AM, Tom Brown wrote:
>
> > I have a region-server coprocessor that scans its portion of a table
> > based on a request and summarizes the results (designed this way to
> > reduce network data transfer).
> >
> > In certain circumstances, the HBase cluster gets a bit overloaded, and a
> > query will take too long. In that instance, the HBase client will retry
> > the query (up to N times). When this happens, any other running queries
> > will often time out and generate retries as well. This results in the
> > cluster becoming unresponsive until I'm able to kill the clients that
> > are retrying their requests.
> >
> > I have found the "hbase.client.retries.number" property, but its
> > description doesn't claim to set the number of retries, rather the
> > amount of time between retries. Is there a different property I can use
> > to set the maximum number of retries? Or is this property mis-documented?
> >
> > Thanks in advance!
> >
> > --Tom
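
To make the relationship between the two settings in the proposed wording concrete, here is a minimal Java sketch of how a base pause and a retry cap combine under backoff. It does not use the HBase client API. The class name RetrySchedule, its methods, and the multiplier table are all hypothetical: the multipliers were chosen only so that a 100 ms pause tops out at ten seconds, as the description says. The real table lives in HConstants#RETRY_BACKOFF and may differ between HBase versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of HBase.
final class RetrySchedule {

    // ASSUMPTION: illustrative multipliers, not copied from any HBase release.
    // Check HConstants#RETRY_BACKOFF in your version for the real values.
    static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100};

    // Wait before the given retry attempt (0-based): the base pause scaled by
    // the multiplier for that attempt, holding at the last multiplier once
    // the table is exhausted.
    static long pauseBeforeRetry(long pauseMs, int attempt) {
        int idx = Math.min(attempt, BACKOFF.length - 1);
        return pauseMs * BACKOFF[idx];
    }

    // Full list of waits for a client allowed maxRetries retries.
    static List<Long> schedule(long pauseMs, int maxRetries) {
        List<Long> waits = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            waits.add(pauseBeforeRetry(pauseMs, attempt));
        }
        return waits;
    }

    // Total time spent waiting if every retry fails.
    static long totalWaitMs(long pauseMs, int maxRetries) {
        long total = 0;
        for (long wait : schedule(pauseMs, maxRetries)) {
            total += wait;
        }
        return total;
    }

    public static void main(String[] args) {
        long pauseMs = 100;   // plays the role of hbase.client.pause
        int maxRetries = 35;  // plays the role of hbase.client.retries.number
        System.out.println("waits (ms): " + schedule(pauseMs, maxRetries));
        System.out.println("total wait (ms): " + totalWaitMs(pauseMs, maxRetries));
    }
}
```

Under these assumptions, the pause sets the scale of each wait while the retry count sets how many waits there are, so lowering the retry cap is what shortens the time an overloaded cluster keeps receiving retried requests.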