From: Doug Meil
To: "dev@hbase.apache.org"
CC: Jonathan Hsieh
Date: Tue, 28 Jun 2011 12:40:05 -0400
Subject: RE: Retry HTable.put() on client-side to handle temp connectivity problem
Message-ID: <67680900F79B1D4F99C844EE386FC5952823CD8CEB@EX2K7VS03.4emm.local>

I agree with what Todd & Gary said. I don't like retry-forever, especially
as a default option in HBase.

-----Original Message-----
From: Gary Helmling [mailto:ghelmling@gmail.com]
Sent: Tuesday, June 28, 2011 12:18 PM
To: dev@hbase.apache.org
Cc: Jonathan Hsieh
Subject: Re: Retry HTable.put() on client-side to handle temp connectivity problem

I'd also be wary of changing the default to retry forever. This might be
hard to differentiate from a hang or deadlock for new users and seems to
violate "least surprise".

In many cases it's preferable to have some kind of predictable failure as
well. So I think this would appear to be a regression in behavior. If
you're serving say web site data from hbase, you may prefer an occasional
error or timeout rather than having page loading hang forever.

I'm all for making "retry forever" a configurable option, but do we need
any new knobs here?

--gh

On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria wrote:

> If I could override the default, I'd be a hesitant +1. I'd rather see
> the default be something like retry 10 times, then throw an error.
> With one option being infinite retries.
>
> -Joey
>
> On Mon, Jun 27, 2011 at 2:21 PM, Stack wrote:
> > I'd be fine with changing the default in hbase so clients just keep
> > trying. What do others think?
> > St.Ack
> >
> > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau wrote:
> >> The code I pasted works for me: it reconnects successfully. Just
> >> thought it might be not the best way to do it.. I realized that by
> >> using HBase configuration properties we could just say that it's up
> >> to user to configure HBase client (created by Flume) properly (e.g.
> >> by adding hbase-site.xml with settings to classpath). On the other
> >> hand, it looks to me that users of HBase sinks will *always* want it
> >> to retry writing to HBase until it works out.
> >> But the default configuration does not work this way: the sink stops
> >> when HBase is temporarily down or inaccessible. Hence it makes using
> >> the sink more complicated (because the default configuration sucks),
> >> which I'd like to avoid here by adding the code above. Ideally the
> >> default configuration should work the best way for the
> >> general-purpose case.
> >>
> >> I understand what the ways to implement/configure such behavior are.
> >> I think we should discuss on the Flume ML (or directly at
> >> https://issues.cloudera.org/browse/FLUME-685) what the best default
> >> behavior is and whether we need to allow the user to override it.
> >>
> >> Thank you guys,
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>
> >>
> >> On Mon, Jun 27, 2011 at 11:40 PM, Stack wrote:
> >>
> >>> Either should work, Alex. Your version will go "for ever". Have
> >>> you tried yanking hbase out from under the client to see if it
> >>> reconnects?
> >>>
> >>> Good on you,
> >>> St.Ack
> >>>
> >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >>> > Yes, that is what is intended, I think.
> >>> > To make the whole picture clear, here's the context:
> >>> >
> >>> > * there's a Flume HBase sink (read: HBase client) which writes
> >>> >   data from the Flume "pipe" (read: some event-based message
> >>> >   source) to HTable;
> >>> > * when HBase is down for some time (with the default HBase
> >>> >   configuration on the Flume sink side) HTable.put throws an
> >>> >   exception and the client exits (it usually takes ~10 min to
> >>> >   fail);
> >>> > * Flume is smart enough to accumulate data to be written reliably
> >>> >   if the sink behaves badly (not writing for some time, pauses,
> >>> >   etc.), so it would be great if the sink tried to write data
> >>> >   until HBase is up again, BUT:
> >>> > * here, as we have a complete "failure" of the sink process (the
> >>> >   thread needs to be restarted), the data never reaches HTable
> >>> >   even after the HBase cluster is brought up again.
> >>> >
> >>> > So you suggest, instead of this extra construction around
> >>> > HTable.put, to use the configuration properties
> >>> > "hbase.client.pause" and "hbase.client.retries.number"? I.e. make
> >>> > the retry attempts (reasonably) close to "perform forever". Is
> >>> > that what you meant?
> >>> >
> >>> > Thank you,
> >>> > Alex Baranau
> >>> > ----
> >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>> >
> >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu wrote:
> >>> >
> >>> >> This would retry indefinitely, right?
> >>> >> Normally maximum retry duration would govern how long the retry
> >>> >> is attempted.
> >>> >>
> >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
> >>> >>
> >>> >> > Hello,
> >>> >> >
> >>> >> > Just wanted to confirm that I'm doing things in a proper way here.
> >>> >> > How about this code to handle the temp cluster connectivity
> >>> >> > problems (or cluster down time) on client-side?
> >>> >> >
> >>> >> > + // HTable.put() will fail with exception if connection to cluster is
> >>> >> > + // temporarily broken or cluster is temporarily down. To be sure data
> >>> >> > + // is written we retry writing.
> >>> >> > + boolean dataWritten = false;
> >>> >> > + do {
> >>> >> > +   try {
> >>> >> > +     table.put(p);
> >>> >> > +     dataWritten = true;
> >>> >> > +   } catch (IOException ioe) {
> >>> >> > +     // indicates cluster connectivity problem (also thrown when cluster is down)
> >>> >> > +     LOG.error("Writing data to HBase failed, will try again in "
> >>> >> > +         + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
> >>> >> > +     // Thread.sleep() rather than wait(): wait() requires holding the
> >>> >> > +     // object's monitor and would throw IllegalMonitorStateException here.
> >>> >> > +     Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
> >>> >> > +   }
> >>> >> > + } while (!dataWritten);
> >>> >> >
> >>> >> > Thank you in advance,
> >>> >> > Alex Baranau
> >>> >> > ----
> >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
> >
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
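
For reference, the bounded alternative raised in this thread (retry a fixed number of times with a pause between attempts, then throw) might look roughly like the sketch below. It is only an illustration, not HBase's or Flume's actual retry code: the names BoundedRetry, IoAction, runWithRetries, maxAttempts and pauseMillis are hypothetical, and the two parameters only loosely play the roles of "hbase.client.retries.number" and "hbase.client.pause".

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical helper, not part of HBase or Flume.
final class BoundedRetry {

    // Any write that may fail with an IOException, e.g. a put to a table.
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    // Runs the action up to maxAttempts times, pausing pauseMillis between
    // attempts. Returns the number of the attempt that succeeded, or rethrows
    // the last IOException once the attempts are used up.
    static int runWithRetries(IoAction action, int maxAttempts, long pauseMillis)
            throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                action.run();
                return attempt;
            } catch (IOException ioe) {
                if (attempt >= maxAttempts) {
                    throw ioe; // predictable failure instead of hanging forever
                }
            }
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException ie) {
                // Restore the interrupt flag and surface it as an IOException.
                Thread.currentThread().interrupt();
                InterruptedIOException wrapped =
                        new InterruptedIOException("interrupted while waiting to retry");
                wrapped.initCause(ie);
                throw wrapped;
            }
        }
    }
}
```

Assuming table and p as in the quoted patch, a caller could write something like BoundedRetry.runWithRetries(() -> table.put(p), 10, 1000L) and treat the final IOException as the signal that the cluster is still unreachable. A "retry forever" mode would then be a separate, explicitly chosen option rather than the default.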