From: Doug Meil <doug.meil@explorysmedical.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org>
CC: Jonathan Hsieh <jon@cloudera.com>
Date: Tue, 28 Jun 2011 12:40:05 -0400
Subject: RE: Retry HTable.put() on client-side to handle temp connectivity
 problem

I agree with what Todd & Gary said. I don't like retry-forever, especially
as a default option in HBase.


-----Original Message-----
From: Gary Helmling [mailto:ghelmling@gmail.com]
Sent: Tuesday, June 28, 2011 12:18 PM
To: dev@hbase.apache.org
Cc: Jonathan Hsieh
Subject: Re: Retry HTable.put() on client-side to handle temp connectivity problem

I'd also be wary of changing the default to retry forever.  This might be
hard to differentiate from a hang or deadlock for new users and seems to
violate "least surprise".

In many cases it's preferable to have some kind of predictable failure as
well.  So I think this would appear to be a regression in behavior.  If
you're serving say web site data from hbase, you may prefer an occasional
error or timeout rather than having page loading hang forever.

I'm all for making "retry forever" a configurable option, but do we need
any new knobs here?

--gh


On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <joey@cloudera.com> wrote:

> If I could override the default, I'd be a hesitant +1. I'd rather see
> the default be something like retry 10 times, then throw an error.
> With one option being infinite retries.
>
> -Joey
>
> On Mon, Jun 27, 2011 at 2:21 PM, Stack <stack@duboce.net> wrote:
> > I'd be fine with changing the default in hbase so clients just keep
> > trying.  What do others think?
> > St.Ack
> >
> > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <alex.baranov.v@gmail.com>
> > wrote:
> >> The code I pasted works for me: it reconnects successfully. I just
> >> thought it might not be the best way to do it. I realized that by
> >> using HBase configuration properties we could just say that it's up
> >> to the user to configure the HBase client (created by Flume) properly
> >> (e.g. by adding hbase-site.xml with settings to the classpath). On the
> >> other hand, it looks to me that users of HBase sinks will *always*
> >> want it to retry writing to HBase until it works out. But the default
> >> configuration does not work this way: the sink stops when HBase is
> >> temporarily down or inaccessible. Hence it makes using the sink more
> >> complicated (because the default configuration sucks), which I'd like
> >> to avoid here by adding the code above. Ideally the default
> >> configuration should work the best way for the general-purpose case.
> >>
> >> I understand the ways to implement/configure such behavior. I think
> >> we should discuss what the best default behavior is, and whether we
> >> need to allow the user to override it, on the Flume ML (or directly at
> >> https://issues.cloudera.org/browse/FLUME-685).
> >>
> >> Thank you guys,
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>
> >>
> >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <stack@duboce.net> wrote:
> >>
> >>> Either should work Alex.  Your version will go "for ever".  Have
> >>> you tried yanking hbase out from under the client to see if it
> >>> reconnects?
> >>>
> >>> Good on you,
> >>> St.Ack
> >>>
> >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <alex.baranov.v@gmail.com>
> >>> wrote:
> >>> > Yes, that is what was intended, I think. To make the whole picture
> >>> > clear, here's the context:
> >>> >
> >>> > * there's a Flume HBase sink (read: HBase client) which writes data
> >>> >   from a Flume "pipe" (read: some event-based message source) to
> >>> >   HTable;
> >>> > * when HBase is down for some time (with the default HBase
> >>> >   configuration on the Flume sink side) HTable.put throws an
> >>> >   exception and the client exits (it usually takes ~10 min to fail);
> >>> > * Flume is smart enough to accumulate data to be written reliably if
> >>> >   the sink behaves badly (not writing for some time, pauses, etc.),
> >>> >   so it would be great if the sink kept trying to write data until
> >>> >   HBase is up again, BUT:
> >>> > * here, as we have a complete "failure" of the sink process (the
> >>> >   thread needs to be restarted), the data never reaches HTable even
> >>> >   after the HBase cluster is brought up again.
> >>> >
> >>> > So you suggest that, instead of this extra construction around
> >>> > HTable.put, I use the configuration properties "hbase.client.pause"
> >>> > and "hbase.client.retries.number"? I.e. make the retry attempts
> >>> > (reasonably) close to "perform forever". Is that what you meant?
> >>> >
> >>> > Thank you,
> >>> > Alex Baranau
> >>> > ----
> >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>> >
> >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>> >
> >>> >> This would retry indefinitely, right?
> >>> >> Normally maximum retry duration would govern how long the retry
> >>> >> is attempted.
> >>> >>
> >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <alex.baranov.v@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >> > Hello,
> >>> >> >
> >>> >> > Just wanted to confirm that I'm doing things in a proper way here.
> >>> >> > How about this code to handle temporary cluster connectivity
> >>> >> > problems (or cluster downtime) on the client side?
> >>> >> >
> >>> >> > +    // HTable.put() will fail with exception if connection to
> >>> >> > +    // cluster is temporarily broken or cluster is temporarily
> >>> >> > +    // down. To be sure data is written we retry writing.
> >>> >> > +    boolean dataWritten = false;
> >>> >> > +    do {
> >>> >> > +      try {
> >>> >> > +        table.put(p);
> >>> >> > +        dataWritten = true;
> >>> >> > +      } catch (IOException ioe) {
> >>> >> > +        // indicates cluster connectivity problem
> >>> >> > +        // (also thrown when cluster is down)
> >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
> >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
> >>> >> > +        Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
> >>> >> > +      }
> >>> >> > +    } while (!dataWritten);
> >>> >> >
> >>> >> > Thank you in advance,
> >>> >> > Alex Baranau
> >>> >> > ----
> >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
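
For illustration only, here is a minimal sketch of the behavior suggested in the thread: retry a fixed number of times, then throw, with retry-forever as an explicit opt-in. Everything in it is an assumption made for the example and is not HBase or Flume code: the class BoundedRetry, the Write interface (a stand-in for a call such as table.put(p)), the RETRY_FOREVER constant, and the parameter names. It pauses with Thread.sleep() rather than Object.wait(), since wait() throws IllegalMonitorStateException unless the caller holds the object's monitor.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

/** Hypothetical helper for illustration; not part of HBase or Flume. */
final class BoundedRetry {

    /** Stand-in for a write such as table.put(p) that may throw IOException. */
    interface Write {
        void run() throws IOException;
    }

    /** Pass as maxAttempts to opt in to retrying forever. */
    static final int RETRY_FOREVER = -1;

    /**
     * Runs the write until it succeeds or maxAttempts attempts have failed.
     * Returns the number of attempts made; rethrows the last IOException
     * once the limit is reached, so callers get a predictable failure.
     */
    static int write(Write write, int maxAttempts, long pauseMillis) throws IOException {
        if (maxAttempts == 0 || maxAttempts < RETRY_FOREVER) {
            throw new IllegalArgumentException("maxAttempts must be > 0 or RETRY_FOREVER");
        }
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                write.run();
                return attempt;
            } catch (IOException ioe) {
                if (maxAttempts != RETRY_FOREVER && attempt >= maxAttempts) {
                    throw ioe; // give up: limit reached
                }
                try {
                    Thread.sleep(pauseMillis); // pause before the next attempt
                } catch (InterruptedException ie) {
                    // Restore the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    InterruptedIOException e = new InterruptedIOException("retry interrupted");
                    e.initCause(ie);
                    throw e;
                }
            }
        }
    }
}
```

A caller that wants the "retry 10 times, then throw" default would use something like BoundedRetry.write(() -> table.put(p), 10, 1000L), while a sink that prefers to block until the cluster returns would pass BoundedRetry.RETRY_FOREVER instead.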