hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject RE: Retry HTable.put() on client-side to handle temp connectivity problem
Date Tue, 28 Jun 2011 16:40:05 GMT
I agree with what Todd & Gary said.   I don't like retry-forever, especially as a default
option in HBase.


-----Original Message-----
From: Gary Helmling [mailto:ghelmling@gmail.com] 
Sent: Tuesday, June 28, 2011 12:18 PM
To: dev@hbase.apache.org
Cc: Jonathan Hsieh
Subject: Re: Retry HTable.put() on client-side to handle temp connectivity problem

I'd also be wary of changing the default to retry forever.  This might be hard to differentiate
from a hang or deadlock for new users and seems to violate "least surprise".

In many cases it's preferable to have some kind of predictable failure as well, so I think
this would appear to be a regression in behavior.  If you're serving, say, web site data
from HBase, you may prefer an occasional error or timeout rather than having page loads hang
forever.

I'm all for making "retry forever" a configurable option, but do we need any new knobs here?

--gh
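
A minimal sketch, not part of the original message, of what leaning on the existing client
knobs looks like; the property names hbase.client.retries.number and hbase.client.pause are
the ones discussed later in this thread, while the table name, column family, and values
below are purely illustrative.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class RetryTuningExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Stretch the client's built-in retry window: more attempts and a longer
        // pause (in milliseconds) between attempts. No new knobs required.
        conf.setInt("hbase.client.retries.number", 100);
        conf.setLong("hbase.client.pause", 10 * 1000L);
        HTable table = new HTable(conf, "events");
        Put put = new Put("rowkey".getBytes());
        put.add("cf".getBytes(), "qual".getBytes(), "value".getBytes());
        // put() only surfaces an exception after the configured retries are exhausted.
        table.put(put);
        table.close();
      }
    }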


On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <joey@cloudera.com> wrote:

> If I could override the default, I'd be a hesitant +1. I'd rather see
> the default be something like retry 10 times, then throw an error,
> with one option being infinite retries.
>
> -Joey
>
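
A hypothetical sketch of the default Joey describes: retry a fixed number of times, then
surface the error, with a special setting meaning infinite retries. The helper and its
parameters are illustrative only, not an existing HBase or Flume API.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class BoundedRetryPut {
      /** maxAttempts <= 0 means retry indefinitely. */
      static void putWithRetries(HTable table, Put put, int maxAttempts, long pauseMs)
          throws IOException, InterruptedException {
        int attempt = 0;
        while (true) {
          try {
            table.put(put);
            return;                     // success
          } catch (IOException ioe) {
            attempt++;
            if (maxAttempts > 0 && attempt >= maxAttempts) {
              throw ioe;                // predictable failure after maxAttempts tries
            }
            Thread.sleep(pauseMs);      // back off before the next attempt
          }
        }
      }
    }
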
> On Mon, Jun 27, 2011 at 2:21 PM, Stack <stack@duboce.net> wrote:
> > I'd be fine with changing the default in hbase so clients just keep 
> > trying.  What do others think?
> > St.Ack
> >
> > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau
> > <alex.baranov.v@gmail.com> wrote:
> >> The code I pasted works for me: it reconnects successfully. I just
> >> thought it might not be the best way to do it. I realized that by
> >> using HBase configuration properties we could simply say that it's
> >> up to the user to configure the HBase client (created by Flume)
> >> properly (e.g. by adding an hbase-site.xml with the settings to the
> >> classpath). On the other hand, it looks to me like users of HBase
> >> sinks will *always* want the sink to retry writing to HBase until it
> >> works out. But the default configuration doesn't work this way: the
> >> sink stops when HBase is temporarily down or inaccessible. That makes
> >> using the sink more complicated (because the default configuration
> >> sucks), which I'd like to avoid here by adding the code above.
> >> Ideally the default configuration should work the best way for the
> >> general-purpose case.
> >>
> >> I understand what the ways are to implement/configure such behavior.
> >> I think we should discuss what the best default behavior is, and
> >> whether we need to allow the user to override it, on the Flume ML (or
> >> directly at https://issues.cloudera.org/browse/FLUME-685).
> >>
> >> Thank you guys,
> >>
> >> Alex Baranau
> >> ----
> >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>
> >>
> >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <stack@duboce.net> wrote:
> >>
> >>> Either should work, Alex.  Your version will go "for ever".  Have
> >>> you tried yanking HBase out from under the client to see if it
> >>> reconnects?
> >>>
> >>> Good on you,
> >>> St.Ack
> >>>
> >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau
> >>> <alex.baranov.v@gmail.com> wrote:
> >>> > Yes, that is what's intended, I think. To make the whole picture
> >>> > clear, here's the context:
> >>> >
> >>> > * there's Flume's HBase sink (read: an HBase client) which writes
> >>> > data from the Flume "pipe" (read: some event-based message source)
> >>> > to an HTable;
> >>> > * when HBase is down for some time (with the default HBase
> >>> > configuration on Flume's sink side), HTable.put throws an exception
> >>> > and the client exits (it usually takes ~10 min to fail);
> >>> > * Flume is smart enough to accumulate the data to be written
> >>> > reliably if the sink behaves badly (not writing for some time,
> >>> > pauses, etc.), so it would be great if the sink kept trying to
> >>> > write the data until HBase is up again, BUT:
> >>> > * here, since we have a complete "failure" of the sink process (the
> >>> > thread needs to be restarted), the data never reaches the HTable
> >>> > even after the HBase cluster is brought up again.
> >>> >
> >>> > So you suggest, instead of this extra construction around
> >>> > HTable.put, using the configuration properties "hbase.client.pause"
> >>> > and "hbase.client.retries.number"? I.e. make the number of retry
> >>> > attempts (reasonably) close to "retry forever". Is that what you
> >>> > meant?
> >>> >
> >>> > Thank you,
> >>> > Alex Baranau
> >>> > ----
> >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>> >
> >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >>> >
> >>> >> This would retry indefinitely, right?
> >>> >> Normally a maximum retry duration would govern how long the retry
> >>> >> is attempted.
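
A hypothetical sketch of the bound Ted mentions: cap the total retry duration rather than
the number of attempts. The helper and its parameters are illustrative only, not an
existing HBase API.

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;

    public class DeadlineRetryPut {
      /** Retry the put until it succeeds or the time budget is spent. */
      static void putWithDeadline(HTable table, Put put, long maxRetryMillis, long pauseMs)
          throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + maxRetryMillis;
        while (true) {
          try {
            table.put(put);
            return;                     // success
          } catch (IOException ioe) {
            if (System.currentTimeMillis() + pauseMs > deadline) {
              throw ioe;                // retry budget exhausted
            }
            Thread.sleep(pauseMs);      // pause before the next attempt
          }
        }
      }
    }
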
> >>> >>
> >>> >> > On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau
> >>> >> > <alex.baranov.v@gmail.com> wrote:
> >>> >>
> >>> >> > Hello,
> >>> >> >
> >>> >> > Just wanted to confirm that I'm doing things in a proper way
> >>> >> > here. How about this code to handle the temp cluster
> >>> >> > connectivity problems (or cluster down time) on client-side?
> >>> >> >
> >>> >> > +    // HTable.put() will fail with exception if connection to cluster
> >>> >> > +    // is temporarily broken or cluster is temporarily down. To be sure
> >>> >> > +    // data is written we retry writing.
> >>> >> > +    boolean dataWritten = false;
> >>> >> > +    do {
> >>> >> > +      try {
> >>> >> > +        table.put(p);
> >>> >> > +        dataWritten = true;
> >>> >> > +      } catch (IOException ioe) {
> >>> >> > +        // indicates cluster connectivity problem (also thrown when
> >>> >> > +        // cluster is down)
> >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
> >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
> >>> >> > +        Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
> >>> >> > +      }
> >>> >> > +    } while (!dataWritten);
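
One caveat for anyone reusing the snippet above: Thread.currentThread().wait(...) is
Object.wait() invoked without holding that object's monitor, so the first failure would
throw IllegalMonitorStateException rather than pause; Thread.sleep(...) is presumably what
was intended. A minimal corrected sketch, reusing the table, p, LOG and
RETRY_INTERVAL_ON_WRITE_FAIL names from the snippet above:

    boolean dataWritten = false;
    do {
      try {
        table.put(p);
        dataWritten = true;
      } catch (IOException ioe) {
        LOG.error("Writing data to HBase failed, will try again in "
            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
        try {
          Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000L);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();  // preserve the interrupt flag
          break;                               // stop retrying if interrupted
        }
      }
    } while (!dataWritten);
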
> >>> >> >
> >>> >> > Thank you in advance,
> >>> >> > Alex Baranau
> >>> >> > ----
> >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> >>> >> >
> >>> >>
> >>> >
> >>>
> >>
> >
>
>
>
> --
> Joseph Echeverria
> Cloudera, Inc.
> 443.305.9434
>
