hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Retry HTable.put() on client-side to handle temp connectivity problem
Date Tue, 28 Jun 2011 23:17:55 GMT
I also think if it takes 10 minutes to fail, that is probably too long.
 
Best regards,


   - Andy


Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)


----- Original Message -----
> From: Doug Meil <doug.meil@explorysmedical.com>
> To: "dev@hbase.apache.org" <dev@hbase.apache.org>
> Cc: Jonathan Hsieh <jon@cloudera.com>
> Sent: Tuesday, June 28, 2011 9:40 AM
> Subject: RE: Retry HTable.put() on client-side to handle temp connectivity problem
> 
> I agree with what Todd & Gary said.   I don't like retry-forever, 
> especially as a default option in HBase.
> 
> 
> -----Original Message-----
> From: Gary Helmling [mailto:ghelmling@gmail.com] 
> Sent: Tuesday, June 28, 2011 12:18 PM
> To: dev@hbase.apache.org
> Cc: Jonathan Hsieh
> Subject: Re: Retry HTable.put() on client-side to handle temp connectivity 
> problem
> 
> I'd also be wary of changing the default to retry forever.  This might be 
> hard to differentiate from a hang or deadlock for new users and seems to violate 
> "least surprise".
> 
> In many cases it's preferable to have some kind of predictable failure as 
> well.  So I think this would appear to be a regression in behavior.  If 
> you're serving say web site data from hbase, you may prefer an occasional 
> error or timeout rather than having page loading hang forever.
> 
> I'm all for making "retry forever" a configurable option, but do 
> we need any new knobs here?
> 
> --gh
> 
> 
> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <joey@cloudera.com> wrote:
> 
>>  If I could override the default, I'd be a hesitant +1. I'd rather see
>>  the default be something like retry 10 times, then throw an error.
>>  With one option being infinite retries.
>> 
>>  -Joey
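Joey's suggestion (bounded retries with an option for infinite retries) can be sketched roughly as below. This is a hypothetical stand-in, not HBase API: the `Put` interface represents the `HTable.put(p)` call, and all names are illustrative.

```java
import java.io.IOException;

// Sketch of Joey's suggestion: retry a bounded number of times, then throw.
// The Put interface and all names here are hypothetical stand-ins, not HBase API.
public class BoundedRetry {

    interface Put {
        void run() throws IOException; // stand-in for HTable.put(p)
    }

    // Retries up to maxRetries times (assumed >= 1), pausing between attempts;
    // rethrows the last failure so the caller sees a predictable error.
    // Passing a very large maxRetries approximates the "infinite retries" option.
    static void putWithRetries(Put put, int maxRetries, long pauseMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                put.run();
                return; // success
            } catch (IOException ioe) {
                last = ioe;
                if (attempt < maxRetries) {
                    Thread.sleep(pauseMillis); // pause before the next attempt
                }
            }
        }
        throw last; // all attempts failed: surface the error to the caller
    }
}
```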
>> 
>>  On Mon, Jun 27, 2011 at 2:21 PM, Stack <stack@duboce.net> wrote:
>>  > I'd be fine with changing the default in hbase so clients just keep
>>  > trying.  What do others think?
>>  > St.Ack
>>  >
>>  > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>  >> The code I pasted works for me: it reconnects successfully. I just
>>  >> thought it might not be the best way to do it. I realized that by
>>  >> using HBase configuration properties we could just say that it's up
>>  >> to the user to configure the HBase client (created by Flume) properly
>>  >> (e.g. by adding hbase-site.xml with the settings to the classpath).
>>  >> On the other hand, it looks to me like users of HBase sinks will
>>  >> *always* want it to retry writing to HBase until it works out. But
>>  >> the default configuration does not work this way: the sink stops when
>>  >> HBase is temporarily down or inaccessible. Hence it makes using the
>>  >> sink more complicated (because the default configuration sucks),
>>  >> which I'd like to avoid here by adding the code above. Ideally the
>>  >> default configuration should work the best way for the
>>  >> general-purpose case.
>>  >>
>>  >> I understand now what the ways are to implement/configure such
>>  >> behavior. I think we should discuss what the best default behavior
>>  >> is, and whether we need to allow the user to override it, on the
>>  >> Flume ML (or directly at
>>  >> https://issues.cloudera.org/browse/FLUME-685).
>>  >>
>>  >> Thank you guys,
>>  >>
>>  >> Alex Baranau
>>  >> ----
>>  >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>  >>
>>  >>
>>  >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <stack@duboce.net> wrote:
>>  >>
>>  >>> Either should work Alex.  Your version will go "for ever".  Have you
>>  >>> tried yanking hbase out from under the client to see if it reconnects?
>>  >>>
>>  >>> Good on you,
>>  >>> St.Ack
>>  >>>
>>  >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>  >>> > Yes, that is what's intended, I think. To make the whole picture
>>  >>> > clear, here's the context:
>>  >>> >
>>  >>> > * there's Flume's HBase sink (read: HBase client) which writes data
>>  >>> > from the Flume "pipe" (read: some event-based message source) to
>>  >>> > HTable;
>>  >>> > * when HBase is down for some time (with the default HBase
>>  >>> > configuration on Flume's sink side) HTable.put throws an exception
>>  >>> > and the client exits (it usually takes ~10 min to fail);
>>  >>> > * Flume is smart enough to accumulate data to be written reliably
>>  >>> > if the sink behaves badly (not writing for some time, pauses,
>>  >>> > etc.), so it would be great if the sink kept trying to write data
>>  >>> > until HBase is up again, BUT:
>>  >>> > * since here we have a complete "failure" of the sink process (the
>>  >>> > thread needs to be restarted), the data never reaches HTable even
>>  >>> > after the HBase cluster is brought up again.
>>  >>> >
>>  >>> > So you suggest, instead of this extra construction around
>>  >>> > HTable.put, using the configuration properties "hbase.client.pause"
>>  >>> > and "hbase.client.retries.number"? I.e. making the number of retry
>>  >>> > attempts (reasonably) close to "retry forever". Is that what you
>>  >>> > meant?
>>  >>> >
>>  >>> > Thank you,
>>  >>> > Alex Baranau
>>  >>> > ----
>>  >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>  >>> >
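The two properties Alex asks about above do exist as HBase client settings. A minimal hbase-site.xml fragment on the client classpath might look like the sketch below; the values are illustrative, not recommendations, and a very large retry count approximates "retry forever":

```xml
<configuration>
  <!-- pause (in ms) between client retry attempts; illustrative value -->
  <property>
    <name>hbase.client.pause</name>
    <value>2000</value>
  </property>
  <!-- number of retries before the client gives up; a very large value
       approximates "retry forever" -->
  <property>
    <name>hbase.client.retries.number</name>
    <value>100</value>
  </property>
</configuration>
```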
>>  >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>  >>> >
>>  >>> >> This would retry indefinitely, right? Normally a maximum retry
>>  >>> >> duration would govern how long the retry is attempted.
>>  >>> >>
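Ted's "maximum retry duration" idea can be sketched as a deadline-based loop. This is a hypothetical helper, not HBase API; the `Put` interface stands in for the `HTable.put(p)` call.

```java
import java.io.IOException;

// Sketch of Ted's point: bound retries by elapsed time rather than by a
// fixed attempt count. Hypothetical helper, not an HBase API.
public class DeadlineRetry {

    interface Put {
        void run() throws IOException; // stand-in for HTable.put(p)
    }

    // Retries until success or until the next pause would pass the deadline,
    // at which point the last failure is rethrown.
    static void putWithDeadline(Put put, long maxDurationMillis, long pauseMillis)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + maxDurationMillis;
        while (true) {
            try {
                put.run();
                return; // success
            } catch (IOException ioe) {
                if (System.currentTimeMillis() + pauseMillis > deadline) {
                    throw ioe; // out of time: surface the last failure
                }
                Thread.sleep(pauseMillis); // pause before the next attempt
            }
        }
    }
}
```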
>>  >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>>  >>> >>
>>  >>> >> > Hello,
>>  >>> >> >
>>  >>> >> > Just wanted to confirm that I'm doing things in a proper way
>>  >>> >> > here. How about this code to handle temporary cluster
>>  >>> >> > connectivity problems (or cluster down time) on the client side?
>>  >>> >> >
>>  >>> >> > +    // HTable.put() will fail with exception if connection to cluster is
>>  >>> >> > +    // temporarily broken or cluster is temporarily down. To be sure data
>>  >>> >> > +    // is written we retry writing.
>>  >>> >> > +    boolean dataWritten = false;
>>  >>> >> > +    do {
>>  >>> >> > +      try {
>>  >>> >> > +        table.put(p);
>>  >>> >> > +        dataWritten = true;
>>  >>> >> > +      } catch (IOException ioe) { // indicates cluster connectivity problem
>>  >>> >> > +                                  // (also thrown when cluster is down)
>>  >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
>>  >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
>>  >>> >> > +        Thread.currentThread().wait(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
>>  >>> >> > +      }
>>  >>> >> > +    } while (!dataWritten);
>>  >>> >> >
>>  >>> >> > Thank you in advance,
>>  >>> >> > Alex Baranau
>>  >>> >> > ----
>>  >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>>  >>> >> >
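A side note on the patch quoted above: `Thread.currentThread().wait(...)` is called without holding the thread object's monitor, so at runtime it would throw `IllegalMonitorStateException` rather than pause; `Thread.sleep(...)` is the intended call. A corrected, self-contained sketch (the `Put` interface is a hypothetical stand-in for the `table.put(p)` call):

```java
import java.io.IOException;

// Corrected version of the quoted retry loop: Thread.sleep() replaces the
// incorrect Thread.currentThread().wait() call, and InterruptedException is
// handled by restoring the interrupt flag and stopping the retries.
public class RetryForeverPut {

    interface Put {
        void run() throws IOException; // stand-in for table.put(p)
    }

    // Retries until the write succeeds (returns true) or the thread is
    // interrupted (returns false).
    static boolean writeUntilSuccess(Put put, long retryIntervalSeconds) {
        boolean dataWritten = false;
        do {
            try {
                put.run();
                dataWritten = true;
            } catch (IOException ioe) {
                // connectivity problem (also thrown when the cluster is down)
                try {
                    Thread.sleep(retryIntervalSeconds * 1000);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false; // stop retrying when asked to shut down
                }
            }
        } while (!dataWritten);
        return true;
    }
}
```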
>>  >>> >>
>>  >>> >
>>  >>>
>>  >>
>>  >
>> 
>> 
>> 
>>  --
>>  Joseph Echeverria
>>  Cloudera, Inc.
>>  443.305.9434
>> 
>
