hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Retry HTable.put() on client-side to handle temp connectivity problem
Date Tue, 28 Jun 2011 19:41:29 GMT

But if Flume used the htable 'batch' method instead of 'put'...

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#batch%28java.util.List%29

... doesn't it sidestep this issue?  Because instead of being unsure what
was in the write-buffer and what wasn't, the caller knows exactly what was
sent and whether it was sent without error.


On 6/28/11 1:07 PM, "Alex Baranau" <alex.baranov.v@gmail.com> wrote:

>> if the sink "dies" for some reason, then it should
>> push that back to the upstream parts of the flume dataflow, and have
>>them
>> buffer data on local disk.
>
>True. But this seem to be a separate issue:
>https://issues.cloudera.org/browse/FLUME-390.
>
>Alex Baranau
>----
>Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
>On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil
><doug.meil@explorysmedical.com> wrote:
>
>> I agree with what Todd & Gary said.   I don't like retry-forever,
>> especially as a default option in HBase.
>>
>>
>> -----Original Message-----
>> From: Gary Helmling [mailto:ghelmling@gmail.com]
>> Sent: Tuesday, June 28, 2011 12:18 PM
>> To: dev@hbase.apache.org
>> Cc: Jonathan Hsieh
>> Subject: Re: Retry HTable.put() on client-side to handle temp
>> connectivity problem
>>
>> I'd also be wary of changing the default to retry forever.  This might
>> be hard to differentiate from a hang or deadlock for new users and
>> seems to violate "least surprise".
>>
>> In many cases it's preferable to have some kind of predictable failure
>> as well.  So I think this would appear to be a regression in behavior.
>> If you're serving say web site data from hbase, you may prefer an
>> occasional error or timeout rather than having page loading hang
>> forever.
>>
>> I'm all for making "retry forever" a configurable option, but do we need
>> any new knobs here?
>>
>> --gh
>>
>>
>> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <joey@cloudera.com>
>> wrote:
>>
>> > If I could override the default, I'd be a hesitant +1. I'd rather see
>> > the default be something like retry 10 times, then throw an error.
>> > With one option being infinite retries.
>> >
>> > -Joey
>> >
>> > On Mon, Jun 27, 2011 at 2:21 PM, Stack <stack@duboce.net> wrote:
>> > > I'd be fine with changing the default in hbase so clients just keep
>> > > trying.  What do others think?
>> > > St.Ack
>> > >
>> > > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau
>> > > <alex.baranov.v@gmail.com> wrote:
>> > >> The code I pasted works for me: it reconnects successfully. I just
>> > >> thought it might not be the best way to do it. I realized that by
>> > >> using HBase configuration properties we could just say that it's up
>> > >> to the user to configure the HBase client (created by Flume)
>> > >> properly (e.g. by adding hbase-site.xml with settings to the
>> > >> classpath). On the other hand, it looks to me that users of HBase
>> > >> sinks will *always* want it to retry writing to HBase until it
>> > >> works out. But the default configuration doesn't work this way: the
>> > >> sink stops when HBase is temporarily down or inaccessible. Hence it
>> > >> makes using the sink more complicated (because the default
>> > >> configuration sucks), which I'd like to avoid here by adding the
>> > >> code above. Ideally the default configuration should work the best
>> > >> way for the general-purpose case.
>> > >>
>> > >> I understood what the ways are to implement/configure such
>> > >> behavior. I think we should discuss what the best default behavior
>> > >> is, and whether we need to allow the user to override it, on the
>> > >> Flume ML (or directly at
>> > >> https://issues.cloudera.org/browse/FLUME-685).
>> > >>
>> > >> Thank you guys,
>> > >>
>> > >> Alex Baranau
>> > >> ----
>> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > >>
>> > >>
>> > >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <stack@duboce.net> wrote:
>> > >>
>> > >>> Either should work Alex.  Your version will go "for ever".  Have
>> > >>> you tried yanking hbase out from under the client to see if it
>> > >>> reconnects?
>> > >>>
>> > >>> Good on you,
>> > >>> St.Ack
>> > >>>
>> > >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau
>> > >>> <alex.baranov.v@gmail.com> wrote:
>> > >>> > Yes, that is what's intended, I think. To make the whole picture
>> > >>> > clear, here's the context:
>> > >>> >
>> > >>> > * there's Flume's HBase sink (read: an HBase client) which writes
>> > >>> > data from the Flume "pipe" (read: some event-based message
>> > >>> > source) to an HTable;
>> > >>> > * when HBase is down for some time (with the default HBase
>> > >>> > configuration on Flume's sink side) HTable.put throws an
>> > >>> > exception and the client exits (it usually takes ~10 min to
>> > >>> > fail);
>> > >>> > * Flume is smart enough to accumulate data to be written
>> > >>> > reliably if the sink behaves badly (not writing for some time,
>> > >>> > pauses, etc.), so it would be great if the sink tried to write
>> > >>> > data until HBase is up again, BUT:
>> > >>> > * here, as we have a complete "failure" of the sink process (the
>> > >>> > thread needs to be restarted), the data never reaches the HTable
>> > >>> > even after the HBase cluster is brought up again.
>> > >>> >
>> > >>> > So you suggest, instead of this extra construction around
>> > >>> > HTable.put, to use the configuration properties
>> > >>> > "hbase.client.pause" and "hbase.client.retries.number"? I.e.
>> > >>> > make the number of retry attempts (reasonably) close to "perform
>> > >>> > forever". Is that what you meant?
>> > >>> >
>> > >>> > Thank you,
>> > >>> > Alex Baranau
>> > >>> > ----
>> > >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > >>> >
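For reference, the two client knobs Alex names above could be raised in an hbase-site.xml on the sink's classpath; the values below are purely illustrative, not recommendations:

```xml
<!-- hbase-site.xml fragment (illustrative values only) -->
<configuration>
  <property>
    <name>hbase.client.pause</name>
    <value>2000</value>   <!-- ms to pause between client retries -->
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>100</value>    <!-- large enough to ride out short outages -->
  </property>
</configuration>
```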
>> > >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhihong@gmail.com>
>> > >>> > wrote:
>> > >>> >
>> > >>> >> This would retry indefinitely, right?
>> > >>> >> Normally a maximum retry duration would govern how long the
>> > >>> >> retry is attempted.
>> > >>> >>
>> > >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau
>> > >>> >> <alex.baranov.v@gmail.com> wrote:
>> > >>> >>
>> > >>> >> > Hello,
>> > >>> >> >
>> > >>> >> > Just wanted to confirm that I'm doing things the proper way
>> > >>> >> > here. How about this code to handle temporary cluster
>> > >>> >> > connectivity problems (or cluster downtime) on the client
>> > >>> >> > side?
>> > >>> >> >
>> > >>> >> > +    // HTable.put() will fail with an exception if the
>> > >>> >> > +    // connection to the cluster is temporarily broken or
>> > >>> >> > +    // the cluster is temporarily down. To be sure data is
>> > >>> >> > +    // written we retry writing.
>> > >>> >> > +    boolean dataWritten = false;
>> > >>> >> > +    do {
>> > >>> >> > +      try {
>> > >>> >> > +        table.put(p);
>> > >>> >> > +        dataWritten = true;
>> > >>> >> > +      } catch (IOException ioe) { // cluster connectivity
>> > >>> >> > +                                  // problem (also thrown
>> > >>> >> > +                                  // when cluster is down)
>> > >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
>> > >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
>> > >>> >> > +        // Thread.sleep(), not Thread.currentThread().wait():
>> > >>> >> > +        // calling wait() without owning the monitor throws
>> > >>> >> > +        // IllegalMonitorStateException
>> > >>> >> > +        Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
>> > >>> >> > +      }
>> > >>> >> > +    } while (!dataWritten);
>> > >>> >> >
>> > >>> >> > Thank you in advance,
>> > >>> >> > Alex Baranau
>> > >>> >> > ----
>> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> > >>> >>
>> > >>> >
>> > >>>
>> > >>
>> > >
>> >
>> >
>> >
>> > --
>> > Joseph Echeverria
>> > Cloudera, Inc.
>> > 443.305.9434
>> >
>>
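Joey's bounded-retry suggestion in the thread above (retry some fixed number of times, then fail predictably) can be sketched as a small helper. The Writer interface and all names below are hypothetical stand-ins for the real HTable, chosen so the sketch runs standalone:

```java
import java.io.IOException;

public class BoundedRetry {
    // Hypothetical stand-in for HTable.put(); lets the sketch run standalone.
    interface Writer {
        void put(String row) throws IOException;
    }

    // Try the write up to maxAttempts times, pausing between attempts;
    // rethrow the last failure instead of retrying forever.
    static void putWithRetries(Writer w, String row, int maxAttempts,
                               long pauseMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                w.put(row);
                return; // success
            } catch (IOException ioe) {
                last = ioe;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs); // Thread.sleep, not Object.wait()
                }
            }
        }
        throw last; // predictable failure after maxAttempts
    }

    public static void main(String[] args) throws Exception {
        // A writer that fails twice, then succeeds.
        final int[] calls = {0};
        Writer flaky = new Writer() {
            public void put(String row) throws IOException {
                if (++calls[0] < 3) throw new IOException("connection refused");
            }
        };
        putWithRetries(flaky, "row1", 10, 1);
        System.out.println("succeeded after " + calls[0] + " attempts");
    }
}
```

This keeps the "ride out short outages" behavior the Flume sink wants while still surfacing a real error to the caller after a bounded wait, addressing Gary's "least surprise" concern.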

