hbase-dev mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Retry HTable.put() on client-side to handle temp connectivity problem
Date Wed, 29 Jun 2011 13:44:44 GMT

Hi there-

1)	Buffer/Batch

Addressing the comment in the Cloudera ticket (FLUME-390) that "currently
non-written events are lost", I agree that two paths (write-buffer vs.
batch-it-yourself) are available for Flume to recover from a failure and
know what hasn't been sent (or at least what was attempted).

Thus, I don't see this as an "HBase issue".  There are existing APIs for
Flume to utilize that will get the job done.
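To make the "batch-it-yourself" path concrete, here is a minimal,
self-contained sketch (not HBase or Flume API; the Sink interface and class
names are illustrative stand-ins for something like HTable.batch): the
caller keeps ownership of the pending records until the write call returns,
so after a failure it knows exactly what was left unsent.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of the "batch-it-yourself" idea: records stay in a pending queue
// owned by the caller and are only dropped after an acknowledged write.
public class BatchItYourself {

    /** Stand-in for a sink that may fail mid-stream (e.g. an HTable batch write). */
    interface Sink {
        void write(List<String> records) throws IOException;
    }

    private final Deque<String> pending = new ArrayDeque<>();

    void enqueue(String record) {
        pending.add(record);
    }

    /**
     * Try to flush all pending records in one batch. On failure the records
     * remain in 'pending', so nothing is silently lost.
     * @return true if the batch was written and acknowledged.
     */
    boolean flush(Sink sink) {
        List<String> batch = new ArrayList<>(pending);
        try {
            sink.write(batch);
            pending.clear();   // drop records only after a successful write
            return true;
        } catch (IOException e) {
            return false;      // caller still owns 'pending' and can retry later
        }
    }

    int pendingCount() {
        return pending.size();
    }
}
```

The point is that recovery state lives with the caller, not in an opaque
client-side write buffer.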

2)	Retry-forever.

I've seen several folks vote -1 on retry-forever as default behavior.
Based on the conversation I'm assuming this won't happen.


Are there other aspects to this issue?  It doesn't seem like any HBase
changes are needed to address them.
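The consensus above (bounded retries with a predictable failure, rather
than retry-forever) can be sketched as a small helper. This is illustrative
code, not an HBase API; the method name and parameters are assumptions:

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Sketch of bounded retry: try N times with a pause between attempts,
// then surface the last error instead of hanging forever.
public class BoundedRetry {

    /**
     * Run 'action' until it succeeds or 'maxAttempts' is exhausted.
     * @throws IOException the last failure, once retries are exhausted.
     */
    static <T> T withRetries(Callable<T> action, int maxAttempts, long pauseMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException ioe) {
                last = ioe;                    // remember failure, pause, retry
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMs);
                }
            } catch (Exception e) {
                throw new RuntimeException(e); // non-retriable failure
            }
        }
        throw last;  // predictable error instead of an apparent hang
    }
}
```

A caller that wants "retry forever" could pass a very large maxAttempts,
which keeps the infinite behavior opt-in rather than the default.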

  



On 6/29/11 2:17 AM, "Alex Baranau" <alex.baranov.v@gmail.com> wrote:

>I think you are talking here about losing some data from the client-side
>buffer. I don't think using batch will help. If we use batch from client
>code and want to use the client-side buffering, we would need to implement
>the same buffering code already implemented in HTable. The behavior and
>ack
>sending will be the same: the ack is sent after Flume sink receives the
>event, which might be buffered and not persisted (yet) to HBase. I haven't
>looked in Flume's ability to skip sending ack on receiving event in sink
>and
>doing it in batches later (after the actual persisting happens). Will
>investigate that as a separate effort.
>
>In general, please correct me if I'm wrong, but there won't be much
>difference between using HTable's batch and put:
>* with put() I can also tell what was persisted and which records failed,
>as
>they will be available in the client-side buffer after failures
>* internally put uses batch anyways (i.e. connection.processBatch)
>
>Alex Baranau
>----
>Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
>On Tue, Jun 28, 2011 at 10:41 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>
>>
>> But if Flume used the htable 'batch' method instead of 'put'...
>>
>> 
>>http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.htm
>>l#
>> batch%28java.util.List%29
>>
>> .. doesn't it sidestep this issue?  Because instead of being unsure what
>> was in the write-buffer and what wasn't, the caller knows exactly what
>>was
>> sent and whether it was sent without error.
>>
>>
>>
>>
>>
>> On 6/28/11 1:07 PM, "Alex Baranau" <alex.baranov.v@gmail.com> wrote:
>>
>> >> if the sink "dies" for some reason, then it should
>> >> push that back to the upstream parts of the flume dataflow, and have
>> >>them
>> >> buffer data on local disk.
>> >
>> >True. But this seem to be a separate issue:
>> >https://issues.cloudera.org/browse/FLUME-390.
>> >
>> >Alex Baranau
>> >----
>> >Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> >
>> >On Tue, Jun 28, 2011 at 7:40 PM, Doug Meil <doug.meil@explorysmedical.com> wrote:
>> >
>> >> I agree with what Todd & Gary said.   I don't like retry-forever,
>> >> especially as a default option in HBase.
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Gary Helmling [mailto:ghelmling@gmail.com]
>> >> Sent: Tuesday, June 28, 2011 12:18 PM
>> >> To: dev@hbase.apache.org
>> >> Cc: Jonathan Hsieh
>> >> Subject: Re: Retry HTable.put() on client-side to handle temp
>> >>connectivity
>> >> problem
>> >>
>> >> I'd also be wary of changing the default to retry forever.  This
>>might
>> >>be
>> >> hard to differentiate from a hang or deadlock for new users and
>>seems to
>> >> violate "least surprise".
>> >>
>> >> In many cases it's preferable to have some kind of predictable
>>failure
>> >>as
>> >> well.  So I think this would appear to be a regression in behavior.
>>If
>> >> you're serving say web site data from hbase, you may prefer an
>> >>occasional
>> >> error or timeout rather than having page loading hang forever.
>> >>
>> >> I'm all for making "retry forever" a configurable option, but do we
>>need
>> >> any new knobs here?
>> >>
>> >> --gh
>> >>
>> >>
>> >> On Mon, Jun 27, 2011 at 3:23 PM, Joey Echeverria <joey@cloudera.com>
>> >> wrote:
>> >>
>> >> > If I could override the default, I'd be a hesitant +1. I'd rather
>>see
>> >> > the default be something like retry 10 times, then throw an error.
>> >> > With one option being infinite retries.
>> >> >
>> >> > -Joey
>> >> >
>> >> > On Mon, Jun 27, 2011 at 2:21 PM, Stack <stack@duboce.net> wrote:
>> >> > > I'd be fine with changing the default in hbase so clients just
>>keep
>> >> > > trying.  What do others think?
>> >> > > St.Ack
>> >> > >
>> >> > > On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>> >> > >> The code I pasted works for me: it reconnects successfully. Just
>> >> > >> thought it might not be the best way to do it. I realized that by
>> >> > >> using HBase configuration properties we could just say that it's
>> >> > >> up to the user to configure the HBase client (created by Flume)
>> >> > >> properly (e.g. by adding an hbase-site.xml with the settings to
>> >> > >> the classpath). On the other hand, it looks to me like users of
>> >> > >> HBase sinks will *always* want the sink to retry writing to HBase
>> >> > >> until it works out. But the default configuration doesn't work
>> >> > >> this way: the sink stops when HBase is temporarily down or
>> >> > >> inaccessible. Hence it makes using the sink more complicated
>> >> > >> (because the default configuration sucks), which I'd like to
>> >> > >> avoid here by adding the code above. Ideally the default
>> >> > >> configuration should work the best way for the general-purpose
>> >> > >> case.
>> >> > >>
>> >> > >> I understood what the ways are to implement/configure such
>> >> > >> behavior. I think we should discuss what the best default
>> >> > >> behavior is, and whether we need to allow the user to override
>> >> > >> it, on the Flume ML (or directly at
>> >> > >> https://issues.cloudera.org/browse/FLUME-685).
>> >> > >>
>> >> > >> Thank you guys,
>> >> > >>
>> >> > >> Alex Baranau
>> >> > >> ----
>> >> > >> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> >> > >>
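[For reference: the client-side configuration Alex describes above would go
in an hbase-site.xml on the Flume sink's classpath. The property names are
the real HBase client keys discussed in this thread; the values below are
only illustrative of "make retries last a long time":]

```xml
<!-- hbase-site.xml on the Flume sink's classpath (values illustrative) -->
<configuration>
  <property>
    <name>hbase.client.pause</name>
    <value>1000</value> <!-- ms to wait between retries -->
  </property>
  <property>
    <name>hbase.client.retries.number</name>
    <value>100</value> <!-- many retries ~= "retry for a long time" -->
  </property>
</configuration>
```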
>> >> > >>
>> >> > >> On Mon, Jun 27, 2011 at 11:40 PM, Stack <stack@duboce.net> wrote:
>> >> > >>
>> >> > >>> Either should work, Alex.  Your version will go "forever".  Have
>> >> > >>> you tried yanking hbase out from under the client to see if it
>> >> > >>> reconnects?
>> >> > >>>
>> >> > >>> Good on you,
>> >> > >>> St.Ack
>> >> > >>>
>> >> > >>> On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>> >> > >>> > Yes, that is what was intended, I think. To make the whole
>> >> > >>> > picture clear, here's the context:
>> >> > >>> >
>> >> > >>> > * there's Flume's HBase sink (read: HBase client), which
>> >> > >>> > writes data from the Flume "pipe" (read: some event-based
>> >> > >>> > message source) to an HTable;
>> >> > >>> > * when HBase is down for some time (with the default HBase
>> >> > >>> > configuration on Flume's sink side), HTable.put throws an
>> >> > >>> > exception and the client exits (it usually takes ~10 min to
>> >> > >>> > fail);
>> >> > >>> > * Flume is smart enough to accumulate data to be written
>> >> > >>> > reliably if the sink behaves badly (not writing for some
>> >> > >>> > time, pauses, etc.), so it would be great if the sink kept
>> >> > >>> > trying to write data until HBase is up again, BUT:
>> >> > >>> > * here, since we have a complete "failure" of the sink
>> >> > >>> > process (the thread needs to be restarted), the data never
>> >> > >>> > reaches the HTable even after the HBase cluster is brought
>> >> > >>> > up again.
>> >> > >>> >
>> >> > >>> > So you suggest, instead of this extra construction around
>> >> > >>> > HTable.put, using the configuration properties
>> >> > >>> > "hbase.client.pause" and "hbase.client.retries.number"? I.e.
>> >> > >>> > making the number of retry attempts (reasonably) close to
>> >> > >>> > "retry forever". Is that what you meant?
>> >> > >>> >
>> >> > >>> > Thank you,
>> >> > >>> > Alex Baranau
>> >> > >>> > ----
>> >> > >>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> >> > >>> >
>> >> > >>> > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> >> > >>> >
>> >> > >>> >> This would retry indefinitely, right ?
>> >> > >>> >> Normally maximum retry duration would govern how long the
>> >> > >>> >> retry is attempted.
>> >> > >>> >>
>> >> > >>> >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:
>> >> > >>> >>
>> >> > >>> >> > Hello,
>> >> > >>> >> >
>> >> > >>> >> > Just wanted to confirm that I'm doing things in a proper
>> >> > >>> >> > way here. How about this code to handle temporary cluster
>> >> > >>> >> > connectivity problems (or cluster downtime) on the client
>> >> > >>> >> > side?
>> >> > >>> >> >
>> >> > >>> >> > +    // HTable.put() will fail with an exception if the connection
>> >> > >>> >> > +    // to the cluster is temporarily broken or the cluster is
>> >> > >>> >> > +    // temporarily down. To be sure the data is written, we retry.
>> >> > >>> >> > +    boolean dataWritten = false;
>> >> > >>> >> > +    do {
>> >> > >>> >> > +      try {
>> >> > >>> >> > +        table.put(p);
>> >> > >>> >> > +        dataWritten = true;
>> >> > >>> >> > +      } catch (IOException ioe) {
>> >> > >>> >> > +        // indicates a cluster connectivity problem (also thrown
>> >> > >>> >> > +        // when the cluster is down)
>> >> > >>> >> > +        LOG.error("Writing data to HBase failed, will try again in "
>> >> > >>> >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
>> >> > >>> >> > +        Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000);
>> >> > >>> >> > +      }
>> >> > >>> >> > +    } while (!dataWritten);
>> >> > >>> >> >
>> >> > >>> >> > Thank you in advance,
>> >> > >>> >> > Alex Baranau
>> >> > >>> >> > ----
>> >> > >>> >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>> >> > >>> >> >
>> >> > >>> >>
>> >> > >>> >
>> >> > >>>
>> >> > >>
>> >> > >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Joseph Echeverria
>> >> > Cloudera, Inc.
>> >> > 443.305.9434
>> >> >
>> >>
>>
>>

