hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Retry HTable.put() on client-side to handle temp connectivity problem
Date Tue, 28 Jun 2011 15:45:06 GMT
With Flume's store-and-forward, why do we need retry-forever on the HBase
side? It seems to me that if the sink "dies" for some reason, it should
push that back to the upstream parts of the Flume dataflow and have them
buffer data on local disk.

-Todd
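A minimal sketch of the push-back approach Todd describes, assuming a hypothetical sink interface (this is not Flume's actual API): the sink does not retry at all, and a failed put propagates as an IOException so the upstream stage can buffer the event.

```java
import java.io.IOException;

// Hypothetical fail-fast sink: no retry loop. A failed put propagates
// the IOException to the caller (the upstream dataflow stage), which is
// then responsible for buffering the event, e.g. on local disk.
public class FailFastSink {
    interface Table {                       // stand-in for HTable
        void put(byte[] row) throws IOException;
    }

    private final Table table;

    FailFastSink(Table table) {
        this.table = table;
    }

    void append(byte[] row) throws IOException {
        table.put(row);                     // deliberately no catch/retry here
    }

    public static void main(String[] args) {
        // A table whose puts always fail, simulating a downed cluster.
        FailFastSink sink = new FailFastSink(row -> {
            throw new IOException("cluster temporarily down");
        });
        boolean propagated = false;
        try {
            sink.append("event-1".getBytes());
        } catch (IOException expected) {
            propagated = true;              // upstream would buffer the event here
        }
        if (!propagated) throw new AssertionError("IOException should reach the caller");
        System.out.println("put failed; event pushed back upstream");
    }
}
```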

On Mon, Jun 27, 2011 at 1:56 PM, Alex Baranau <alex.baranov.v@gmail.com> wrote:

> The code I pasted works for me: it reconnects successfully. I just thought it
> might not be the best way to do it. I realized that by relying on HBase
> configuration properties we could simply say it's up to the user to configure
> the HBase client (created by Flume) properly (e.g. by adding an hbase-site.xml
> with the settings to the classpath). On the other hand, it looks to me like
> users of HBase sinks will *always* want the sink to retry writing to HBase
> until it succeeds. The default configuration doesn't work that way: the sink
> stops when HBase is temporarily down or inaccessible. That makes the sink
> harder to use (because the defaults are poor), which I'd like to avoid here by
> adding the code above. Ideally the default configuration should behave the
> best way for the general-purpose case.
>
> I now understand the ways to implement/configure such behavior. I think we
> should discuss what the best default behavior is, and whether we need to let
> the user override it, on the Flume ML (or directly at
> https://issues.cloudera.org/browse/FLUME-685).
>
> Thank you guys,
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
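The configuration-based route Alex mentions could look roughly like this in the client-side hbase-site.xml. The property names (hbase.client.retries.number, hbase.client.pause) are the real HBase client settings discussed later in this thread; the values here are only illustrative.

```xml
<!-- hbase-site.xml on the Flume/client classpath; values are illustrative -->
<configuration>
  <property>
    <name>hbase.client.retries.number</name>
    <value>100</value>  <!-- effectively "keep trying for a long time" -->
  </property>
  <property>
    <name>hbase.client.pause</name>
    <value>2000</value> <!-- base pause in ms between retry attempts -->
  </property>
</configuration>
```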
>
>
> On Mon, Jun 27, 2011 at 11:40 PM, Stack <stack@duboce.net> wrote:
>
> > Either should work, Alex.  Your version will go "forever".  Have you
> > tried yanking HBase out from under the client to see if it reconnects?
> >
> > Good on you,
> > St.Ack
> >
> > On Mon, Jun 27, 2011 at 1:33 PM, Alex Baranau <alex.baranov.v@gmail.com>
> > wrote:
> > > Yes, that's what's intended, I think. To make the whole picture clear,
> > > here's the context:
> > >
> > > * there's Flume's HBase sink (read: an HBase client) which writes data
> > > from the Flume "pipe" (read: some event-based message source) to an HTable;
> > > * when HBase is down for some time (with the default HBase configuration
> > > on Flume's sink side) HTable.put throws an exception and the client exits
> > > (it usually takes ~10 min to fail);
> > > * Flume is smart enough to accumulate the data to be written reliably if
> > > the sink behaves badly (not writing for some time, pauses, etc.), so it
> > > would be great if the sink kept trying to write until HBase is up again;
> > > * but here, since we have a complete "failure" of the sink process (the
> > > thread needs to be restarted), the data never reaches the HTable even
> > > after the HBase cluster is brought up again.
> > >
> > > So you suggest, instead of this extra construction around HTable.put,
> > > using the configuration properties "hbase.client.pause" and
> > > "hbase.client.retries.number"? I.e. making the number of retry attempts
> > > (reasonably) close to "retry forever". Is that what you meant?
> > >
> > > Thank you,
> > > Alex Baranau
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >
> > > On Mon, Jun 27, 2011 at 11:16 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > >> This would retry indefinitely, right?
> > >> Normally a maximum retry duration would govern how long retries are
> > >> attempted.
> > >>
> > >> On Mon, Jun 27, 2011 at 1:08 PM, Alex Baranau <alex.baranov.v@gmail.com>
> > >> wrote:
> > >>
> > >> > Hello,
> > >> >
> > >> > Just wanted to confirm that I'm doing things the proper way here. How
> > >> > about this code to handle temporary cluster connectivity problems (or
> > >> > cluster downtime) on the client side?
> > >> >
> > >> > +    // HTable.put() will fail with an exception if the connection to
> > >> > +    // the cluster is temporarily broken or the cluster is temporarily
> > >> > +    // down. To be sure the data is written, we retry the write.
> > >> > +    boolean dataWritten = false;
> > >> > +    do {
> > >> > +      try {
> > >> > +        table.put(p);
> > >> > +        dataWritten = true;
> > >> > +      } catch (IOException ioe) { // cluster connectivity problem
> > >> > +                                  // (also thrown when cluster is down)
> > >> > +        LOG.error("Writing data to HBase failed, will try again in "
> > >> > +            + RETRY_INTERVAL_ON_WRITE_FAIL + " sec", ioe);
> > >> > +        try {
> > >> > +          // Thread.sleep(), not Object.wait(): wait() requires holding
> > >> > +          // the monitor and would throw IllegalMonitorStateException.
> > >> > +          Thread.sleep(RETRY_INTERVAL_ON_WRITE_FAIL * 1000L);
> > >> > +        } catch (InterruptedException ie) {
> > >> > +          Thread.currentThread().interrupt();
> > >> > +          break; // stop retrying if the thread is interrupted
> > >> > +        }
> > >> > +      }
> > >> > +    } while (!dataWritten);
> > >> >
> > >> > Thank you in advance,
> > >> > Alex Baranau
> > >> > ----
> > >> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
> > >> >
> > >>
> > >
> >
>
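Ted's suggestion of capping the total retry duration, rather than retrying indefinitely, could be sketched like this. This is a standalone illustration, not HBase API: the interface name, method names, and limits are all hypothetical.

```java
import java.io.IOException;

// Illustrative bounded-retry helper: retries a write until it succeeds
// or a maximum total duration elapses, then rethrows the last failure
// so the caller can react (e.g. buffer the event upstream).
public class BoundedRetry {
    interface Write {                       // stand-in for table.put(p)
        void run() throws IOException;
    }

    static void putWithDeadline(Write write, long retryIntervalMs, long maxDurationMs)
            throws IOException, InterruptedException {
        long deadline = System.currentTimeMillis() + maxDurationMs;
        while (true) {
            try {
                write.run();
                return;                     // success
            } catch (IOException ioe) {
                if (System.currentTimeMillis() + retryIntervalMs > deadline) {
                    throw ioe;              // give up: retry budget exhausted
                }
                Thread.sleep(retryIntervalMs);  // sleep, then try again
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Fails twice, then succeeds: completes well within the 1 s budget.
        final int[] calls = {0};
        putWithDeadline(() -> {
            if (++calls[0] < 3) throw new IOException("temporarily down");
        }, 10, 1000);
        if (calls[0] != 3) throw new AssertionError("expected 3 attempts, got " + calls[0]);
        System.out.println("succeeded after " + calls[0] + " attempts");
    }
}
```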



-- 
Todd Lipcon
Software Engineer, Cloudera
