hbase-dev mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: Should HTable.put() return a Future?
Date Tue, 06 Apr 2010 17:12:41 GMT
I like this idea.

Putting major cluster events into ZK in some form could be used for jobs, as Todd says.
It could also drive a cluster history report on the web UI and such, a higher-level historian.
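
A minimal sketch of the before/after check Todd describes, in Java. Nothing here is an existing HBase or ZooKeeper API: CrashCounter, InMemoryCrashCounter, and ranWithoutCrash are hypothetical names, and the in-memory counter only stands in for a value the master would keep in a ZK znode and bump whenever it notices a region server going down.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CrashCounterSketch {

    /** Hypothetical read-only view of a cluster-wide "region server crashes" counter. */
    interface CrashCounter {
        long read();
    }

    /** In-memory stand-in for a counter that would really live in a ZooKeeper znode. */
    static final class InMemoryCrashCounter implements CrashCounter {
        private final AtomicLong crashes = new AtomicLong();

        /** Assumption: the master would do the equivalent of this when an RS dies. */
        void recordCrash() {
            crashes.incrementAndGet();
        }

        @Override
        public long read() {
            return crashes.get();
        }
    }

    /** Runs the job and reports whether the counter was unchanged across it. */
    static boolean ranWithoutCrash(CrashCounter counter, Runnable job) {
        long before = counter.read();
        job.run();
        long after = counter.read();
        return before == after;
    }

    public static void main(String[] args) {
        InMemoryCrashCounter counter = new InMemoryCrashCounter();

        // A job during which no region server goes down.
        boolean quietRun = ranWithoutCrash(counter, () -> { });

        // A job during which the master records one region server crash.
        boolean crashRun = ranWithoutCrash(counter, counter::recordCrash);

        System.out.println("quiet run clean: " + quietRun);
        System.out.println("crash run clean: " + crashRun);
    }
}
```

A changed counter only says that some region server went down while the job ran, not that this job's edits were on it, so it is a signal to re-run or verify the upload rather than proof of data loss.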

I'm a fan of anything that moves us away from having to parse hundreds or thousands of lines
of logs to see what has happened.

JG

> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Tuesday, April 06, 2010 9:49 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: Should HTable.put() return a Future?
> 
> On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> 
> > Yes it is, you will be missing a RS ;)
> >
> >
> How do you detect this, though?
> 
> It might be useful to add a counter in ZK for region server crashes. If the
> master ever notices that a RS goes down, it increments it. Then we can check
> the before/after for a job and know when we might have lost some data.
> 
> -Todd
> 
> 
> > General rule when uploading without WAL is if there's a failure, the
> > job is screwed and that's the tradeoff for speed.
> >
> > J-D
> >
> > On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon <todd@cloudera.com> wrote:
> > > On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> > >
> > >> The issue isn't with the write buffer here, it's the WAL. Your edits
> > >> are in the MemStore so as far as your clients can tell, the data is
> > >> all persisted. In this case you would need to know when all the
> > >> memstores that contain your data are flushed... Best practice when
> > >> turning off WAL is force flushing the tables after the job is done,
> > >> else you can't guarantee durability for the last edits.
> > >>
> > >>
> > > You still can't guarantee durability for any of the edits, since a
> > > failure in the middle of your job is undetectable :)
> > >
> > > -Todd
> > >
> > >
> > >> J-D
> > >>
> > >> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.george@gmail.com> wrote:
> > >> > Hi,
> > >> >
> > >> > I have an issue where I do a bulk import and, since the WAL is off and
> > >> > a default write buffer is used (TableOutputFormat), I am running into
> > >> > situations where the MR job completes successfully but not all data is
> > >> > actually restored. The issue seems to be a failure on the RS side, as
> > >> > it cannot flush the write buffers because the MR job overloads the
> > >> > cluster (usually the RS hosting .META. is the breaking point) or causes
> > >> > the underlying DFS to go slow, and that has repercussions all the way
> > >> > up to the RSs.
> > >> >
> > >> > My question is, would it make sense, as with any other asynchronous IO,
> > >> > to return a Future from put() that would help check the status of the
> > >> > actual server-side async flush operation? Or am I misguided here?
> > >> > Please advise.
> > >> >
> > >> > Lars
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
> 
> 
> 
> --
> Todd Lipcon
> Software Engineer, Cloudera
