hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Should HTable.put() return a Future?
Date Wed, 07 Apr 2010 00:57:56 GMT
Generally I can't agree that turning off the WAL is a good idea.  You
get speed, but at what cost?  Also it's kind of like punting on making
HLog fast, and reduces the incentive to do so.  I think that a
single flush/sync per batch-put call will improve speed to the point
where running with the WAL turned off will be of minimal value.

-ryan

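A minimal sketch of the batching argument above, assuming a toy log that only counts sync calls. ToyLog, syncsPerEdit and syncsPerBatch are hypothetical names made up for illustration; none of this is HLog or HBase code, and it says nothing about real timings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSyncSketch {

    /** Toy stand-in for a write-ahead log (assumption); it only counts sync calls. */
    static final class ToyLog {
        private final List<String> edits = new ArrayList<>();
        private int syncCount = 0;

        void append(String edit) { edits.add(edit); }

        // Stands in for the expensive flush/sync to the underlying DFS.
        void sync() { syncCount++; }

        int syncCount() { return syncCount; }
    }

    /** One sync per edit: the number of syncs grows with the batch size. */
    static int syncsPerEdit(List<String> batch) {
        ToyLog log = new ToyLog();
        for (String edit : batch) {
            log.append(edit);
            log.sync();
        }
        return log.syncCount();
    }

    /** One sync per batch: every edit is appended, then the log is synced once. */
    static int syncsPerBatch(List<String> batch) {
        ToyLog log = new ToyLog();
        for (String edit : batch) {
            log.append(edit);
        }
        log.sync();
        return log.syncCount();
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("row1", "row2", "row3", "row4");
        System.out.println("syncs, one per edit:  " + syncsPerEdit(batch));
        System.out.println("syncs, one per batch: " + syncsPerBatch(batch));
    }
}
```

If the sync dominates the cost of a write, the batched variant pays that cost once per call instead of once per edit, which is the effect the message is counting on.
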

On Tue, Apr 6, 2010 at 12:41 PM, Lars George <lars.george@gmail.com> wrote:
> I agree with Jon here: parsing these files, especially without
> central logging, is bad. I tried Splunk, and that sort of worked
> for quickly scanning for exceptions. One problem was multiline stack
> traces (which they usually all are): they got mixed up when multiple
> servers sent events at the same time, and the Splunk data got all
> garbled. But something like that, yeah.
>
> Maybe with the new Multiput style stuff the WAL is not such a big
> overhead anymore?
>
> Lars
>
> On Tue, Apr 6, 2010 at 7:12 PM, Jonathan Gray <jgray@facebook.com> wrote:
>> I like this idea.
>>
>> Putting major cluster events in some form into ZK.  Could be used for
>> jobs as Todd says.  Can also be used as a cluster history report on
>> web ui and such.  Higher level historian.
>>
>> I'm a fan of anything that moves us away from requiring parsing
>> hundreds or thousands of lines of logs to see what has happened.
>>
>> JG
>>
>>> -----Original Message-----
>>> From: Todd Lipcon [mailto:todd@cloudera.com]
>>> Sent: Tuesday, April 06, 2010 9:49 AM
>>> To: hbase-dev@hadoop.apache.org
>>> Subject: Re: Should HTable.put() return a Future?
>>>
>>> On Tue, Apr 6, 2010 at 9:46 AM, Jean-Daniel Cryans
>>> <jdcryans@apache.org>wrote:
>>>
>>> > Yes it is, you will be missing a RS ;)
>>> >
>>> >
>>> How do you detect this, though?
>>>
>>> It might be useful to add a counter in ZK for region server crashes.
>>> If the master ever notices that a RS goes down, it increments it.
>>> Then we can check the before/after for a job and know when we might
>>> have lost some data.
>>>
>>> -Todd
>>>
>>>
>>> > General rule when uploading without WAL is if there's a failure, the
>>> > job is screwed and that's the tradeoff for speed.
>>> >
>>> > J-D
>>> >
>>> > On Tue, Apr 6, 2010 at 9:36 AM, Todd Lipcon <todd@cloudera.com> wrote:
>>> > > On Tue, Apr 6, 2010 at 9:31 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>> > >
>>> > >> The issue isn't with the write buffer here, it's the WAL. Your
>>> > >> edits are in the MemStore so as far as your clients can tell,
>>> > >> the data is all persisted. In this case you would need to know
>>> > >> when all the memstores that contain your data are flushed...
>>> > >> Best practice when turning off WAL is force flushing the tables
>>> > >> after the job is done, else you can't guarantee durability for
>>> > >> the last edits.
>>> > >>
>>> > >>
>>> > > You still can't guarantee durability for any of the edits, since
>>> > > a failure in the middle of your job is undetectable :)
>>> > >
>>> > > -Todd
>>> > >
>>> > >
>>> > >> J-D
>>> > >>
>>> > >> On Tue, Apr 6, 2010 at 4:02 AM, Lars George <lars.george@gmail.com> wrote:
>>> > >> > Hi,
>>> > >> >
>>> > >> > I have an issue where I do bulk import and, since the WAL is
>>> > >> > off and a default write buffer is used (TableOutputFormat), I
>>> > >> > am running into situations where the MR job completes
>>> > >> > successfully but not all data is actually restored. The issue
>>> > >> > seems to be a failure on the RS side, as it cannot flush the
>>> > >> > write buffers because the MR job overloads the cluster
>>> > >> > (usually the RS hosting .META. is the breaking point) or
>>> > >> > causes the underlying DFS to go slow, which has repercussions
>>> > >> > all the way up to the RSs.
>>> > >> >
>>> > >> > My question is, would it make sense as with any other
>>> > >> > asynchronous IO to return a Future from the put() that will
>>> > >> > help checking the status of the actual server side async flush
>>> > >> > operation? Or am I misguided here? Please advise.
>>> > >> >
>>> > >> > Lars
>>> > >> >
>>> > >>
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > Todd Lipcon
>>> > > Software Engineer, Cloudera
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>>
>
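
The question that opens the thread, whether put() could return a Future reporting the outcome of the later flush, can be sketched roughly as follows. This is only an illustration under stated assumptions: ToyTable and failedPuts are hypothetical names, the "server" is a boolean flag, and nothing here is the HTable API or a proposal for its signature.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class FuturePutSketch {

    /** Hypothetical buffered table client (assumption); NOT the real HTable API. */
    static final class ToyTable {
        private final List<CompletableFuture<Void>> pending = new ArrayList<>();
        private final boolean serverHealthy;

        ToyTable(boolean serverHealthy) { this.serverHealthy = serverHealthy; }

        /** Buffers the edit and returns a handle for its eventual flush.
         *  The row itself is ignored in this toy. */
        CompletableFuture<Void> put(String row) {
            CompletableFuture<Void> result = new CompletableFuture<>();
            pending.add(result);
            return result;
        }

        /** Resolves every buffered put with the outcome of the flush. */
        void flush() {
            for (CompletableFuture<Void> result : pending) {
                if (serverHealthy) {
                    result.complete(null);
                } else {
                    result.completeExceptionally(new IOException("flush failed"));
                }
            }
            pending.clear();
        }
    }

    /** Issues the puts, flushes, and counts how many futures failed. */
    static int failedPuts(boolean serverHealthy, int puts) {
        ToyTable table = new ToyTable(serverHealthy);
        List<CompletableFuture<Void>> results = new ArrayList<>();
        for (int i = 0; i < puts; i++) {
            results.add(table.put("row" + i));
        }
        table.flush();
        int failed = 0;
        for (CompletableFuture<Void> result : results) {
            if (result.isCompletedExceptionally()) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("failed puts, healthy server: " + failedPuts(true, 3));
        System.out.println("failed puts, failing server: " + failedPuts(false, 3));
    }
}
```

Note that this only covers the client-side write buffer. As the replies point out, with the WAL off an edit can still be lost after it reaches the MemStore, and a future resolved at buffer-flush time would not see that.
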

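The before/after check on a region server crash counter suggested in the thread could look something like this. It is a sketch under assumptions: CrashCounter is an in-memory stand-in for a counter that would live in ZooKeeper, and recordCrash, read and mayHaveLostData are hypothetical names, not existing HBase or ZooKeeper calls.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CrashCounterSketch {

    /** In-memory stand-in (assumption) for a crash counter kept in ZooKeeper. */
    static final class CrashCounter {
        private final AtomicLong crashes = new AtomicLong();

        // In the proposal, the master would do this when it notices a RS going down.
        void recordCrash() { crashes.incrementAndGet(); }

        long read() { return crashes.get(); }
    }

    /**
     * Runs the job and reports whether the counter moved while it ran.
     * A change means a region server went down during the job, so edits
     * written without the WAL may have been lost.
     */
    static boolean mayHaveLostData(CrashCounter counter, Runnable job) {
        long before = counter.read();
        job.run();
        return counter.read() != before;
    }

    public static void main(String[] args) {
        CrashCounter counter = new CrashCounter();
        // A job during which nothing crashes.
        System.out.println(mayHaveLostData(counter, () -> { }));
        // A job during which a crash is recorded.
        System.out.println(mayHaveLostData(counter, counter::recordCrash));
    }
}
```

The check is deliberately coarse: it cannot tell whether the crashed server held any of the job's data, only that the job should be treated as suspect and possibly rerun.
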