hbase-dev mailing list archives

From stack <st...@duboce.net>
Subject Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?
Date Mon, 16 Nov 2009 17:32:34 GMT
On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <clehene@adobe.com> wrote:

> We could have a speedy default and an extra parameter for puts that would
> specify a flush is needed. This way you pass the responsibility to the user
> and he can decide if he needs to be paranoid or not. This could be part of
> Put and even specify granularity of the flush if needed.
>

I like this idea.
St.Ack
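
As a rough illustration of the idea quoted above (a speedy default plus a per-put flag that asks for a flush), here is a toy Python model. It is not HBase code: ToyWal, put, force_sync and flush_every are hypothetical names made up for this sketch, and nothing here reflects an actual client API.

```python
class ToyWal:
    """Toy write-ahead log that syncs every `flush_every` edits (hypothetical model)."""

    def __init__(self, flush_every=100):
        self.flush_every = flush_every
        self.unsynced = []    # edits appended but not yet synced
        self.durable = []     # edits that would survive a crash in this model
        self.sync_calls = 0   # number of (expensive) syncs performed

    def sync(self):
        # Move everything buffered so far into the durable set.
        if self.unsynced:
            self.durable.extend(self.unsynced)
            self.unsynced.clear()
            self.sync_calls += 1

    def put(self, edit, force_sync=False):
        # The caller decides per put whether to be paranoid.
        self.unsynced.append(edit)
        if force_sync or len(self.unsynced) >= self.flush_every:
            self.sync()


wal = ToyWal(flush_every=100)
for i in range(250):
    wal.put(("row%d" % i, "value"))                   # fast path, default behaviour
wal.put(("critical-row", "value"), force_sync=True)   # caller asks for durability
```

In this model the 250 ordinary puts cost two syncs and leave 50 edits at risk; the single forced put adds a third sync and makes everything before it durable as well.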



> Cosmin
>
>
> On 11/15/09 6:59 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
>
> > I agree with this.
> >
> > I also think we should leave the default as is with the caveat that we call
> > out the durability versus write performance tradeoff in the flushlogentries
> > description and up on the wiki somewhere, maybe on
> > http://wiki.apache.org/hadoop/PerformanceTuning . We could also provide two
> > example configurations, one for performance (reasonable tradeoffs), one for
> > paranoia. I put up an issue: https://issues.apache.org/jira/browse/HBASE-1984
> >
> >     - Andy
> >
> >
> >
> >
> > ________________________________
> > From: Ryan Rawson <ryanobjc@gmail.com>
> > To: hbase-dev@hadoop.apache.org
> > Sent: Sat, November 14, 2009 11:22:13 PM
> > Subject: Re: Should we change the default value of
> > hbase.regionserver.flushlogentries  for 0.21?
> >
> > That sync at the end of an RPC is my doing. You don't want to sync every
> > _EDIT_; after all, the previous definition of the word "edit" was each
> > KeyValue.  So we could be calling sync for every single column in a
> > row. Bad stuff.
> >
> > In the end, if the regionserver crashes during a batch put, we will
> > never know how much of the batch was flushed to the WAL. Thus it makes
> > sense to only do it once and get a massive, massive, speedup.
> >
> > On Sat, Nov 14, 2009 at 9:45 PM, stack <stack@duboce.net> wrote:
> >> I'm for leaving it as it is, at every 100 edits -- maybe every 10 edits?
> >> Speed stays as it was.  We used to lose MBs.  By default, we'll now lose 99
> >> or 9 edits max.
> >>
> >> We need to do some work bringing folks along regardless of what we decide.
> >> Flush happens at the end of the put up in the regionserver.  If you are
> >> doing a batch of commits -- e.g. using a big write buffer over on your
> >> client -- the puts will only be flushed on the way out after the batch put
> >> completes EVEN if you have configured hbase to sync every edit (I ran into
> >> this this evening.  J-D sorted me out).  We need to make sure folks are up
> >> on this.
> >>
> >> St.Ack
> >>
> >>
> >>
> >> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
> >> <jdcryans@apache.org> wrote:
> >>
> >>> Hi dev!
> >>>
> >>> Hadoop 0.21 now has a reliable append and flush feature and this gives
> >>> us the opportunity to review some assumptions. The current situation:
> >>>
> >>> - Every edit going to a catalog table is flushed so there's no data loss.
> >>> - User table edits are flushed every
> >>> hbase.regionserver.flushlogentries edits, which by default is 100.
> >>>
> >>> Should we now set this value to 1 in order to have more durable but
> >>> slower inserts by default? Please speak up.
> >>>
> >>> Thx,
> >>>
> >>> J-D
> >>>
> >>
> >
> >
> >
> >
>
>
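
The trade-off discussed in this thread can be put into rough numbers with a small Python sketch. It is only arithmetic under stated assumptions, not HBase code: max_unsynced_edits and sync_calls are hypothetical helper names, the model assumes a sync every N edits (as with hbase.regionserver.flushlogentries), and it assumes one sync per batch when syncing at the end of each RPC.

```python
def max_unsynced_edits(flush_every):
    """Upper bound on edits left unsynced when syncing every `flush_every` edits."""
    if flush_every < 1:
        raise ValueError("flush_every must be >= 1")
    return flush_every - 1


def sync_calls(batch_sizes, per_edit):
    """Count syncs for a list of batch puts.

    per_edit=True  : one sync for every edit in every batch.
    per_edit=False : one sync at the end of each non-empty batch (per RPC).
    """
    if per_edit:
        return sum(batch_sizes)
    return sum(1 for size in batch_sizes if size > 0)


batches = [1000, 1000, 500]                 # three batch puts from a client buffer
print(max_unsynced_edits(100))              # 99
print(max_unsynced_edits(10))               # 9
print(sync_calls(batches, per_edit=True))   # 2500
print(sync_calls(batches, per_edit=False))  # 3
```

Note that with a sync only at the end of each batch, a crash mid-batch can leave an unknown part of that batch unsynced, as described in the quoted messages, so the 99 or 9 figure applies to single puts rather than to large client-side batches.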
