hbase-dev mailing list archives

From "Jonathan Gray" <jl...@streamy.com>
Subject Re: Should we change the default value of hbase.regionserver.flushlogentries for 0.21?
Date Wed, 18 Nov 2009 00:36:57 GMT
Thoughts on a client-facing call that explicitly triggers a WAL sync?  So I could
turn on DEFERRED_LOG_FLUSH (possibly leave it on always), run a batch of
my inserts, and then run an explicit flush/sync.  The returning of that
call would guarantee to the client that the data up to that point is safe.
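
To make the idea concrete, here is a minimal sketch of the semantics I have in
mind. It is a toy in-memory model, not the HBase API: the class name
DeferredWalSketch and its methods append(), sync(), atRiskCount() and
durableCount() are all made up for illustration, and the "durable" list just
stands in for edits that have been synced to the filesystem.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy write-ahead log. Hypothetical names throughout; NOT the HBase API. */
class DeferredWalSketch {
    // Edits appended to the log but not yet synced (lost if the server dies).
    private final List<String> buffered = new ArrayList<>();
    // Stand-in for edits that have been synced and would survive a crash.
    private final List<String> durable = new ArrayList<>();
    // Models a table-level DEFERRED_LOG_FLUSH setting.
    private final boolean deferredLogFlush;

    DeferredWalSketch(boolean deferredLogFlush) {
        this.deferredLogFlush = deferredLogFlush;
    }

    /** Appends one edit; syncs right away unless deferred flush is on. */
    void append(String edit) {
        buffered.add(edit);
        if (!deferredLogFlush) {
            sync();
        }
    }

    /** Explicit sync: once this returns, every edit appended so far is durable. */
    void sync() {
        durable.addAll(buffered);
        buffered.clear();
    }

    int atRiskCount() {
        return buffered.size();
    }

    int durableCount() {
        return durable.size();
    }

    public static void main(String[] args) {
        DeferredWalSketch wal = new DeferredWalSketch(true);
        for (int i = 0; i < 1000; i++) {
            wal.append("row-" + i); // fast path: no sync per edit
        }
        System.out.println("at risk before sync: " + wal.atRiskCount());
        wal.sync(); // the one explicit client-facing call
        System.out.println("durable after sync: " + wal.durableCount());
    }
}
```

The point is that the client pays for one sync per batch instead of one per
edit, and still gets a clear durability guarantee at a moment it chooses.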

JG

On Mon, November 16, 2009 11:00 am, Jean-Daniel Cryans wrote:
> I added a new feature for tables called "deferred flush", see
> https://issues.apache.org/jira/browse/HBASE-1944
>
>
> My opinion is that the default should be paranoid enough to not lose
> any user data. If we can change a table's attribute without taking it down
> (there's a jira on that), wouldn't that solve the import problem?
>
>
> For example: have some table that needs to have fast insertion via MR.
> During the creation of the job, you change the table's
> DEFERRED_LOG_FLUSH to "true", then run the job and finally set the
> value to false when the job is done.
>
> This way you still pass the responsibility to the user but for
> performance reasons.
>
> J-D
>
>
> On Mon, Nov 16, 2009 at 2:05 AM, Cosmin Lehene <clehene@adobe.com> wrote:
>
>> We could have a speedy default and an extra parameter for puts that
>> would specify a flush is needed. This way you pass the responsibility to
>> the user and he can decide if he needs to be paranoid or not. This could
>> be part of Put and even specify granularity of the flush if needed.
>>
>>
>> Cosmin
>>
>>
>>
>> On 11/15/09 6:59 PM, "Andrew Purtell" <apurtell@apache.org> wrote:
>>
>>
>>> I agree with this.
>>>
>>>
>>> I also think we should leave the default as is with the caveat that
>>> we call out the durability versus write performance tradeoff in the
>>> flushlogentries description and up on the wiki somewhere, maybe on
>>> http://wiki.apache.org/hadoop/PerformanceTuning . We could also
>>> provide two example configurations, one for performance (reasonable
>>> tradeoffs), one for paranoia. I put up an issue:
>>> https://issues.apache.org/jira/browse/HBASE-1984
>>>
>>>
>>>     - Andy
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Ryan Rawson <ryanobjc@gmail.com>
>>> To: hbase-dev@hadoop.apache.org
>>> Sent: Sat, November 14, 2009 11:22:13 PM
>>> Subject: Re: Should we change the default value of
>>> hbase.regionserver.flushlogentries  for 0.21?
>>>
>>> That sync at the end of a RPC is my doing. You don't want to sync
>>> every _EDIT_; after all, the previous definition of the word "edit"
>>> was each KeyValue.  So we could be calling sync for every single
>>> column in a row. Bad stuff.
>>>
>>> In the end, if the regionserver crashes during a batch put, we will
>>> never know how much of the batch was flushed to the WAL. Thus it makes
>>> sense to only do it once and get a massive, massive speedup.
>>>
>>> On Sat, Nov 14, 2009 at 9:45 PM, stack <stack@duboce.net> wrote:
>>>
>>>> I'm for leaving it as it is, at every 100 edits -- maybe every 10
>>>> edits? Speed stays as it was.  We used to lose MBs.  By default,
>>>> we'll now lose 99 or 9 edits max.
>>>>
>>>> We need to do some work bringing folks along regardless of what we
>>>> decide. Flush happens at the end of the put up in the regionserver.
>>>> If you are doing a batch of commits -- e.g. using a big write buffer over on
>>>> your client -- the puts will only be flushed on the way out after
>>>> the batch put completes EVEN if you have configured hbase to sync
>>>> every edit (I ran into this this evening.  J-D sorted me out).  We
>>>> need to make sure folks are up on this.
>>>>
>>>> St.Ack
>>>>
>>>>
>>>>
>>>>
>>>> On Sat, Nov 14, 2009 at 4:37 PM, Jean-Daniel Cryans
>>>> <jdcryans@apache.org> wrote:
>>>>
>>>>
>>>>> Hi dev!
>>>>>
>>>>>
>>>>> Hadoop 0.21 now has a reliable append and flush feature and this
>>>>> gives us the opportunity to review some assumptions. The current
>>>>> situation:
>>>>>
>>>>>
>>>>> - Every edit going to a catalog table is flushed so there's no
>>>>> data loss.
>>>>> - User table edits are flushed every
>>>>> hbase.regionserver.flushlogentries edits, which defaults to 100.
>>>>>
>>>>> Should we now set this value to 1 in order to have more durable
>>>>> but slower inserts by default? Please speak up.
>>>>>
>>>>> Thx,
>>>>>
>>>>>
>>>>> J-D
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>
>

