hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2353) HBASE-2283 removed bulk sync optimization for multi-row puts
Date Wed, 24 Mar 2010 00:54:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848987#action_12848987 ]

ryan rawson commented on HBASE-2353:

I have new numbers: my bulk puts are now much slower than before. This is a
killer for us. Single-thread import performance is now down to 2000-6000 rows/sec, down from

The first fix to this is to bring back deferred log flush.  I have a forthcoming patch. 

Here are my arguments:

- There is no multi-row atomicity guarantee. Other clients seeing the partial results
of your batch put is acceptable, because that is our consistency model: per row. That is the
de facto situation right now anyway.
- If the call succeeds, then we expect the puts to be durable. We get this by ensuring the
syncFs() call returns before we return to the client.
- Partial failure by exception leaves the HLog in an uncertain state.  The client will not
know how many rows were successfully made durable, and thus would be required to redo the
- Partial "failure" by return code means only part of the rows were made durable and available
to other clients.  This is normal and covered by the above cases I think.
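
A minimal Java sketch of the return-code case in the last point, under invented assumptions: the Server class and its maxRowsPerCall limit are hypothetical stand-ins (not the real HRegionServer API) for whatever stops a server partway through a batch. The server reports how many rows of the batch it applied and made durable, so the client can resubmit exactly the remainder.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of "partial failure by return code": the server
 * applies (and makes durable) only a prefix of the batch and returns how
 * many rows it applied, so the client knows exactly where to resume.
 */
class PartialBatchSketch {

    /** Toy server that accepts at most maxRowsPerCall rows per call (invented limit). */
    static class Server {
        final List<String> durableRows = new ArrayList<>();
        final int maxRowsPerCall;
        int calls = 0;

        Server(int maxRowsPerCall) {
            if (maxRowsPerCall < 1) {
                throw new IllegalArgumentException("maxRowsPerCall must be positive");
            }
            this.maxRowsPerCall = maxRowsPerCall;
        }

        /** Applies a prefix of the batch and returns its length. */
        int put(List<String> batch) {
            calls++;
            int applied = Math.min(batch.size(), maxRowsPerCall);
            durableRows.addAll(batch.subList(0, applied));
            return applied;
        }
    }

    /** Client: resubmit the unapplied suffix until every row is durable. */
    static void putAll(Server server, List<String> rows) {
        int done = 0;
        while (done < rows.size()) {
            done += server.put(rows.subList(done, rows.size()));
        }
    }

    public static void main(String[] args) {
        Server server = new Server(2);
        putAll(server, List.of("r1", "r2", "r3", "r4", "r5"));
        System.out.println(server.durableRows); // [r1, r2, r3, r4, r5]
        System.out.println(server.calls);       // 3
    }
}
```

Because the return value says exactly which rows made it, no row is written twice and none is skipped; the exception case above has no such count to resume from.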

Given this, what makes the most sense? It seems like hlog.append() of all the puts, then
syncFs(), THEN mutating the memstore is the way to go. In HRS.put our protection from 'over memory'
is this call:


which will synchronously flush until we aren't going to go over memory. If we somehow fail
to add to the memstore, it would be an OOME, which would kill the RS anyway. Considering the data
for the Put is already in memory and we are just adjusting data structure nodes, it seems
unlikely that we'd hit this case often, if ever.
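
A rough sketch of the proposed ordering for a batch put: append every edit, sync once, then mutate the memstore. WriteAheadLog and MemStore here are hypothetical toy classes, not HBase's HLog or MemStore; a counter stands in for the real syncFs() call, and the memory-pressure flush mentioned above is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: append all edits to the log, sync once, and only
 * then apply the edits to the memstore. Toy classes, not HBase APIs.
 */
class BatchPutSketch {

    /** Toy log: append() buffers an edit, sync() makes buffered edits durable. */
    static class WriteAheadLog {
        final List<String> durable = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        int syncCount = 0;

        void append(String edit) {
            pending.add(edit);
        }

        void sync() {
            durable.addAll(pending);
            pending.clear();
            syncCount++;
        }
    }

    /** Toy memstore: edits become visible to readers once added here. */
    static class MemStore {
        final List<String> visible = new ArrayList<>();

        void add(String edit) {
            visible.add(edit);
        }
    }

    /** Append all edits, sync once, THEN mutate the memstore. */
    static void batchPut(List<String> rows, WriteAheadLog log, MemStore memstore) {
        for (String row : rows) {
            log.append(row);          // buffered, not yet durable
        }
        log.sync();                   // one sync for the whole batch
        for (String row : rows) {
            memstore.add(row);        // visible only after the edits are durable
        }
    }

    public static void main(String[] args) {
        WriteAheadLog log = new WriteAheadLog();
        MemStore memstore = new MemStore();
        batchPut(List.of("row1", "row2", "row3"), log, memstore);
        System.out.println(log.syncCount);            // 1
        System.out.println(log.durable.size());       // 3
        System.out.println(memstore.visible.size());  // 3
    }
}
```

Since the memstore is only touched after the single sync returns, other clients never read an edit that is not yet in the log, which fits the per-row consistency and durability points above.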

> HBASE-2283 removed bulk sync optimization for multi-row puts
> ------------------------------------------------------------
>                 Key: HBASE-2353
>                 URL: https://issues.apache.org/jira/browse/HBASE-2353
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>             Fix For: 0.21.0
>         Attachments: HBASE-2353-deferred.txt
> Prior to HBASE-2283 we used to call flush/sync once per put(Put[]) call (i.e., per batch
> of commits). Now we do it for every row.
> This makes bulk uploads slower if you are using the WAL. Is there an acceptable solution
> to achieve both safety and performance by bulk-sync'ing puts? Or would this not work in the face
> of our atomicity guarantees?
> discuss!
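
For a rough sense of the difference the description is talking about, here is a toy count of log syncs under both strategies. countSyncs is a hypothetical helper; it only counts sync calls and does not model their latency or any real HDFS behaviour.

```java
/**
 * Hypothetical sketch: count the log syncs needed to upload totalRows rows
 * in put(Put[]) calls of batchSize rows each, either syncing after every
 * row or once per call. Appends are implied; only syncs are counted.
 */
class SyncCountSketch {

    static int countSyncs(int totalRows, int batchSize, boolean syncEveryRow) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int syncs = 0;
        for (int start = 0; start < totalRows; start += batchSize) {
            int end = Math.min(start + batchSize, totalRows);
            for (int row = start; row < end; row++) {
                // the log append for this row would happen here
                if (syncEveryRow) {
                    syncs++;          // sync after each row (per the description, post-HBASE-2283)
                }
            }
            if (!syncEveryRow) {
                syncs++;              // one sync covering the whole put(Put[]) call
            }
        }
        return syncs;
    }

    public static void main(String[] args) {
        System.out.println(countSyncs(10_000, 100, true));   // 10000
        System.out.println(countSyncs(10_000, 100, false));  // 100
    }
}
```

With 10,000 rows sent as put(Put[]) calls of 100 rows, syncing every row costs 10,000 syncs versus 100 for one sync per call.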
