hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-748) Add an efficient way to batch update many rows
Date Wed, 17 Sep 2008 13:25:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631769#action_12631769 ]

Jean-Daniel Cryans commented on HBASE-748:

I gave more thought to st^ack's idea of buffering the edits and I think it would be nice to
implement it. This is how I see it.

We keep an ArrayList of RowUpdates in HTable so that we have one buffer per table. It should
have a configurable maximum size in bytes. Maybe a default of 64MB? The size should also be
configurable when creating an HTable.

The RowUpdate class should be able to give us the total size of all the BatchOperations it
contains. It should be fairly easy to do by asking each BO for the length of its value.

We can compute the size of the RowUpdate either at commit time or after each put. I would
prefer after each put, so that we skip iterating over the operations at commit time.
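
A minimal sketch of the size tracking described above. The classes are hypothetical stand-ins that reuse the names from this comment (RowUpdate, BatchOperation); they are not the real HBase types, and the method names valueLength(), size() and sizeByIteration() are assumptions made for illustration. It keeps a running total on each put and shows the commit-time iteration only for comparison.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the classes named in the comment, not the real HBase types.
public class RowUpdateSizeSketch {

    /** One edit to a single column: a put carries a value, a delete carries none. */
    public static final class BatchOperation {
        private final byte[] column;
        private final byte[] value; // null for a delete

        public BatchOperation(byte[] column, byte[] value) {
            this.column = column;
            this.value = value;
        }

        public byte[] column() {
            return column;
        }

        /** Length of the value in bytes, 0 for a delete. */
        public int valueLength() {
            return value == null ? 0 : value.length;
        }
    }

    /** All edits for one row, with a running total of value bytes. */
    public static final class RowUpdate {
        private final byte[] row;
        private final List<BatchOperation> operations = new ArrayList<>();
        private long size; // updated on every put, so no iteration is needed at commit time

        public RowUpdate(byte[] row) {
            this.row = row;
        }

        public byte[] row() {
            return row;
        }

        public void put(byte[] column, byte[] value) {
            BatchOperation op = new BatchOperation(column, value);
            operations.add(op);
            size += op.valueLength();
        }

        public void delete(byte[] column) {
            operations.add(new BatchOperation(column, null));
        }

        /** Size maintained incrementally by put(). */
        public long size() {
            return size;
        }

        /** The commit-time alternative: walk every operation and sum the value lengths. */
        public long sizeByIteration() {
            long total = 0;
            for (BatchOperation op : operations) {
                total += op.valueLength();
            }
            return total;
        }
    }
}
```

Both methods return the same number; the incremental version just moves the cost from commit time to each put. Counting only value bytes follows the suggestion above and ignores row and column name lengths.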

In the case of auto-flushing, I see two ways to detect that the buffer is full: either at
commit time or in a separate thread, the way the Flusher currently works. The first is very
easy to implement but blocks the commits. The second is harder to implement but doesn't block
the commits. I think that for 0.19.0 we could implement the first one.

The other case is that auto-flushing is disabled and then it is the user's responsibility
to call something like HTable.flushEdits().
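
A minimal sketch of the first option (checking the buffer at commit time) together with the manual mode. BufferedTable is a hypothetical stand-in for HTable, and its constructor parameters and the counters flushCount and rowsSent are assumptions for illustration only; flushEdits() is the name floated above. The actual send to the region servers is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a per-table write buffer; not the real HTable API.
public class WriteBufferSketch {

    /** Minimal stand-in for a row's edits; only the byte size matters here. */
    public static final class RowUpdate {
        private final String row;
        private final long size;

        public RowUpdate(String row, long size) {
            this.row = row;
            this.size = size;
        }

        public String row() {
            return row;
        }

        public long size() {
            return size;
        }
    }

    /** Stand-in for HTable: one buffer per table instance. */
    public static final class BufferedTable {
        private final List<RowUpdate> buffer = new ArrayList<>();
        private final long maxBufferSize; // in bytes, configurable per table
        private final boolean autoFlush;
        private long bufferedBytes;
        private int flushCount;
        private int rowsSent;

        public BufferedTable(long maxBufferSize, boolean autoFlush) {
            this.maxBufferSize = maxBufferSize;
            this.autoFlush = autoFlush;
        }

        /** Buffers the edit; with auto-flushing on, a full buffer is flushed before returning. */
        public void commit(RowUpdate update) {
            buffer.add(update);
            bufferedBytes += update.size();
            if (autoFlush && bufferedBytes >= maxBufferSize) {
                flushEdits(); // the committing caller blocks here until the flush is done
            }
        }

        /** Sends everything buffered so far; the caller's job when auto-flushing is off. */
        public void flushEdits() {
            if (buffer.isEmpty()) {
                return;
            }
            // A real client would send the buffered edits to the region servers here.
            rowsSent += buffer.size();
            flushCount++;
            buffer.clear();
            bufferedBytes = 0;
        }

        public long bufferedBytes() {
            return bufferedBytes;
        }

        public int flushCount() {
            return flushCount;
        }

        public int rowsSent() {
            return rowsSent;
        }
    }
}
```

With auto-flushing disabled the buffer can grow past the configured maximum, since nothing is sent until flushEdits() is called.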

Any comments?

> Add an efficient way to batch update many rows
> ----------------------------------------------
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is
> to have an enhanced version that will send many rows in a single RPC to each region server.
> To do this, the client code will have to figure out which rows go to which server, group
> them accordingly and then send them.
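
A minimal sketch of the grouping step from the issue description, assuming the client already holds a sorted map from region start key to server name. The method groupByServer and the String keys are hypothetical simplifications; real row keys are byte arrays and the region lookup would go through the client's region cache.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical sketch: group rows by the server hosting their region, one batch per server.
public class GroupByServerSketch {

    /**
     * @param regionStartToServer region start key mapped to server name; the first
     *                            region is expected to start at the empty key
     * @param rows                row keys to be updated
     * @return one list of rows per server, in order of first appearance
     */
    public static Map<String, List<String>> groupByServer(
            NavigableMap<String, String> regionStartToServer, List<String> rows) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (String row : rows) {
            // The owning region is the one with the greatest start key <= row.
            Map.Entry<String, String> region = regionStartToServer.floorEntry(row);
            if (region == null) {
                throw new IllegalArgumentException("no region covers row " + row);
            }
            batches.computeIfAbsent(region.getValue(), server -> new ArrayList<>()).add(row);
        }
        return batches;
    }
}
```

Each entry of the returned map would then become a single RPC to that server.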

