hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-748) Add an efficient way to batch update many rows
Date Thu, 25 Sep 2008 22:17:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634658#action_12634658 ]

Jim Kellerman commented on HBASE-748:

Jean-Daniel Cryans - 24/Sep/08 03:59 PM
HTable commits 23 rows to HRS against a region. Let's say that the first one in the 23
is the 1000th in the whole batch to commit.
The region gets split after 10 rows.
At row 11, HRS will handle an NSRE.
HRS returns index 10
Back in client, the current index in the batch was at 23.
It receives 10 from HRS so it backs the index to the row that failed (index = 1010).
Client refreshes cache for that row.
Process resumes at that index, i.e., rows 1010 to 1022 will be retried using a fresh location.
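
The steps quoted above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in the patch: the RegionServer and Locator interfaces, the commitRange method, and the convention of returning -1 on full success are all assumptions made for the example. The stand-in server returns the index of the first row it could not apply, and the client backs its own index up by that amount, refreshes the location, and retries the remainder.

```java
import java.util.List;

public class BatchRetrySketch {

    /** Hypothetical stand-in for a region server call. Returns the index, within the
     *  rows it was given, of the first row it could not apply, or -1 if all were applied. */
    public interface RegionServer {
        int batchUpdate(List<String> rows);
    }

    /** Hypothetical location lookup. With refresh == true a real client would
     *  drop its cached region location for the row and look it up again. */
    public interface Locator {
        RegionServer locate(String row, boolean refresh);
    }

    /** Commits batch[start, end) and returns how many retries were needed. */
    public static int commitRange(List<String> batch, int start, int end,
                                  Locator locator, int maxRetries) {
        int index = start;        // position in the whole batch
        boolean refresh = false;  // refresh the cached location before the next attempt?
        int retries = 0;
        while (index < end) {
            RegionServer server = locator.locate(batch.get(index), refresh);
            int failed = server.batchUpdate(batch.subList(index, end));
            if (failed < 0) {
                index = end;      // everything in this range was applied
            } else {
                if (++retries > maxRetries) {
                    throw new IllegalStateException("giving up at batch index " + (index + failed));
                }
                index += failed;  // back up to the row that failed, e.g. 1000 + 10 = 1010
                refresh = true;   // retry the rest against a fresh location
            }
        }
        return retries;
    }
}
```

Rows before the failed index are not resent, so in the example only rows 1010 to 1022 go out a second time.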

Ok, now I get it. I missed that part. Sorry for being dense.

This actually works really well but it's not atomic if a row fails, for example, if a value
was too long.

Well, aside from the transactional region server, I would not expect it to be atomic across
Were you thinking that there may be multiple BatchUpdates for the same row? Not the best way
for a client to behave in my opinion.

A couple of comments though.
- HTable.flushCommits() seems to ignore the row lock that can be passed to HTable.commit(BatchUpdate,
- Should the RowLock be associated with the BatchUpdate rather than being supplied on commit?
That would allow us to remove one commit overload, and allow the client to associate the row
lock with multiple BatchUpdates for the same row.
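
To make the second suggestion concrete, here is a rough sketch of what carrying the lock on the update itself could look like. The LockHandle and Update classes and the commit method are hypothetical names invented for this illustration; they are not the HBase RowLock, BatchUpdate or HTable classes, and the patch may do something different.

```java
public class RowLockSketch {

    /** Hypothetical lock handle for one row (not the real HBase RowLock). */
    public static final class LockHandle {
        public final String row;
        public final long id;

        public LockHandle(String row, long id) {
            this.row = row;
            this.id = id;
        }
    }

    /** Hypothetical update that carries its own optional row lock. */
    public static final class Update {
        public final String row;
        public final LockHandle lock; // null means "no explicit lock"

        public Update(String row, LockHandle lock) {
            if (lock != null && !lock.row.equals(row)) {
                throw new IllegalArgumentException("lock is for row " + lock.row + ", not " + row);
            }
            this.row = row;
            this.lock = lock;
        }
    }

    /** Single commit entry point; no separate (update, lock) overload is needed.
     *  Returns the lock id that would be sent with the update, or -1 if none. */
    public static long commit(Update update) {
        return update.lock == null ? -1L : update.lock.id;
    }
}
```

With this shape, one lock handle can be attached to several updates for the same row, and a buffered flush has the lock available without an extra parameter.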

+1 on moving checks into commit (or flushCommits). We still fail early, although not as early
as we would if the checks were done in BatchUpdate. But as Stack points out, having BatchUpdate
require a HTable or HTD would be ugly. At least the request won't be partially processed before

Last comment on patch. Remove code that is commented out in HTable.commit(BatchUpdate, RowLock)

> Add an efficient way to batch update many rows
> ----------------------------------------------
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>         Attachments: hbase-748-v1.patch
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is
to have an enhanced version that will send many rows in a single RPC to each region server.
To do this, the client code will have to figure out which rows go to which server, group them
accordingly and then send them.
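
The grouping step in the description could be sketched like this. It is an illustration under assumptions, not the attached patch: groupByServer and the locate function are hypothetical, and servers are represented by plain strings instead of real region locations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class GroupByServerSketch {

    /** Groups row keys by the server that hosts them, preserving the order in which
     *  rows were queued. 'locate' is a hypothetical row-to-server lookup; a real
     *  client would consult its cached region locations. */
    public static Map<String, List<String>> groupByServer(List<String> rows,
                                                          Function<String, String> locate) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : rows) {
            groups.computeIfAbsent(locate.apply(row), server -> new ArrayList<>()).add(row);
        }
        return groups; // one entry per server, so one RPC per server
    }
}
```

Each map entry then corresponds to a single RPC carrying all the rows for that server.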

