hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (HBASE-748) Add an efficient way to batch update many rows
Date Wed, 24 Sep 2008 23:01:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634334#action_12634334 ]

jdcryans edited comment on HBASE-748 at 9/24/08 4:00 PM:
-------------------------------------------------------------------

bq. Shouldn't HRS.batchUpdate(final byte[] regionName, BatchUpdate[] b) return "i" if it falls out of the try/catch block?

That would return the size of the array, which the client can then compare against the number of rows it sent. Good idea.
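A minimal sketch of that server-side contract, assuming rows are applied in order and a NotServingRegionException stops processing. This is not the patch's code: `BatchUpdateSketch`, `RowWriter`, and the `String` rows standing in for `BatchUpdate[]` are hypothetical, and the exception class is a local stand-in for HBase's. Returning `i` after the try/catch yields the index of the first unprocessed row, which equals the array length when every row succeeded.

```java
/**
 * Hypothetical sketch, not HBase's HRegionServer code: apply rows in order and
 * return how many were processed before the region stopped serving them.
 */
class BatchUpdateSketch {

  /** Local stand-in for HBase's NotServingRegionException (assumption). */
  static class NotServingRegionException extends Exception {
    NotServingRegionException(String message) {
      super(message);
    }
  }

  /** Hypothetical per-row write step; a real server would update the region. */
  interface RowWriter {
    void apply(String row) throws NotServingRegionException;
  }

  /**
   * Returns the index of the first row that was not applied, which equals
   * rows.length when the loop completes without an exception.
   */
  static int batchUpdate(String[] rows, RowWriter writer) {
    int i = 0;
    try {
      for (; i < rows.length; i++) {
        writer.apply(rows[i]);
      }
    } catch (NotServingRegionException e) {
      // The region split or moved: rows[i] and every later row were not applied.
    }
    // Falling out of the try/catch: i tells the client where to resume.
    return i;
  }
}
```

The client compares the returned value with the number of rows it sent; any shortfall marks where its retry should begin.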

{quote}
I do not understand how these changes implement retries since getRegionServerForManyRows does
not implement them nor does it call getRegionServerWithRetries which does.
{quote}

As I said in my Sept. 23 comment, this part is ugly and needs more work. It implements retries
in the sense that it retries the rows that did not get processed. For example:

HTable commits 23 rows to an HRS for one region. Let's say that the first of those 23
is the 1000th row of the whole batch.
The region gets split after 10 rows.
At row 11, the HRS gets an NSRE (NotServingRegionException).
The HRS returns index 10.
Back in the client, the current index had advanced 23 rows past the start of this group.
It receives 10 from the HRS, so it backs the index up to the row that failed (index = 1010).
The client refreshes its cached location for that row.
Processing resumes at that index, e.g. rows 1010 to 1022 are retried using a fresh location.

This actually works really well, but it is not atomic if a row fails, for example because a
value was too long.
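The retry walk-through above can be sketched as a client-side loop, assuming the server returns the number of rows it processed and each chunk of consecutive rows maps to a single region. `ClientRetrySketch`, `RegionServer`, `Locator`, the fixed `chunkSize`, and `maxRetries` are hypothetical simplifications, not HTable's API. The client described here groups rows by region rather than into fixed-size chunks.

```java
import java.util.List;

/**
 * Hypothetical client-side sketch of the "back the index up and resume" retry
 * described above. Not HTable's actual implementation.
 */
class ClientRetrySketch {

  /** Hypothetical RPC stand-in: returns how many of the given rows were applied. */
  interface RegionServer {
    int batchUpdate(List<String> rows);
  }

  /** Hypothetical location lookup; refresh=true means bypass the cached location. */
  interface Locator {
    RegionServer locate(String row, boolean refresh);
  }

  /** Commits the whole batch and returns how many partial failures were retried. */
  static int commit(List<String> batch, int chunkSize, Locator locator, int maxRetries) {
    int index = 0;
    int retries = 0;
    boolean refresh = false;
    while (index < batch.size()) {
      int end = Math.min(index + chunkSize, batch.size());
      RegionServer server = locator.locate(batch.get(index), refresh);
      int processed = server.batchUpdate(batch.subList(index, end));
      if (processed < 0 || processed > end - index) {
        throw new IllegalStateException("server reported " + processed + " rows");
      }
      // index + processed is the first row the server did not apply.
      index += processed;
      if (index == end) {
        refresh = false; // whole chunk applied, keep using cached locations
      } else {
        // Partial result (e.g. an NSRE after a split): refresh the failed row's
        // location and resume from that row on the next pass.
        refresh = true;
        if (++retries > maxRetries) {
          throw new IllegalStateException("giving up at row " + batch.get(index));
        }
      }
    }
    return retries;
  }
}
```

Advancing by the server's count is equivalent to the "back the index up" step: the rows the server did apply are never resent, and the retry starts exactly at the failed row.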

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>         Attachments: hbase-748-v1.patch
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is
> to have an enhanced version that will send many rows in a single RPC to each region server.
> To do this, the client code will have to figure out which rows go to which server, group them
> accordingly and then send them.
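For the grouping step in the issue description, here is a minimal sketch assuming the client already knows each region's start key and hosting server. `GroupByServerSketch`, `locatorFrom`, and the start-key-to-server `TreeMap` are hypothetical illustrations, not HBase's region cache, and `String` keys stand in for `byte[]` row keys. `floorEntry` picks the region whose start key is the greatest one not after the row, and `LinkedHashMap` keeps each server's rows in their original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

class GroupByServerSketch {

  /**
   * Hypothetical locator built from region start keys; assumes the map contains
   * an entry for the empty start key so every row falls in some region.
   */
  static Function<String, String> locatorFrom(TreeMap<String, String> startKeyToServer) {
    return row -> startKeyToServer.floorEntry(row).getValue();
  }

  /** Groups rows by the server that hosts them, preserving row order per server. */
  static Map<String, List<String>> groupByServer(
      List<String> rows, Function<String, String> serverForRow) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (String row : rows) {
      groups.computeIfAbsent(serverForRow.apply(row), server -> new ArrayList<>()).add(row);
    }
    return groups;
  }
}
```

Each resulting list could then go to its server in one call. Since the HRS.batchUpdate signature discussed above takes a region name, grouping by region instead of by server would follow the same pattern with a different key.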

-- 
This message is automatically generated by JIRA.

