hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-748) Add an efficient way to batch update many rows
Date Thu, 25 Sep 2008 00:03:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634348#action_12634348

stack commented on HBASE-748:

On the patch:

I see you've made BatchUpdate comparable already.  Scratch my suggestion above.  I like the
fact that you can ask a BatchUpdate its size.  Should size include row and column sizes too
to cover the case where they are really large?

bq. The region gets split after 10 rows... (From 'Jean-Daniel Cryans - 24/Sep/08 03:59 PM
- edited')

I'd been thinking that doing batch updates, we'd take out the splitsAndClosesLock so splits
wouldn't happen.  Would this be easier if you moved the array of batch update handlings down
into HRegion rather than run it in HRS?

Minor nitpick: Put declaration and this line -- '+    this.writeBuffer = new ArrayList<BatchUpdate>();'
-- together and then add final keywork (not important... but if you are revising anyways).
 Similarily for this line -- +    this.currentWriteBufferSize = 0;... why not declare and
do initial assign in the one line rather than have it split with assignment down in constructor?

More nitpicks:

+      if (bu.getRow() == null)
+        throw new IllegalArgumentException("update has null row");

In hadoop coding, folks supply the parens as in:

+      if (bu.getRow() == null) {
+        throw new IllegalArgumentException("update has null row");
+     }

Just FYI.

Here's a style comment:  Rather than do this in fflushCommits:

+    if (!writeBuffer.isEmpty()) {

... instead do

+    if (writeBuffer.isEmpty()) {
+      return;
+   }

Then you don't have whole method body indented.  Gives you more room on a line.  Deals with
the isEmpty immediately rather than let it last the length of the method body.

bq. 1- The way retries are handled client-side is ugly, cannot be reused for other operations
(so HBASE-880 wouldn't fit in).

I don't follow the above J-D?  Are you saying that we can't do batching with 880?

bq. 3- WRE and NSRE maybe should share a common super class.

Thats fine by me.

bq. 4- If any exception is thrown server-side, like values are too long or columns in the
wrong format, the state of the whole transaction will be unknown to the user.

Could we iterate the BatchUpdate array first before committing to check its wholesomeness
so if problem, user knows that nothing was committed?

bq. ...but it's not atomic if a row fails, for example, if a value was too long.

What you mean by the above?  That if we have none-WRE or none-NSRE-like exception, then batch
is dropped?

Good stuff J-D.

> Add an efficient way to batch update many rows
> ----------------------------------------------
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>         Attachments: hbase-748-v1.patch
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is
to have an enhanced version that will send many rows in a single RPC to each region server.
To do this, the client code will have to figure which rows goes to which server, group them
accordingly and then send them.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message