hbase-dev mailing list archives

From "Jean-Daniel Cryans (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-748) Add an efficient way to batch update many rows
Date Fri, 19 Sep 2008 15:30:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632727#action_12632727 ]

Jean-Daniel Cryans commented on HBASE-748:
------------------------------------------

Here is how I plan to implement the "many rows to many regions" logic.

In HRS (HRegionServer), add a new version of batchUpdate that takes an array of RowUpdate
(HBASE-880). This version will simply iterate over the array and call the current batchUpdate
for each element. A bit of logic will be added so that if a WRE (WrongRegionException) gets
thrown, we return the index of the last row that was inserted.
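
As an illustration only, the server-side loop described above could be sketched in Java as
follows. Every name here is hypothetical (SketchWrongRegionException, SingleRowUpdater,
MultiRowBatchSketch); rows are plain strings rather than RowUpdate objects, and returning
-1 when nothing was inserted is an assumption, not something the proposal specifies.

```java
/** Hypothetical stand-in for the exception a region throws for a row outside its key range. */
class SketchWrongRegionException extends RuntimeException {
    SketchWrongRegionException(String row) {
        super("row not served by this region: " + row);
    }
}

class MultiRowBatchSketch {
    /** Hypothetical hook standing in for the existing single-row batchUpdate. */
    interface SingleRowUpdater {
        void update(String row);
    }

    /**
     * Applies the updates in order and stops at the first row the region rejects.
     * Returns the index of the last row that was inserted, or -1 if none was
     * (the -1 convention is an assumption made for this sketch).
     */
    static int batchUpdate(String[] rows, SingleRowUpdater updater) {
        int lastInserted = -1;
        for (int i = 0; i < rows.length; i++) {
            try {
                updater.update(rows[i]);
            } catch (SketchWrongRegionException e) {
                // Tell the caller how far we got so it can resume from the next row.
                return lastInserted;
            }
            lastInserted = i;
        }
        return lastInserted;
    }
}
```

With this shape, a return value of rows.length - 1 means the whole array was applied, and
anything smaller tells the client where to resume.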

In HTable, when a flush is triggered, it calls a method that takes an ArrayList of unsorted
RowOperation (HBASE-880). The following pseudocode does the rest:

{code}
sort the row operations (called ops)
create a temporary empty list of ops
retrieve the cached region of the first op and mark it as "current"
for i = 0; i < number of ops; i++
  current op is at index i of the array of ops
  add the op to the temporary list
  retrieve the cached region of the following op (if any)
  if current region not equals retrieved region or current op is the last one
    do the operation on region server of current region
    if a WRE is thrown
      retrieve the real region of the op at the index given in the WRE (becomes the retrieved region)
      reset i to the index of the returned row - 1 given in the WRE
    the retrieved region is now the current region
    clear the temporary list
{code}
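
One possible Java rendering of the pseudocode above is sketched below, purely as an
illustration. The names BatchFlushSketch, RegionLocator and RegionSender are hypothetical,
ops are reduced to their row keys, and a wrong-region failure is modelled as send()
returning how many rows of the batch were applied (fewer than the batch size means the
rest were rejected). It also assumes relocate() refreshes the cache and returns the region
that really serves the row; otherwise the loop would not make progress.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BatchFlushSketch {
    /** Hypothetical view of the client-side region cache and of a fresh .META. lookup. */
    interface RegionLocator {
        String cached(String row);    // possibly stale answer from the cache
        String relocate(String row);  // authoritative lookup, assumed to refresh the cache
    }

    /** Hypothetical RPC: returns the number of rows applied before a wrong-region failure. */
    interface RegionSender {
        int send(String region, List<String> rows);
    }

    static void flush(List<String> ops, RegionLocator locator, RegionSender sender) {
        if (ops.isEmpty()) {
            return;
        }
        List<String> sorted = new ArrayList<>(ops);
        Collections.sort(sorted);
        List<String> pending = new ArrayList<>();
        String current = locator.cached(sorted.get(0));
        for (int i = 0; i < sorted.size(); i++) {
            pending.add(sorted.get(i));
            boolean last = (i == sorted.size() - 1);
            String next = last ? null : locator.cached(sorted.get(i + 1));
            if (last || !current.equals(next)) {
                int applied = sender.send(current, pending);
                if (applied < pending.size()) {
                    // Stale cache: find the first rejected row in the sorted list,
                    // look up its real region, and resume the loop from that row.
                    int failed = (i - pending.size() + 1) + applied;
                    next = locator.relocate(sorted.get(failed));
                    i = failed - 1;
                }
                current = next;
                pending.clear();
            }
        }
    }
}
```

In the good case (fresh cache) this issues one send() per region; with a stale entry it
pays one extra send() plus one relocate() for each region boundary the cache got wrong.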

The big trade-off in this algorithm is that I try to limit the number of queries to .META.
by using the cache, at the expense of potentially moving big chunks of rows back and forth
if the cache is stale. This impact could be diminished if we fetched more .META. rows at each
locateRegionInMeta using HBASE-887 instead of using getClosestRowBefore (just a thought).
That's what Bigtable does.
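
To make the prefetching idea concrete, here is a small Java sketch under heavy assumptions:
.META. is modelled as an in-memory sorted map from region start key to region name, the
class and field names (PrefetchingRegionCacheSketch, metaLookups) are invented for the
example, and nothing here reflects what HBASE-887 actually provides. It only shows how a
single lookup could warm the cache for the next few regions as well.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class PrefetchingRegionCacheSketch {
    // Stand-in for .META.: region start key -> region name. Assumed to contain
    // an entry for the empty start key so that every row has a region.
    private final NavigableMap<String, String> meta;
    private final NavigableMap<String, String> cache = new TreeMap<>();
    private final int prefetch;
    int metaLookups = 0; // counts simulated round trips to .META.

    PrefetchingRegionCacheSketch(NavigableMap<String, String> meta, int prefetch) {
        this.meta = meta;
        this.prefetch = Math.max(1, prefetch);
    }

    /** One simulated .META. round trip that also caches the following regions. */
    String locate(String row) {
        metaLookups++;
        String start = meta.floorKey(row);
        int copied = 0;
        for (Map.Entry<String, String> e : meta.tailMap(start, true).entrySet()) {
            if (copied++ >= prefetch) {
                break;
            }
            cache.put(e.getKey(), e.getValue());
        }
        return cache.get(start);
    }

    /** Cache-only answer; may be null or stale if the relevant entry was never fetched. */
    String cached(String row) {
        Map.Entry<String, String> e = cache.floorEntry(row);
        return e == null ? null : e.getValue();
    }
}
```

Because the batch is sorted, the regions needed next are usually the ones that follow the
region just looked up, which is why prefetching forward from the located row is the natural
direction in this sketch.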

Any comments?

> Add an efficient way to batch update many rows
> ----------------------------------------------
>
>                 Key: HBASE-748
>                 URL: https://issues.apache.org/jira/browse/HBASE-748
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: client
>    Affects Versions: 0.1.3, 0.2.0
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.19.0
>
>
> HBASE-747 introduced a simple way to batch update many rows. The goal of this issue is
> to have an enhanced version that will send many rows in a single RPC to each region server.
> To do this, the client code will have to figure out which rows go to which server, group
> them accordingly, and then send them.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

