hbase-issues mailing list archives

From "Michael Webster (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10003) OnlineMerge should be extended to allow bulk merging
Date Thu, 09 Jan 2014 05:19:50 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866327#comment-13866327 ]

Michael Webster commented on HBASE-10003:
-----------------------------------------

I have a thought on how to do this, although I am unfamiliar with the merge internals.  It
seems like you could start by merging the first and second regions, then merge in region 3
if the combined size stays below the value of the '-max' parameter.  Once adding the next region
would push the merged region over '-max', start the process again from that region.
Basically the rule would look like this:
merge(R1, R2) into R1'
  if R1'.size + R3.size < MAX
     merge(R1', R3)
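
To make the rule concrete, here is a rough Python sketch of the grouping step only. It assumes regions arrive as (name, size) pairs already sorted in key order; `plan_bulk_merge` and that representation are made up for illustration and are not part of HBase. Unlike the rule above, it also applies the size check to the very first pair rather than merging R1 and R2 unconditionally.

```python
from typing import List, Tuple

# Hypothetical representation: (region name, size), listed in key order.
Region = Tuple[str, int]


def plan_bulk_merge(regions: List[Region], max_size: int) -> List[List[str]]:
    """Greedily group adjacent regions so each group's total size stays below max_size."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in regions:
        # Adding this region would reach or exceed the limit: close the group
        # and start the process again from this region.
        if current and current_size + size >= max_size:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    sample = [("R1", 4), ("R2", 3), ("R3", 2), ("R4", 6), ("R5", 1)]
    print(plan_bulk_merge(sample, max_size=10))
```

Each inner list is a run of contiguous regions that would be merged into one. A region that is already larger than the limit ends up in a group by itself and would simply be left alone.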

I guess this could also be done asynchronously.  Instead of merging the regions immediately,
add them to a "todo" list; once you hit the size limit, send the todo list off to an executor
to do the recursive merge.  Being unfamiliar with the merge internals, though, I don't know whether
asynchronous merges can or should be done.  I can imagine that causing some issues with holes
in the region chain in .META.
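
A rough Python sketch of the todo-list variant, purely to illustrate the control flow. `merge_group` is a hypothetical stand-in that only combines names and sizes; it does not call any HBase API, and the sketch says nothing about whether concurrent merges are actually safe with respect to .META.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical representation: (region name, size), listed in key order.
Region = Tuple[str, int]


def merge_group(group: List[Region]) -> Region:
    """Stand-in for the real merge: fold one batch pairwise, left to right."""
    name, size = group[0]
    for next_name, next_size in group[1:]:
        name, size = f"{name}+{next_name}", size + next_size
    return name, size


def bulk_merge_async(regions: List[Region], max_size: int, workers: int = 2) -> List[Region]:
    """Collect adjacent regions in a todo list and hand each full list to an executor."""
    futures = []
    todo: List[Region] = []
    todo_size = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, size in regions:
            # Size limit hit: send the todo list off and start a new one.
            if todo and todo_size + size >= max_size:
                futures.append(pool.submit(merge_group, todo))
                todo, todo_size = [], 0
            todo.append((name, size))
            todo_size += size
        if todo:
            futures.append(pool.submit(merge_group, todo))
        # Results are collected in submission order, i.e. key order.
        return [f.result() for f in futures]


if __name__ == "__main__":
    sample = [("R1", 4), ("R2", 3), ("R3", 2), ("R4", 6), ("R5", 1)]
    print(bulk_merge_async(sample, max_size=10))
```

Each batch covers a disjoint run of regions, so batches do not touch each other's regions, but within a batch the merges still have to run one after another.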

Any feedback is welcome.

> OnlineMerge should be extended to allow bulk merging
> ----------------------------------------------------
>
>                 Key: HBASE-10003
>                 URL: https://issues.apache.org/jira/browse/HBASE-10003
>             Project: HBase
>          Issue Type: Improvement
>          Components: Admin, Usability
>    Affects Versions: 0.98.0, 0.94.6
>            Reporter: Clint Heath
>            Priority: Critical
>              Labels: noob
>
> Now that we have Online Merge capabilities, the function of that tool should be extended
> to make it much easier for HBase operations folks to use.  Currently it is a very manual process
> (one fraught with confusion) to hand pick two regions that are contiguous to each other in
> the META table such that the admin can manually request those two regions to be merged.
> In the real world, when admins find themselves wanting to merge regions, it's usually
> because they've greatly increased their hbase.hregion.max.filesize property and they have
> way too many regions on a table and want to reduce the region count for that entire table
> quickly and easily.
> Why can't the OnlineMerge command just take a "-max" argument along with a table name
> which tells it to go ahead and merge all regions of said table until the resulting regions
> are all of max size?  This takes the voodoo out of the process and quickly gets the admin
> what they're looking for.
> As part of this improvement, I also suggest a "-regioncount" argument for OnlineMerge,
> which will attempt to reduce the table's region count down to the specified #.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
