hbase-issues mailing list archives

From "Lars Francke (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
Date Fri, 06 Aug 2010 01:00:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895902#action_12895902 ]

Lars Francke commented on HBASE-1861:

I have taken a stab at it. This is what I did:

* Currently, once it is decided that an HFile has become too large, it is closed and a new one
is opened. This doesn't work anymore because KeyValues for the current row may still be coming
for other column families. So now I just set a flag that an HFile rotation is needed. The flag
is tested on every write, and when it is set and the row key changes I close all currently
open HFiles.
** This gets slightly more complicated because we only _close_ the HFiles here and don't open
new ones, since they may not be needed. So every write still has to check whether a new HFile
needs to be opened.
* As we later need to know which files belong to the same region, I save them using the current
task attempt id and a counter to guarantee their uniqueness.
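
The rotation logic above can be modelled roughly as follows. This is a minimal sketch and not
the actual patch: the class DeferredRollSketch, its fields, the family/attempt_counter naming
scheme and the tiny size limit are all hypothetical, and the "files" are plain byte counters
rather than real HFile writers.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of deferred HFile rotation. Every name here is hypothetical;
// no HBase classes are used and "files" are just byte counters.
class DeferredRollSketch {
    private final String attemptId;   // stands in for the task attempt id
    private final long maxSize;       // size at which a rotation is requested
    private final Map<String, Long> open = new TreeMap<>(); // family -> bytes in open file
    final List<String> closed = new ArrayList<>();          // names of closed files
    private boolean rollRequested = false;
    private byte[] previousRow = null;
    private int counter = 0;          // shared by all files closed together

    DeferredRollSketch(String attemptId, long maxSize) {
        this.attemptId = attemptId;
        this.maxSize = maxSize;
    }

    void write(byte[] row, String family, int length) {
        boolean rowChanged = previousRow != null && !Arrays.equals(previousRow, row);
        // Rotate only at a row boundary, so a row never spans two file groups.
        if (rollRequested && rowChanged) {
            closeAll();
        }
        // Lazy open: a family gets a file only when data for it actually arrives.
        long size = open.getOrDefault(family, 0L) + length;
        open.put(family, size);
        if (size >= maxSize) {
            rollRequested = true; // deferred: the current row may not be finished
        }
        previousRow = row;
    }

    void closeAll() {
        for (String family : open.keySet()) {
            closed.add(family + "/" + attemptId + "_" + counter);
        }
        if (!open.isEmpty()) {
            counter++;
        }
        open.clear();
        rollRequested = false;
    }

    public static void main(String[] args) {
        DeferredRollSketch w = new DeferredRollSketch("attempt_0", 10);
        byte[] row1 = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] row2 = "row2".getBytes(StandardCharsets.UTF_8);
        w.write(row1, "a", 12); // exceeds the limit, rotation is only flagged
        w.write(row1, "b", 3);  // same row, so nothing is closed yet
        w.write(row2, "a", 1);  // row changed: both open files are closed first
        w.closeAll();
        System.out.println(w.closed);
    }
}
```

Files closed together share one counter value, which is what would later let a loader group them
into a region. In this toy run the first group holds families a and b, the second only a.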

The current tests all still pass with my changes, which is a good sign.

The second part is the loading of those files, which seems to be more complicated and on which
I could use some comments. HBASE-1923 recently made this more complicated and I'm not sure I
fully understand it. Basically these are the changes required:

* To create a new region we now have to look for the start and end keys in all column families
* We have to load the HFiles of all column families for a single region; the set of families
might differ between regions
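
As an illustration of the first point, here is a rough sketch of deriving one key range from
several families. It assumes each family's HFile exposes a first and a last row key, and it takes
the lowest first key and the highest last key. RegionKeyRangeSketch and keyRange are hypothetical
names; keys are compared as unsigned bytes, the order HBase uses for row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives one key range from the per-family key ranges.
class RegionKeyRangeSketch {
    // Unsigned lexicographic comparison of two row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // families maps a family name to {firstKey, lastKey} of its HFile.
    // Returns {lowest first key, highest last key} over all families.
    static byte[][] keyRange(Map<String, byte[][]> families) {
        byte[] start = null;
        byte[] end = null;
        for (byte[][] range : families.values()) {
            if (start == null || compare(range[0], start) < 0) {
                start = range[0];
            }
            if (end == null || compare(range[1], end) > 0) {
                end = range[1];
            }
        }
        return new byte[][] {start, end};
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, byte[][]> families = new LinkedHashMap<>();
        families.put("a", new byte[][] {key("row03"), key("row20")});
        families.put("b", new byte[][] {key("row01"), key("row17")});
        byte[][] r = keyRange(families);
        System.out.println(new String(r[0], StandardCharsets.UTF_8) + " .. "
                + new String(r[1], StandardCharsets.UTF_8));
    }
}
```

Neither family alone covers the full range here: the start comes from family b and the end from
family a.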

To make both steps easier I could have HFileOutputFormat write an additional metadata file
which contains the start and end keys as well as all the column families that have HFiles
for this region. This data is available at creation time.
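
To make the idea concrete, such a metadata record could look roughly like the sketch below. The
layout (length-prefixed keys followed by a list of family names) and the class RegionMetaSketch
are assumptions made for illustration only; nothing like this exists in HFileOutputFormat today.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-region metadata record; the byte layout is an assumption.
class RegionMetaSketch {
    final byte[] startKey;
    final byte[] endKey;
    final List<String> families; // families that have an HFile for this region

    RegionMetaSketch(byte[] startKey, byte[] endKey, List<String> families) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.families = families;
    }

    // Serializes as: len+startKey, len+endKey, family count, family names.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(startKey.length);
            out.write(startKey);
            out.writeInt(endKey.length);
            out.write(endKey);
            out.writeInt(families.size());
            for (String family : families) {
                out.writeUTF(family);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads back a record produced by toBytes().
    static RegionMetaSketch fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] start = new byte[in.readInt()];
            in.readFully(start);
            byte[] end = new byte[in.readInt()];
            in.readFully(end);
            int count = in.readInt();
            List<String> families = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                families.add(in.readUTF());
            }
            return new RegionMetaSketch(start, end, families);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        RegionMetaSketch meta = new RegionMetaSketch(
                "row01".getBytes(StandardCharsets.UTF_8),
                "row20".getBytes(StandardCharsets.UTF_8),
                Arrays.asList("a", "b"));
        RegionMetaSketch copy = fromBytes(meta.toBytes());
        System.out.println(new String(copy.startKey, StandardCharsets.UTF_8)
                + " .. " + new String(copy.endKey, StandardCharsets.UTF_8)
                + " " + copy.families);
    }
}
```

With one such record per region, the loader could read the key range and the list of families
directly instead of opening every HFile to find them.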

So any input on how this would affect, or be affected by, the incremental loading work would be
appreciated.

> Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
> -----------------------------------------------------------------------------
>                 Key: HBASE-1861
>                 URL: https://issues.apache.org/jira/browse/HBASE-1861
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.90.0
> Add multi-family support to bulk upload tools from HBASE-48.
