hbase-issues mailing list archives

From "Lars Francke (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1861) Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
Date Fri, 06 Aug 2010 16:50:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12896082#action_12896082 ]

Lars Francke commented on HBASE-1861:
-------------------------------------

Okay, after talking with Lars George and looking over Todd's incremental load work, I've got
even more questions.
It seems (and we should really document this somewhere) that there are now two distinct
ways to bulk load data into HBase:

loadtable.rb creates the regions manually and only writes the metadata for the metascanner
to pick up. Once the HFiles have been generated, this does not seem to be very
resource-intensive.
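To make the first approach concrete, here is a rough Java sketch of how region boundaries could be derived from pre-built HFiles before any metadata is written. It assumes the job produced sorted, non-overlapping HFiles and that each file becomes exactly one region. It is a toy model under those assumptions, not loadtable.rb's actual code, and the class and method names (`RegionPlanSketch`, `planRegions`) are hypothetical. Plain `String` keys stand in for HBase's `byte[]` row keys, and an empty string stands for the table's open start or end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Toy model (hypothetical, not HBase or loadtable.rb code): derive contiguous
 * [startKey, endKey) region boundaries from the first row key of each sorted,
 * non-overlapping HFile. "" means "start of table" or "end of table".
 */
public class RegionPlanSketch {

    /** Returns one {startKey, endKey} pair per HFile, in key order. */
    public static List<String[]> planRegions(List<String> hfileFirstKeys) {
        List<String> keys = new ArrayList<>(hfileFirstKeys);
        Collections.sort(keys);
        List<String[]> regions = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            // The first region must cover everything from the start of the table.
            String start = (i == 0) ? "" : keys.get(i);
            // Each region ends where the next file begins; the last one is open-ended.
            String end = (i + 1 < keys.size()) ? keys.get(i + 1) : "";
            regions.add(new String[] {start, end});
        }
        return regions;
    }

    public static void main(String[] args) {
        for (String[] r : planRegions(List.of("row-500", "row-000", "row-250"))) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Under these assumptions the regions match the files up front, so nothing has to be split afterwards; that is the intuition behind calling this route cheap for an empty table.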

And then there's the new completebulkload tool, which shifts some of the load to HBase itself
by (and please correct me if I've misunderstood this) possibly splitting many of the existing
regions and essentially relying on HBase to put the HFiles in the appropriate places. This is
a great solution for incremental loads, since the regions already exist. But is it a good
solution, performance- and load-wise, for an empty table? My knowledge of HBase in this regard
is still limited, but I would have thought that the constant splitting would be pretty bad,
especially when starting with an empty table that has no regions.
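The concern about empty tables can be illustrated with a second toy model in Java. It only counts which existing region would own each HFile (the region with the greatest start key not after the file's first row) and how many files run past the end of that region; it does not reproduce what completebulkload actually does with such files. The names `IncrementalLoadSketch`, `filesPerRegion` and `filesSpanningRegions` are hypothetical, and `String` keys again stand in for `byte[]` row keys.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Toy model (hypothetical, not HBase code) of placing pre-built HFiles into a
 * table's existing regions. Regions are identified by their start keys, ""
 * being the start of the table; each HFile is a {firstRow, lastRow} pair.
 */
public class IncrementalLoadSketch {

    private static TreeSet<String> startKeys(List<String> regionStartKeys) {
        TreeSet<String> starts = new TreeSet<>(regionStartKeys);
        starts.add(""); // the first region always starts at the beginning of the table
        return starts;
    }

    /** Counts HFiles per region, keyed by the start key of the region owning each file's first row. */
    public static Map<String, Integer> filesPerRegion(List<String> regionStartKeys,
                                                      List<String[]> hfileRanges) {
        TreeSet<String> starts = startKeys(regionStartKeys);
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] range : hfileRanges) {
            // Owning region: greatest start key <= the file's first row (never null, "" is a start).
            counts.merge(starts.floor(range[0]), 1, Integer::sum);
        }
        return counts;
    }

    /** Counts HFiles whose last row lies at or beyond the start key of the next region. */
    public static int filesSpanningRegions(List<String> regionStartKeys,
                                           List<String[]> hfileRanges) {
        TreeSet<String> starts = startKeys(regionStartKeys);
        int spanning = 0;
        for (String[] range : hfileRanges) {
            String nextStart = starts.higher(starts.floor(range[0]));
            if (nextStart != null && range[1].compareTo(nextStart) >= 0) {
                spanning++;
            }
        }
        return spanning;
    }

    public static void main(String[] args) {
        List<String[]> files = List.of(
                new String[] {"a", "f"}, new String[] {"g", "p"}, new String[] {"q", "z"});
        // Freshly created table: a single region covering all keys.
        System.out.println(filesPerRegion(List.of(""), files));
        // Table already split at "g" and "q": the files line up with the regions.
        System.out.println(filesPerRegion(List.of("", "g", "q"), files));
        // Table split at "k": the file ["g", "p"] crosses a region boundary.
        System.out.println(filesSpanningRegions(List.of("", "k"), files));
    }
}
```

With a freshly created table there is only the region starting at the empty key, so every file is assigned to it; with a table already split along the files' boundaries the files spread out and none crosses a region boundary.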

I'd love your input on how to solve this: should multi-column-family support go into
loadtable.rb only (for empty tables), into the incremental bulk load tool only, or into both?
This also raises the question of whether we should keep loadtable.rb if it is a better fit
for "cold imports".

> Multi-Family support for bulk upload tools (HFileOutputFormat / loadtable.rb)
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-1861
>                 URL: https://issues.apache.org/jira/browse/HBASE-1861
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.90.0
>
>
> Add multi-family support to bulk upload tools from HBASE-48.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

