hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-48) [hbase] Bulk load tools
Date Wed, 23 Sep 2009 17:14:15 GMT

    [ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758784#action_12758784 ]

Jonathan Gray commented on HBASE-48:
------------------------------------

v6 patch works as advertised.  

Just imported 200k rows averaging 500 columns each, for 100M total KVs (24 regions).  The MR
job ran in under 2 minutes on a 4-node cluster of 2-core/2GB/250GB nodes.  The Ruby script takes
3-4 seconds, and then it's about 30 seconds for the cluster to assign out the regions.

I'm ready to commit this to trunk and branch, though we need some docs.  Will open a separate
JIRA for multi-family support.
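For the docs, the two-step workflow above (MR job writing region files into HDFS, then the Ruby script to bring them online) might be sketched roughly as below.  The driver class name, table name, and paths here are placeholders for illustration, not taken from the v6 patch:

```shell
# Sketch only -- the importer class, table name, and HDFS paths are hypothetical.

# 1) Run the MapReduce job that writes HFiles (region files) directly
#    into an HDFS output directory, bypassing the normal client API:
hadoop jar hbase.jar some.example.BulkImportJob \
    /user/me/input /user/me/hfile-output mytable

# 2) Run loadtable.rb to move the generated files under the table's
#    directory and update .META. so the master can assign the regions:
bin/hbase org.jruby.Main bin/loadtable.rb mytable /user/me/hfile-output
```

The second step is what accounted for the 3-4 seconds of script time plus ~30 seconds of region assignment noted above.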

> [hbase] Bulk load tools
> -----------------------
>
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>         Attachments: 48-v2.patch, 48-v3.patch, 48-v4.patch, 48-v5.patch, 48.patch, HBASE-48-v6-branch.patch,
loadtable.rb
>
>
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
APIs, uploads can take a long time even when using many concurrent clients, particularly if
the dataset is large and cell content is small.
> The PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk load tools somehow write regions directly
in HDFS.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

