hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-48) [hbase] Bulk load and dump tools
Date Sun, 02 Aug 2009 01:04:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-48:

    Attachment: 48-v4.patch

This patch seems to basically work.  I took files made by the TestHFileInputFormat test and
passed them to the script as follows:

$  ./bin/hbase org.jruby.Main bin/loadtable.rb xyz /tmp/testWritingPEData/

The script expects hbase to be running.

It ran through the list of hfiles and read each one's meta info and last key.  It then sorted the
hfiles by end key.  It makes an HTableDescriptor and an HColumnDescriptor with defaults (if you want
something other than the defaults, alter the table after the upload).  It then takes the sorted
files and, file by file, moves each into place and adds a row to .META.  It doesn't take long.
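
To make the ordering step concrete, here is a minimal sketch in plain Ruby of sorting hfiles by their end key.  It is not code from the patch: HFileInfo, sort_by_end_key and the sample paths and keys are hypothetical stand-ins for the meta info the script reads from each hfile, and the sketch assumes keys compare as unsigned bytes, left to right.

```ruby
# Hypothetical stand-in for what is read from each hfile:
# its path and its last (end) key.
HFileInfo = Struct.new(:path, :last_key)

# Sort hfiles by end key.  Keys are compared byte by byte as unsigned
# values (an assumption of this sketch); a key that is a strict prefix
# of another sorts first.
def sort_by_end_key(hfiles)
  hfiles.sort_by { |f| f.last_key.bytes }
end

# Made-up sample data, deliberately out of order.
files = [
  HFileInfo.new("/tmp/example/hfile-c", "row-30"),
  HFileInfo.new("/tmp/example/hfile-a", "row-10"),
  HFileInfo.new("/tmp/example/hfile-b", "row-20"),
]

# Walk the files in end-key order; the real script would move each file
# into place and add a row to .META. at this point.
sort_by_end_key(files).each_with_index do |f, i|
  puts "#{i}: #{f.path} (end key #{f.last_key})"
end
```

Sorting on the byte arrays rather than on the strings keeps the comparison explicit and independent of string encoding.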

The meta scanner runs after the upload and deploys the regions.


I won't work on this any more until someone else wants to try it.

> [hbase] Bulk load and dump tools
> --------------------------------
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>         Attachments: 48-v2.patch, 48-v3.patch, 48-v4.patch, 48.patch, loadtable.rb
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
> APIs, particularly if the dataset is large and cell content is small, uploads can take a long
> time even when using many concurrent clients.
> PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools somehow write regions directly
> in HDFS.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
