hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-48) [hbase] Bulk load tools
Date Sun, 02 Aug 2009 02:50:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-48:

    Summary: [hbase] Bulk load tools  (was: [hbase] Bulk load and dump tools)

I deleted my last comment.  It duplicated things said earlier in this issue a good while ago.

I changed the title of this issue so it is only about bulk upload.  The bulk dump work is going on elsewhere:
e.g. HBASE-1684

On the earlier comments about splitting a table so there is at least a region per regionserver, that ain't
hard to do now.  You can do it via the UI -- force a split -- or just write a little script that
adds a table with an initial set of region ranges (for example, see the script in the attached patch).
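For illustration only -- this is not the attached loadtable.rb, and the table name, column family,
and split points below are made up -- here is roughly what such a script looks like against a later
HBase Java client API: create the table with explicit split keys so it starts life with one region
per key range.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePresplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Hypothetical table and column family names.
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("bulktable"));
      desc.addFamily(new HColumnDescriptor("cf"));
      // Hypothetical split points; pick boundaries that match your key distribution
      // so every regionserver has a region to serve from the start.
      byte[][] splits = new byte[][] {
          Bytes.toBytes("row-250000"),
          Bytes.toBytes("row-500000"),
          Bytes.toBytes("row-750000")
      };
      admin.createTable(desc, splits);
    }
  }
}

(Later releases also grew the same ability in the shell at create time, via SPLITS => [...], so a
script is mostly useful when you want to compute the split points programmatically.)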

I think the criterion for closing this issue is the commit of a set of tools that allow writing
hfiles either into new tables or into extant tables.
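For reference, later HBase releases ended up shipping tooling of exactly this shape.  The sketch
below targets the HBase 1.x client API, not the patch attached here, and the table name, column
family, and tab-separated input format are made up for illustration: a MapReduce job writes
region-aligned hfiles via HFileOutputFormat2, and LoadIncrementalHFiles then moves them into the
(possibly already populated) table.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

  // Hypothetical input format: one "rowkey<TAB>value" line per cell.
  public static class TsvMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("cf"),
          Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName name = TableName.valueOf("bulktable");  // hypothetical table
    Path input = new Path(args[0]);                    // source TSV data
    Path hfileDir = new Path(args[1]);                 // staging dir for hfiles

    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(name);
         RegionLocator locator = conn.getRegionLocator(name);
         Admin admin = conn.getAdmin()) {

      Job job = Job.getInstance(conf, "bulk load " + name);
      job.setJarByClass(BulkLoadSketch.class);
      job.setInputFormatClass(TextInputFormat.class);
      job.setMapperClass(TsvMapper.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(KeyValue.class);
      FileInputFormat.addInputPath(job, input);
      FileOutputFormat.setOutputPath(job, hfileDir);
      // Sets the reducer, total-order partitioner and output format so the hfiles
      // come out sorted and aligned with the table's region boundaries.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      if (!job.waitForCompletion(true)) {
        System.exit(1);
      }

      // Move the finished hfiles under the table's regions (the "extant table" case).
      new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
    }
  }
}

The new-table case then reduces to creating a pre-split table first (as sketched above) and
pointing the same job at it.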

> [hbase] Bulk load tools
> -----------------------
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>         Attachments: 48-v2.patch, 48-v3.patch, 48-v4.patch, 48.patch, loadtable.rb
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
> APIs, particularly if the dataset is large and cell content is small, uploads can take a long
> time even when using many concurrent clients.
> PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools somehow write regions directly
> in HDFS.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
