hadoop-common-dev mailing list archives

From "Chad Walters (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2075) [hbase] Bulk load and dump tools
Date Sat, 08 Dec 2007 21:36:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549753

Chad Walters commented on HADOOP-2075:

I like the idea of lots of splits early on when the number of regions is less than the number
of region servers. You want to make sure the splits are made at points that are relatively
well-distributed, of course, so don't make the split threshold so small that you split without
a representative sampling. This
would be a good general purpose solution that doesn't create a new API. Then the bulk upload
simply looks like partitioning the data set and uploading via Map-Reduce, perhaps with batched
inserts. Do you really think this would be dog slow?
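
To make the partition-then-upload idea concrete, here is a minimal sketch in Python. It is not HBase code: the function names (choose_split_points, partition_rows, batches) are hypothetical, and it assumes row keys are plain strings that sort lexicographically. It picks split points at evenly spaced quantiles of a key sample, assigns each row to the key range it falls in, and groups each range into fixed-size batches, which is roughly what each Map-Reduce task would then insert.

```python
import bisect


def choose_split_points(sample_keys, num_regions):
    """Pick up to num_regions - 1 split keys at evenly spaced quantiles of a sample.

    Hypothetical helper: a larger, more representative sample gives
    better-distributed split points.
    """
    if num_regions < 2 or not sample_keys:
        return []
    keys = sorted(set(sample_keys))
    splits = []
    for i in range(1, num_regions):
        idx = i * len(keys) // num_regions
        # Skip index 0 (it would create an empty first range) and duplicates.
        if idx > 0 and (not splits or keys[idx] != splits[-1]):
            splits.append(keys[idx])
    return splits


def partition_rows(rows, split_points):
    """Assign each (key, value) row to the key range that contains it."""
    parts = [[] for _ in range(len(split_points) + 1)]
    for key, value in rows:
        # A key equal to a split point goes to the range that starts at it.
        parts[bisect.bisect_right(split_points, key)].append((key, value))
    return parts


def batches(items, size):
    """Yield consecutive slices of at most `size` items (one batched insert each)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

With too small a sample the quantiles are noisy and the resulting ranges can be badly skewed, which is the representative-sampling concern above.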

If that is not fast enough, I suppose we could have a mapfile uploader. This would require
the dataset to be prepared properly, which could be a bit fidgety (sorting, properly splitting
across columns, etc.).
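
As a rough illustration of that preparation step, the sketch below groups cells by column family and sorts each group by row key, then column. The function name prepare_for_mapfiles is hypothetical, and the layout is an assumption: columns are taken to be named "family:qualifier", and one sorted run per family is assumed to be what an uploader would want. No actual mapfile is written here.

```python
from collections import defaultdict


def prepare_for_mapfiles(cells):
    """Group (row_key, column, value) cells by column family, sorted within each.

    Assumes columns are named "family:qualifier"; the one-sorted-run-per-family
    layout is an illustrative assumption, not a confirmed on-disk format.
    """
    by_family = defaultdict(list)
    for row_key, column, value in cells:
        family = column.split(":", 1)[0]
        by_family[family].append((row_key, column, value))
    # Sort each family's cells by row key, then by full column name.
    return {
        family: sorted(entries, key=lambda e: (e[0], e[1]))
        for family, entries in by_family.items()
    }
```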

> [hbase] Bulk load and dump tools
> --------------------------------
>                 Key: HADOOP-2075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2075
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
> Hbase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
> APIs, particularly if the dataset is large and cell content is small, uploads can take a long
> time even when using many concurrent clients.
> PNUTS folks talked of need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools somehow write regions directly
> in hdfs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
