hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-48) [hbase] Bulk load tools
Date Tue, 18 Aug 2009 22:36:14 GMT

     [ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-48:

    Attachment: 48-v5.patch

Updated patch.  It now rolls the hfile at a row boundary.

Regarding TotalOrderPartitioner, there is no such facility in the new mapreduce package.  That
said, it shouldn't be too hard to make a partitioner of our own.  Here is the default hash partitioner:

  /** Use {@link Object#hashCode()} to partition. */
  public int getPartition(K key, V value,
                          int numReduceTasks) {
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }

We could take a start key and an end key as inputs and then use our key bigdecimal math to
divide the key space into numReduceTasks partitions?
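
A rough sketch of that idea follows.  It is not what the patch does.  It assumes keys are raw
byte arrays compared as unsigned bytes, uses java.math.BigInteger rather than any existing hbase
key math, and leaves out the Hadoop Partitioner wiring.  The class name KeyRangePartitioner and
the width parameter (the byte length keys are zero-padded to) are made up for illustration.

```java
import java.math.BigInteger;

/** Hypothetical sketch: splits [startKey, endKey] evenly into numReduceTasks ranges. */
class KeyRangePartitioner {
  private final BigInteger start;    // startKey as an unsigned integer
  private final BigInteger range;    // number of distinct padded keys in [startKey, endKey]
  private final int width;           // assumed fixed width that keys are padded to
  private final int numReduceTasks;

  KeyRangePartitioner(byte[] startKey, byte[] endKey, int width, int numReduceTasks) {
    this.width = width;
    this.numReduceTasks = numReduceTasks;
    this.start = toBigInteger(startKey);
    this.range = toBigInteger(endKey).subtract(start).add(BigInteger.ONE);
    if (range.signum() <= 0) {
      throw new IllegalArgumentException("endKey must not sort before startKey");
    }
  }

  /** Right-pads (or truncates) the key to width bytes and reads it as an unsigned number. */
  private BigInteger toBigInteger(byte[] key) {
    byte[] padded = new byte[width];
    System.arraycopy(key, 0, padded, 0, Math.min(key.length, width));
    return new BigInteger(1, padded);
  }

  /** Returns a partition in [0, numReduceTasks); keys outside the range are clamped. */
  int getPartition(byte[] key) {
    BigInteger offset = toBigInteger(key).subtract(start);
    if (offset.signum() < 0) {
      return 0;
    }
    if (offset.compareTo(range) >= 0) {
      return numReduceTasks - 1;
    }
    // offset * numReduceTasks / range is always below numReduceTasks here.
    return offset.multiply(BigInteger.valueOf(numReduceTasks)).divide(range).intValue();
  }
}
```

Because partition numbers rise with key order, each reducer would see one contiguous slice of the
key space.  With one-byte keys from 0x00 to 0xFF and four reducers, key 0x40 lands in partition 1.
Whether the slices are evenly loaded depends on how the real keys are distributed, which this
sketch does not address.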

> [hbase] Bulk load tools
> -----------------------
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>         Attachments: 48-v2.patch, 48-v3.patch, 48-v4.patch, 48-v5.patch, 48.patch, loadtable.rb
> HBase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
> APIs, particularly if the dataset is large and cell content is small, uploads can take a long
> time even when using many concurrent clients.
> PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools write regions directly
> in hdfs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
