hbase-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-48) [hbase] Bulk load and dump tools
Date Wed, 06 Feb 2008 21:08:07 GMT

    [ https://issues.apache.org/jira/browse/HBASE-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566338#action_12566338 ]

Billy Pearson commented on HBASE-48:
------------------------------------

Wouldn't the best way to do this be a map that formats and sorts the data per column family,
then a reduce that writes mapfiles directly into the regions' column directories?

That would skip the API and speed up loading the data, and it would not matter so much whether
we have 1 region or not, since all we would be doing is adding a mapfile to HDFS.
Of course the map would have to know whether there is 1 region or 1000 and split the data
correctly, but even if each map only produces a few lines of data per column family, the
compactor will come along sooner or later to clean up and split where needed.
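A minimal sketch of the map/sort step described above, under stated assumptions: regions are
identified by a sorted list of start keys beginning with b"" (start of table), cells are
(row, family, qualifier, value) tuples, and timestamps/versions are ignored. The function
names are hypothetical, not HBase or Hadoop APIs. Each (region, family) group it returns is
what one reducer would write out as a single sorted mapfile.

```python
from bisect import bisect_right
from collections import defaultdict


def region_for_row(start_keys, row):
    """Return the index of the region whose [start, next_start) range holds `row`.

    Hypothetical helper. `start_keys` must be sorted and begin with b"",
    so every row falls into some region.
    """
    return bisect_right(start_keys, row) - 1


def partition_and_sort(cells, start_keys):
    """Group cells by (region index, column family) and sort each group by key.

    Hypothetical helper emulating map + shuffle + sort: each returned group
    is the sorted content one reducer would write as one file under that
    region's family directory. Timestamps/versions are not modeled.
    """
    groups = defaultdict(list)
    for row, family, qualifier, value in cells:
        region = region_for_row(start_keys, row)
        groups[(region, family)].append((row, qualifier, value))
    for entries in groups.values():
        # Sort by (row, qualifier); ties keep input order because sort is stable.
        entries.sort(key=lambda e: (e[0], e[1]))
    return dict(groups)
```

With start keys [b"", b"m"], a row such as b"apple" lands in region 0 and b"zebra" in region 1,
so the same code handles 1 region or 1000; only the start-key list changes.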

So if we add 100 mapfiles to one column, I would assume reads would slow down a little from
having to sort through all the mapfiles while scanning, but that would be a temporary speed
problem that lasts only until the compactor merges them.
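To illustrate why many mapfiles in one column slow scans until compaction, here is a minimal
sketch, assuming each file is a Python list of (key, value) pairs already sorted by key and
that keys do not repeat across files (real HBase would also resolve versions by timestamp,
which is not modeled). `scan` and `compact` are hypothetical names; the merge uses the
standard library's `heapq.merge`.

```python
import heapq


def scan(files):
    """Yield (key, value) pairs in key order across several sorted files.

    heapq.merge keeps one cursor per file in a heap, so with k files every
    row returned costs O(log k) comparisons and k files stay open. More
    files means more work per scanned row.
    """
    return heapq.merge(*files, key=lambda kv: kv[0])


def compact(files):
    """Rewrite many sorted files as a single sorted file.

    After this, a scan only has to walk one file, which is the cleanup the
    compactor is expected to do eventually.
    """
    return [list(scan(files))]
```

The scan output is the same before and after `compact`; only the number of files the scanner
has to merge changes, which is why the slowdown is temporary.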


> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HBASE-48
>                 URL: https://issues.apache.org/jira/browse/HBASE-48
>             Project: Hadoop HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Minor
>
> Hbase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
> APIs, particularly if the dataset is large and cell content is small, uploads can take a long
> time even when using many concurrent clients.
> PNUTS folks talked of need for a different API to manage bulk upload/dump.
> Another notion would be to somehow have the bulk loader tools somehow write regions directly
> in hdfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

