hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2075) [hbase] Bulk load and dump tools
Date Tue, 06 Nov 2007 21:41:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12540582 ]

stack commented on HADOOP-2075:
-------------------------------

The bulk uploader needs to tolerate myriad input data types.  Data will likely need
massaging and, if we write HRegion content directly into HDFS rather than going through
the hbase API, it ultimately has to be sorted.  Writing directly is preferred, since bulk
uploads through the hbase API will be dog slow.  Using mapreduce would make sense.
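
To make the sort-then-write requirement concrete, here is a rough sketch in Python. It is
an illustration only, not hbase code: the (row, column, value) tuple layout, the function
names and the max_rows_per_region parameter are all assumptions made for the example, and
plain string comparison stands in for whatever key ordering the real region files need.

```python
from itertools import groupby
from operator import itemgetter


def sort_cells(cells):
    """Sort hypothetical (row, column, value) tuples by row key, then column."""
    return sorted(cells, key=itemgetter(0, 1))


def split_into_regions(sorted_cells, max_rows_per_region):
    """Cut already-sorted cells into contiguous, non-overlapping row ranges.

    Each returned list stands in for the content of one region; a row is
    never split across two lists.
    """
    regions = []
    current = []
    rows_in_current = 0
    for _row, group in groupby(sorted_cells, key=itemgetter(0)):
        if rows_in_current == max_rows_per_region:
            # Current range is full: close it and start the next one.
            regions.append(current)
            current, rows_in_current = [], 0
        current.extend(group)
        rows_in_current += 1
    if current:
        regions.append(current)
    return regions
```

In a mapreduce job the framework's shuffle would do the sorting, and each reducer would
write out one such contiguous range.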

Also look at using PIG: it has a few LOAD implementations (from files on local disk or
HDFS) and some facility for transforming data and moving tuples around.  We would need to
write a special STORE operator that writes the data out sorted, as HRegions, directly into
HDFS.  (This would be different from PIG-6, which is about writing into hbase via the API.)

Also, chatting with Jim, this is a pretty important issue.  This is the first thing folks
run into when they start to get serious about hbase.

> [hbase] Bulk load and dump tools
> --------------------------------
>
>                 Key: HADOOP-2075
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2075
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: stack
>            Priority: Minor
>
> Hbase needs tools to facilitate bulk upload and possibly dumping.  Going via the current
> APIs, particularly if the dataset is large and cell content is small, uploads can take a
> long time even when using many concurrent clients.
> PNUTS folks talked of the need for a different API to manage bulk upload/dump.
> Another notion would be to have the bulk loader tools somehow write regions directly
> in hdfs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

