hbase-user mailing list archives

From Ashish Shinde <ash...@strandls.com>
Subject Bulk upload with multiple reducers with hbase-0.90.0
Date Wed, 19 Jan 2011 04:46:10 GMT
Hi,

I am new to hbase and to hadoop as well so forgive me if the following
is naive.

I am trying to bulk upload large amounts of data (billions of rows with
15-20 columns) into an empty hbase table using two column families.

The approach I tried was to use MR. The code is copied and modified
from ImportTsv.java.

I did not get good performance because the code used
TotalOrderPartitioner which, as far as I could gather, looks at the
current number of regions and so uses a single reducer on an empty table.

I then tried SimpleTotalOrderPartitioner with conservatively wide start
and end keys, which ended up dividing the data unequally over our 10 node
cluster.
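
To make the imbalance concrete: dividing a guessed key range evenly only
balances the reducers if the real keys are spread evenly over that range.
One alternative is to take split points from a sample of the actual row
keys. The following is only a minimal sketch in plain Java, not HBase or
Hadoop API. The class and method names (SplitPointSketch, splitPoints,
partitionFor) are hypothetical, and String ordering stands in for the
byte-wise row key ordering that HBase uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of HBase or Hadoop.
class SplitPointSketch {

    // Pick up to numReducers - 1 split keys at evenly spaced ranks
    // of the sorted sample, so each range holds a similar share of keys.
    static List<String> splitPoints(List<String> sampledKeys, int numReducers) {
        if (numReducers < 1 || sampledKeys.isEmpty()) {
            throw new IllegalArgumentException(
                "need at least one reducer and one sampled key");
        }
        List<String> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            int idx = (int) ((long) i * sorted.size() / numReducers);
            String candidate = sorted.get(idx);
            // Skip duplicates so the split points stay strictly increasing.
            if (splits.isEmpty()
                    || splits.get(splits.size() - 1).compareTo(candidate) < 0) {
                splits.add(candidate);
            }
        }
        return splits;
    }

    // Partition index for a key: the number of split points <= key.
    static int partitionFor(String key, List<String> splits) {
        int pos = Collections.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Assumed sample; in practice this would come from the input data.
        List<String> sample = Arrays.asList(
            "row03", "row00", "row06", "row01",
            "row07", "row02", "row05", "row04");
        List<String> splits = splitPoints(sample, 4);
        System.out.println("split points: " + splits);
        System.out.println("row05 -> partition " + partitionFor("row05", splits));
    }
}
```

With split points chosen this way, each reducer receives roughly the same
number of sampled keys, regardless of how wide the overall key range is.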

Questions

1. Can bulk upload use TotalOrderPartitioner with multiple reducers?

2. I don't have a handle on the min and max row key of the data
unless I collect it during the map phase. Is it possible to reconfigure
the partitioner after the map phase is over?

3. I would need to frequently load datasets with billions of rows
(450-800GB) into hbase, as the solution is part of a data processing
pipeline. My (optimistic) estimate on a 10 node cluster is 7 hours. Is
this reasonable? Would hbase scale to, say, 100s of such datasets, given
that I can add disk space and nodes to the cluster?

Thanks,

 - Ashish

