hbase-user mailing list archives

From Ashish Shinde <ash...@strandls.com>
Subject Bulk upload with multiple reducers with hbase-0.90.0
Date Wed, 19 Jan 2011 04:46:10 GMT

I am new to hbase and to hadoop as well so forgive me if the following
is naive.

I am trying to bulk upload large amounts of data (billions of rows with
15-20 columns) into an empty hbase table using two column families.

The approach I tried was to use MR. The code is copied over and
modified from ImportTsv.java.

I did not get good performance because the code used
TotalOrderPartitioner, which, as far as I could gather, looks at the
current number of regions and so uses a single reducer on an empty table.

I then tried SimpleTotalOrderPartitioner with conservatively large start
and end keys, which ended up dividing the data unequally over our 10 node
cluster.

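To illustrate the skew described above, here is a minimal sketch of what happens when a key range is cut into equal-width slices and the real keys occupy only a small part of it. It is a simplification under stated assumptions: row keys are modelled as `long` values, and the class `RangePartitionSkew` and its methods are hypothetical names, not the HBase SimpleTotalOrderPartitioner code itself.

```java
import java.util.Arrays;

public class RangePartitionSkew {
    // Assign a key to one of numReducers equal-width slices of [start, end).
    // Keys outside the range are clamped to the first or last reducer.
    // Integer arithmetic is used; overflow for huge ranges is ignored here.
    static int partition(long key, long start, long end, int numReducers) {
        if (key <= start) return 0;
        if (key >= end) return numReducers - 1;
        return (int) ((key - start) * numReducers / (end - start));
    }

    // Count how many keys each reducer would receive.
    static int[] countPerReducer(long[] keys, long start, long end, int numReducers) {
        int[] counts = new int[numReducers];
        for (long k : keys) {
            counts[partition(k, start, end, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data set: 1000 keys, 1000..1999.
        long[] keys = new long[1000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = 1000 + i;
        }
        // Conservatively wide range: every key falls into reducer 1.
        System.out.println(Arrays.toString(countPerReducer(keys, 0, 10_000, 10)));
        // Range matching the data: 100 keys per reducer.
        System.out.println(Arrays.toString(countPerReducer(keys, 1000, 2000, 10)));
    }
}
```

With the wide range, nine of the ten reducers sit idle. With a range that matches the data, the load is even, but only if the keys are also spread uniformly inside that range.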

1. Can bulk upload use TotalOrderPartitioner with multiple reducers?

2. I don't have a handle on the min and max row keys of the data
unless I collect them during the map phase. Is it possible to reconfigure
the partitioner after the map phase is over?

3. I would need to frequently load datasets with billions of rows
(450-800GB) into hbase, as the solution is part of a data processing
pipeline. My (optimistic) estimate on a 10 node cluster is 7 hours. Is
this reasonable? Would hbase scale to, say, 100s of such datasets, given
that I can add disk space and nodes to the cluster?

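One way to think about question 2 is to derive the reducer boundaries from a sample of row keys taken before the job runs, instead of from min and max keys. The following is only a sketch of that general idea, not Hadoop or HBase code: `SampledSplitPoints` and its methods are hypothetical names, keys are plain ASCII strings (so Java string order matches byte order), and the sample is assumed to have no duplicate keys at the chosen boundaries.

```java
import java.util.Arrays;

public class SampledSplitPoints {
    // Choose numPartitions - 1 boundaries at evenly spaced ranks of the sorted sample.
    static String[] splitPoints(String[] sample, int numPartitions) {
        String[] sorted = sample.clone();
        Arrays.sort(sorted);
        String[] splits = new String[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int rank = (int) ((long) i * sorted.length / numPartitions);
            splits[i - 1] = sorted[rank];
        }
        return splits;
    }

    // A key goes to the partition whose index equals the number of split points <= key.
    static int partition(String key, String[] splits) {
        int pos = Arrays.binarySearch(splits, key);
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Hypothetical sample of row keys, in arbitrary order.
        String[] sample = {
            "row-05", "row-01", "row-08", "row-03",
            "row-07", "row-02", "row-06", "row-04"
        };
        String[] splits = splitPoints(sample, 4);
        System.out.println(Arrays.toString(splits));
        for (String key : sample) {
            System.out.println(key + " -> reducer " + partition(key, splits));
        }
    }
}
```

Because the boundaries follow the observed key distribution, each reducer receives roughly the same share of the sample even when the keys are clustered. How well that carries over to the full data set depends on how representative the sample is.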

 - Ashish
