hbase-user mailing list archives

From Ashish Shinde <ash...@strandls.com>
Subject Re: Bulk upload with multiple reducers with hbase-0.90.0
Date Fri, 21 Jan 2011 05:25:47 GMT
Yes, I picked out bits and pieces from ImportTsv.java.
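
For reference, the borrowed pieces boil down to roughly the job setup
below. This is only a minimal sketch against the hbase-0.90 mapreduce
API (HFileOutputFormat.configureIncrementalLoad plus
LoadIncrementalHFiles); the table name, column family, paths, and the
TsvMapper are hypothetical placeholders, not our actual pipeline code.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

  // Hypothetical mapper: parses "rowkey<TAB>value" lines into Puts.
  static class TsvMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      if (parts.length < 2) return;              // skip malformed lines
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf1"), Bytes.toBytes("q"),
          Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");   // hypothetical table name

    Job job = new Job(conf, "bulk-load-sketch");
    job.setJarByClass(BulkLoadSketch.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(TsvMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // The call ImportTsv relies on: it wires in TotalOrderPartitioner,
    // the sorting reducer, and sets the number of reduce tasks to the
    // table's current region count, hence a single reducer on an empty
    // (single-region) table.
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Hand the generated HFiles over to the regionservers.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
  }
}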

Thanks and regards,
 - Ashish

On Tue, 18 Jan 2011 23:14:47 -0800
Ted Dunning <tdunning@maprtech.com> wrote:

> Have you seen the bulk loader?
> 
> On Tue, Jan 18, 2011 at 8:46 PM, Ashish Shinde <ashish@strandls.com>
> wrote:
> 
> > Hi,
> >
> > I am new to hbase and to hadoop as well, so forgive me if the
> > following is naive.
> >
> > I am trying to bulk upload large amounts of data (billions of rows
> > with 15-20 columns) into an empty hbase table using two column
> > families.
> >
> > The approach I tried was to use MR. The code is copied over and
> > modified from ImportTsv.java.
> >
> > I did not get good performance because the code uses
> > TotalOrderPartitioner, which I gathered looks at the current number
> > of regions and so falls back to a single reducer on an empty table.
> >
> > I then tried SimpleTotalOrderPartitioner with conservatively wide
> > start and end keys, which ended up dividing the data unequally
> > across our 10-node cluster.
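
The SimpleTotalOrderPartitioner attempt corresponds roughly to the
helper sketched below: a minimal sketch, assuming the
hbase.simpletotalorder.start/end configuration keys read by
SimpleTotalOrderPartitioner in hbase-0.90, with a placeholder key
range and reducer count rather than the values actually used.

import org.apache.hadoop.hbase.mapreduce.SimpleTotalOrderPartitioner;
import org.apache.hadoop.mapreduce.Job;

public class PartitionerSetup {
  /**
   * Point a job at SimpleTotalOrderPartitioner over a fixed row-key
   * range. Assumption: hbase-0.90 reads the range from the two
   * configuration keys below (SimpleTotalOrderPartitioner.START / END).
   */
  static void useSimplePartitioner(Job job, String startRow,
      String endRow, int reducers) {
    job.setPartitionerClass(SimpleTotalOrderPartitioner.class);
    job.getConfiguration().set("hbase.simpletotalorder.start", startRow);
    job.getConfiguration().set("hbase.simpletotalorder.end", endRow);
    job.setNumReduceTasks(reducers);
    // The [startRow, endRow) range is split linearly into `reducers`
    // buckets, so work is only even when row keys are spread uniformly
    // over that range; a skewed key space divides unevenly.
  }
}

For example, useSimplePartitioner(job, "row00000000", "row99999999", 10)
gives one reducer per node, but the split is linear over the byte range,
so skewed row keys still land unevenly.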
> >
> > Questions
> >
> > 1. Can bulk upload use TotalOrderPartitioner with multiple reducers?
> >
> > 2. I don't have a handle on the min and max row keys in the data
> > unless I collect them during the map phase. Is it possible to
> > reconfigure the partitioner after the map phase is over?
> >
> > 3. I would need to frequently load datasets with billions of rows
> > (450-800 GB) into hbase, as the solution is part of a data
> > processing pipeline. My (optimistic) estimate on a 10-node cluster
> > is 7 hours. Is this reasonable? Would hbase scale to, say, 100s of
> > such datasets, given that I can add disk space and nodes to the
> > cluster?
> >
> > Thanks,
> >
> >  - Ashish
> >
> >

