hbase-user mailing list archives

From Ashish Shinde <ash...@strandls.com>
Subject Re: Bulk upload with multiple reducers with hbase-0.90.0
Date Fri, 21 Jan 2011 04:48:44 GMT
Hi Stack,

Yes, that makes sense. I will approach it from the perspective of our needs.

I tried using a prebaked table and a reasonable partitioner, with very
promising results in terms of insert times.
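
In hbase-0.90 bulk loads the usual arrangement is one reducer per region of the pre-created table, with each row routed to the reducer owning the region that contains its key. The sketch below shows only that routing logic in plain Java; in a real job this would live in a subclass of `org.apache.hadoop.mapreduce.Partitioner`, and the split keys would come from the prebaked table rather than being hard-coded. The class name and split keys here are illustrative assumptions.

```java
// Sketch: route a row key to the index of the pre-split region containing it.
// A region with start key splitKeys[i-1] holds keys in [splitKeys[i-1], splitKeys[i]);
// partition 0 is the first region (everything before splitKeys[0]).
public class RegionPartitioner {
    private final byte[][] splitKeys; // sorted region start keys, first region excluded

    public RegionPartitioner(byte[][] splitKeys) {
        this.splitKeys = splitKeys;
    }

    // Partition index = number of split keys <= rowKey, found by binary search.
    public int getPartition(byte[] rowKey) {
        int lo = 0, hi = splitKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(splitKeys[mid], rowKey) <= 0) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Unsigned lexicographic byte comparison, matching HBase key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }
}
```

With splits {"g", "n", "t"} a key "a" lands in partition 0 and a key "z" in partition 3, so each reducer writes keys belonging to exactly one region.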

However, importing a 1.6 GB test file resulted in an HBase folder of
roughly 6 GB. Although in most cases people are not sensitive to disk
size, we would really like to keep disk usage to a minimum.

The nature of the data required me to create a row key that was 100
bytes long. An examination of the table's data blocks revealed that
every column in the data block is preceded by the row key, and in our
case this results in an overhead of 6 times. Am I doing something obviously
wrong?
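
That per-column repetition follows from HBase's on-disk KeyValue layout: each cell stores its own copy of the full row key, plus family, qualifier, timestamp, and framing. A back-of-envelope sketch of the inflation with a 100-byte key (the column count and value size below are example assumptions, not measurements from this thread):

```java
// Rough model of HBase KeyValue storage cost per cell. The fixed framing is
// key length (4) + value length (4) + row length (2) + family length (1)
// + timestamp (8) + key type (1) = 20 bytes, and the full row key is
// repeated in every cell.
public class KeyValueOverhead {
    static final int FIXED = 4 + 4 + 2 + 1 + 8 + 1; // 20 bytes of framing per cell

    static long cellBytes(int rowKeyLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED + rowKeyLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        int rowKeyLen = 100, familyLen = 1, qualifierLen = 4;
        int columns = 10, valueLen = 8; // example: 10 small numeric columns
        long stored = columns * cellBytes(rowKeyLen, familyLen, qualifierLen, valueLen);
        long payload = (long) columns * valueLen; // the data we actually care about
        System.out.printf("stored=%d bytes, payload=%d bytes, ratio=%.1fx%n",
                stored, payload, (double) stored / payload);
    }
}
```

With small values, the 100-byte key dominates every cell, which is consistent with a several-fold blow-up before compression.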

Serializing the row into a single HBase column kept the disk usage
under control. Another approach I tried was to club a number of rows
into a single HBase row, using a different indexing scheme with a
simple long row key. This provided the best performance and used the
least amount of disk space.
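
A minimal sketch of the "serialize the row into a single column" idea: pack all field values into one length-prefixed byte array, so the wide row key is stored once per row instead of once per column. The class name and encoding here are illustrative assumptions, not the poster's actual schema.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Pack a logical row's fields into a single value for one HBase cell,
// and unpack them on read. Length prefixes let fields contain any bytes.
public class RowPacker {
    static byte[] pack(List<byte[]> fields) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(fields.size());
            for (byte[] f : fields) {
                out.writeInt(f.length);
                out.write(f);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // in-memory streams do not actually fail
        }
    }

    static List<byte[]> unpack(byte[] packed) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed));
            int n = in.readInt();
            List<byte[]> fields = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                byte[] f = new byte[in.readInt()];
                in.readFully(f);
                fields.add(f);
            }
            return fields;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The trade-off is that HBase can no longer filter or update individual columns, which is acceptable here precisely because the data is immutable.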

Our data is immutable, at least as far as I can foresee. Is the
serialized row the best option I have? Does the number of rows in a
table affect read performance? If so, then clubbing rows seems to be a
reasonable option.
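
The row-clubbing scheme with a simple long row key can be sketched as a bucket/slot mapping: a logical row id determines a short long-valued row key (the bucket) and a qualifier within it (the slot). The bucket size of 1000 is an arbitrary example, not a value from this thread.

```java
// Sketch: club ROWS_PER_BUCKET logical rows into one HBase row keyed by a
// compact 8-byte long, so the row key cost is amortized across many rows.
public class RowClubbing {
    static final long ROWS_PER_BUCKET = 1000; // example bucket size

    // HBase row key for a logical row id (would be serialized to 8 bytes).
    static long bucket(long logicalId) {
        return logicalId / ROWS_PER_BUCKET;
    }

    // Column qualifier (slot) of the logical row within its bucket.
    static long slot(long logicalId) {
        return logicalId % ROWS_PER_BUCKET;
    }
}
```

Reading a logical row then means a Get on `bucket(id)` followed by picking out slot `slot(id)`, at the cost of fetching the whole clubbed row.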

Thanks and regards,
 - Ashish

On Wed, 19 Jan 2011 22:16:33 -0800
Stack <stack@duboce.net> wrote:

> On Wed, Jan 19, 2011 at 9:50 PM, Ashish Shinde <ashish@strandls.com>
> wrote:
> > I have to say I am mighty impressed with hadoop and hbase, the
> > overall philosophy and the architecture and have decided to
> > contribute as much as time permits. Already looking at the "noob"
> > issues on hbase jira :)
> >
> I'd say work on your particular need rather than on noob issues.
> That's probably the best contrib. you could make.  Figure out the
> blockers -- we'll help out -- that get in the way of your sizeable
> incremental bulk uploads.  Your use case makes for a good story.
> Good luck Ashish,
> St.Ack
