hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: question about parallel get()
Date Sun, 17 May 2009 19:58:00 GMT
On Sun, May 17, 2009 at 11:19 AM, Yair Even-Zohar <yaire@audiencescience.com> wrote:

> I'd like to run efficient table get() methods and retrieve about
> 1000 rows where each row includes about 4 columns (around 20 bytes per
> cell) with several versions per column. I assume the longest wait is for
> reading the row from the disk so I could parallelize these reads. Any
> suggestions what would be the best method?

0.19.x hbase or TRUNK?

> 1)       How many gets() should I be running in parallel?

Depends on how many disks and distribution of gets over nodes in the cluster.

> 2)       What's the best number of get() per region?

How many column families?  All in one column family?

> 3)       Should the row ids be randomized among the different regions?

It's best, yes, to distribute your get load over the cluster if you can.

Sorry for all the 'depends' and answering questions with questions.  It's my
culture (smile).
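
For reference, the fan-out asked about in this thread could be sketched roughly as below. This is only an illustration under stated assumptions, not a recommendation for a particular thread count: `ParallelGets`, `fetchAll`, and the `fetchRow` function are hypothetical names, and `fetchRow` stands in for whatever single-row read your HBase client version provides. The `parallelism` value is the knob that depends on disks, regions, and how the row keys spread over the nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelGets {

    // Hypothetical helper, not an HBase API: fetchRow is a placeholder for
    // the single-row read call of whichever client version is in use.
    public static <R> Map<String, R> fetchAll(List<String> rowKeys,
                                              Function<String, R> fetchRow,
                                              int parallelism)
            throws InterruptedException, ExecutionException {
        // A bounded pool caps how many reads are in flight at once.
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // Submit one task per row key.
            List<Future<R>> futures = new ArrayList<>();
            for (String key : rowKeys) {
                Callable<R> task = () -> fetchRow.apply(key);
                futures.add(pool.submit(task));
            }
            // Collect results in the same order the keys were given.
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < rowKeys.size(); i++) {
                results.put(rowKeys.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Trying a few values of `parallelism` against the real cluster and timing the batch is probably the only way to settle on a number.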

