hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: question about parallel get()
Date Sun, 17 May 2009 19:58:00 GMT
On Sun, May 17, 2009 at 11:19 AM, Yair Even-Zohar <yaire@audiencescience.com> wrote:

> I'd like to run efficient table get() calls and retrieve about 1,000
> rows, where each row includes about 4 columns (around 20 bytes per
> cell) with several versions per column. I assume the longest wait is for
> reading each row from disk, so I could parallelize these reads. Any
> suggestions on the best method?
>
>

0.19.x hbase or TRUNK?



>
>
> 1)       How many get() calls should I be running in parallel?
>


It depends on how many disks you have and how your gets are distributed
over the nodes in the cluster.
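A minimal sketch of the usual way to run many independent gets concurrently: submit one task per row to a fixed-size thread pool, so the pool size becomes the single knob to tune against disks and node count. `RowFetcher` is a hypothetical stand-in for whatever HBase client call reads one row; it is not HBase API, and giving each worker its own client instance in real code is an assumption worth checking against your HBase version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// HYPOTHETICAL stand-in for the real HBase call that fetches a single row.
interface RowFetcher<R> {
    R fetch(String rowKey) throws Exception;
}

final class ParallelGets {
    // Fetch every row key with at most `parallelism` requests in flight.
    // Results come back in the same order as rowKeys.
    static <R> List<R> fetchAll(List<String> rowKeys, RowFetcher<R> fetcher, int parallelism)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Future<R>> futures = new ArrayList<>(rowKeys.size());
            for (String key : rowKeys) {
                // One task per row; the pool caps how many run at once.
                futures.add(pool.submit(() -> fetcher.fetch(key)));
            }
            List<R> results = new ArrayList<>(rowKeys.size());
            for (Future<R> f : futures) {
                // Blocks until that row is done; a failed fetch surfaces
                // here as an ExecutionException.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Start the pool size near the number of disks serving the table, then measure and adjust; past the point where the disks are saturated, extra threads mostly add queueing.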



>
> 2)       What's the best number of get() calls per region?
>


How many column families?  All in one column family?



>
> 3)       Should the row ids be randomized among the different regions?
>
>
It's best, yes, to distribute your get load over the cluster if you can.
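One deterministic alternative to randomizing row ids, sketched under stated assumptions: bucket the keys by the region that holds them, then emit them round-robin so consecutive gets land on different regions. The region start keys are a hypothetical input here (in practice they would come from the table's region metadata), and plain `String` ordering is assumed to match HBase's byte ordering, which holds for ASCII keys.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class RegionSpread {
    // Reorder row keys so that consecutive gets target different regions.
    // regionStartKeys: ASSUMED list of region start keys; "" is the first region.
    static List<String> interleaveByRegion(List<String> rowKeys, List<String> regionStartKeys) {
        TreeSet<String> starts = new TreeSet<>(regionStartKeys);
        starts.add(""); // the first region always begins at the empty key
        // Bucket each row key under the start key of the region containing it.
        Map<String, List<String>> buckets = new LinkedHashMap<>();
        for (String key : rowKeys) {
            String region = starts.floor(key); // never null because "" is present
            buckets.computeIfAbsent(region, r -> new ArrayList<>()).add(key);
        }
        // Round-robin: take the i-th key from every bucket before any (i+1)-th.
        List<String> out = new ArrayList<>(rowKeys.size());
        boolean added = true;
        for (int round = 0; added; round++) {
            added = false;
            for (List<String> bucket : buckets.values()) {
                if (round < bucket.size()) {
                    out.add(bucket.get(round));
                    added = true;
                }
            }
        }
        return out;
    }
}
```

For example, keys `a1 a2 a3 m1 m2 z1` over regions starting at `""`, `m`, and `t` come out as `a1 m1 z1 a2 m2 a3`. This matters most when gets are issued in batches or through a bounded pool, so each batch touches several regions instead of hammering one.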

Sorry for all the 'depends' and for answering questions with questions. It's
my culture (smile).

St.Ack
