hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: question about parallel get()
Date Tue, 19 May 2009 05:31:30 GMT
On Sun, May 17, 2009 at 10:53 PM, Yair Even-Zohar <yaire@audiencescience.com> wrote:

> 1) EC2, medium server


OK.


>
> 2) 3 or 4 column families. From thousands to millions of columns
>


3 or 4 column families should be fine.  Are you doing a full-row get, or are
you getting individual columns on each fetch?  (The latter is faster.)

Thousands to millions of columns per row will give you trouble in 0.19.x
hbase; see https://issues.apache.org/jira/browse/HBASE-867.  HBase will run
slowly.  Hopefully this is addressed in 0.20.0 hbase.


St.Ack



>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of stack
> Sent: Sunday, May 17, 2009 10:58 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: question about parallel get()
>
> On Sun, May 17, 2009 at 11:19 AM, Yair Even-Zohar
> <yaire@audiencescience.com> wrote:
>
> > I'd like to run efficient table get() calls to retrieve about 1000
> > rows, where each row includes about 4 columns (around 20 bytes per
> > cell) with several versions per column. I assume the longest wait is
> > for reading the row from disk, so I could parallelize these reads.
> > Any suggestions on the best method?
> >
> >
>
> 0.19.x hbase or TRUNK?
>
>
>
> >
> >
> > 1)       How many gets() should I be running in parallel?
> >
>
>
> Depends on how many disks and distribution of gets over nodes in the
> cluster.
>
>
>
> >
> > 2)       What's the best number of get() per region?
> >
>
>
> How many column families?  All in one column family?
>
>
>
> >
> > 3)       Should the row ids be randomized among the different regions?
> >
> >
> It's best, yes, to distribute your get load over the cluster if you can.
>
> Sorry for all the 'depends' and answering questions with questions.  It's
> my culture (smile).
>
> St.Ack
>
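
For the parallel-read question in this thread, the general pattern can be
sketched as a bounded thread pool that issues one lookup per row key. This is
a minimal sketch under stated assumptions, not the HBase client API: the class
name ParallelGetSketch, the method parallelGet, and the fetchRow function are
all hypothetical. fetchRow stands in for whatever per-row read the client
version in use provides, and the code uses newer Java syntax than was current
when this thread was written. The right value for parallelism is not given
here; as noted above, it depends on the number of disks and on how the gets
are distributed over the nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelGetSketch {

    // Fetches every row key using at most `parallelism` concurrent lookups.
    // `fetchRow` is a hypothetical stand-in for the real per-row read;
    // it is NOT part of any HBase API.
    public static <R> Map<String, R> parallelGet(List<String> rowKeys,
                                                 Function<String, R> fetchRow,
                                                 int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            // One task per row key.
            List<Callable<R>> tasks = new ArrayList<>();
            for (String key : rowKeys) {
                tasks.add(() -> fetchRow.apply(key));
            }
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the submitted tasks.
            List<Future<R>> futures = pool.invokeAll(tasks);

            // Collect results keyed by row, preserving the input order.
            Map<String, R> results = new LinkedHashMap<>();
            for (int i = 0; i < rowKeys.size(); i++) {
                results.put(rowKeys.get(i), futures.get(i).get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would pass the list of row ids, a function that performs one read,
and a pool size, for example parallelGet(keys, key -> readOneRow(key), 8),
where readOneRow is again a hypothetical helper. Shuffling the row ids before
the call is one way to spread the load across regions, in line with the
advice above.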
