hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: HBase random access in HDFS and block indices
Date Mon, 01 Nov 2010 16:28:25 GMT



> Date: Fri, 29 Oct 2010 10:01:24 -0700
> Subject: Re: HBase random access in HDFS and block indices
> From: stack@duboce.net
> To: user@hbase.apache.org
> 
> On Fri, Oct 29, 2010 at 6:41 AM, Sean Bigdatafun
> <sean.bigdatafun@gmail.com> wrote:
> > I have the same doubt here. Let's say I have a totally random read pattern
> > (uniformly distributed).
> >
> > Now let's assume my total data size stored in HBase is 100TB on 10
> > machines (not a big deal considering today's disks), and the total size of
> > my RS' memory is 10 * 6G = 60 GB. That translates into a 60/(100*1000) = 0.06%
> > cache hit probability. Under a random read pattern, each read is bound to
> > experience the "open -> read index -> .... -> read datablock" sequence, which
> > would be expensive.
> >
> > Any comment?
> >
> 
> If totally random, as per Alvin's suggestion, yes, just turn off block
> caching since it is doing you no good.
> 
> But totally random is unusual in practice, no?
> 
> St.Ack

Uhm... not exactly.

One of the benefits of HBase is that it should scale in a *near* linear fashion.

So if we don't know how the data will be accessed, or we know that there are a couple of
access patterns that are orthogonal to each other, putting the data into the cloud in a 'random'
fashion should provide consistent read access times.

So the design of 'random' stored data shouldn't be that unusual. It just means you're going
to have a couple of different indexes. ;-)
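
For reference, the cache estimate quoted above can be checked with a few lines of Python. This is only a rough sketch: it assumes reads are uniformly distributed over the data and that the whole 6 GB per region server is available as block cache, which is an upper bound since in practice only part of the heap is given to the block cache. The function and variable names are made up for illustration.

```python
def cache_hit_probability(cache_bytes: float, data_bytes: float) -> float:
    """Chance that a uniformly random read finds its block already in cache."""
    if data_bytes <= 0:
        raise ValueError("data_bytes must be positive")
    # With uniform access, the hit rate is just the cached fraction of the data.
    return min(1.0, cache_bytes / data_bytes)


GB = 10**9
TB = 10**12

region_servers = 10
memory_per_server = 6 * GB                      # assumption: all of it acts as cache
total_cache = region_servers * memory_per_server  # 60 GB
total_data = 100 * TB

p = cache_hit_probability(total_cache, total_data)
print(f"cache hit probability: {p:.4%}")
```

So roughly 6 reads in 10,000 would be served from cache under these assumptions, which matches the 0.06% figure in the quoted message.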



 		 	   		  