hbase-user mailing list archives

From Matt Corgan <mcor...@hotpads.com>
Subject Re: HBase cluster with heterogeneous resources
Date Sat, 16 Oct 2010 18:27:18 GMT
I could be wrong, but I don't think there's any performance benefit to
having a small HDFS block size.  If you are doing a random read that fetches a
1KB cell out of an HFile, it will not pull the entire 64MB HDFS block from
HDFS; it reads only the small sections of the HDFS file/block that contain
the HFile index and then the appropriate 64KB HBase block.  Maybe someone
more knowledgeable could elaborate on the exact number and size of HDFS
accesses.
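
To illustrate the point about partial reads, here is a minimal sketch using a plain local file as a stand-in for one large HDFS block. It is an analogy only, not the HDFS or HBase client code: the helper names `read_block` and `demo` are hypothetical, and the 4MB file size is an arbitrary value chosen to keep the demo small.

```python
import os
import tempfile

HBASE_BLOCK = 64 * 1024  # 64KB, the HBase default block size mentioned above


def read_block(path, offset, length=HBASE_BLOCK):
    """Read only `length` bytes starting at `offset`.

    The rest of the file is never read, no matter how large it is.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def demo():
    # Stand-in for one large HDFS block: a 4MB local file (assumed size).
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * 1024 * 1024))
        path = tmp.name
    try:
        # Fetch a single 64KB slice from the middle of the file.
        data = read_block(path, offset=1024 * 1024)
        return len(data)
    finally:
        os.remove(path)
```

The amount of data returned depends on the requested length, not on the size of the containing file, which is the behaviour described above for a random read inside a 64MB HDFS block.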


On Sat, Oct 16, 2010 at 2:10 PM, Abhijit Pol <apol@rocketfuel.com> wrote:

> >
> >
> > If this is your setup, your HDFS' namenode is bound to OOM soon.
> > (Namenode's
> > memory consumption is proportional to the number of blocks on HDFS)
> >
> >
> NN runs on the master and we have 4GB for NN, which should be good for a long
> time given the number of blocks we have. DN has 1GB, TT 512MB and JT 1GB.
>
>
>
> > I guess you meant "hfile.min.blocksize.size" in ? That is a different
> > parameter from HDFS' block size, IMO. (need someone to confirm)
> >
> >
> Yes, HBase and HDFS blocks are two different params. We are testing with
> 8KB HBase (default 64KB) and 64KB HDFS (default 64MB) block sizes. Both
> of these are much smaller than the defaults, but we have a random-read-heavy
> workload and smaller blocks should help, provided the smaller sizes do not
> expose some other bottleneck.
>
> Smaller HBase blocks mean larger indices and better random read
> performance. So it makes sense to trade some RAM for the block index, as we
> have plenty of RAM on our machines.
>
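
The block-size trade-offs discussed in the quoted message (NameNode memory growing with the number of HDFS blocks, and the HBase block index growing as HBase blocks shrink) can be sketched with some rough arithmetic. This is a simplified model under stated assumptions: the 1 TiB data size is hypothetical, replication and per-file overhead are ignored, and the names `block_count` and `compare` are illustrative only.

```python
KB = 1024
MB = 1024 * KB
TIB = 1024 ** 4  # hypothetical total data size, not a figure from the thread


def block_count(total_bytes, block_size):
    """Number of blocks needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // block_size)


def compare(total_bytes=TIB):
    """Block counts for the default and the tested sizes from the thread."""
    return {
        # HDFS blocks: the NameNode tracks each of these in memory.
        "hdfs_blocks_64MB": block_count(total_bytes, 64 * MB),
        "hdfs_blocks_64KB": block_count(total_bytes, 64 * KB),
        # HBase blocks: roughly one block index entry per block.
        "hbase_blocks_64KB": block_count(total_bytes, 64 * KB),
        "hbase_blocks_8KB": block_count(total_bytes, 8 * KB),
    }
```

Under these assumptions, going from 64MB to 64KB HDFS blocks multiplies the number of blocks the NameNode has to track by 1024, while going from 64KB to 8KB HBase blocks multiplies the number of block index entries by 8.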
