hbase-user mailing list archives

From Pankaj Gupta <pankaj.ro...@gmail.com>
Subject Does HBase RegionServer benefit from OS Page Cache
Date Wed, 20 Mar 2013 07:34:19 GMT
Given that HBase has its own cache (block cache and bloom filters) and that all the table
data is stored in HDFS, I'm wondering whether HBase benefits from the OS page cache at all. In
the setup I'm using, HBase Region Servers run on the same boxes as the HDFS data nodes. In such
a scenario, if the underlying HLog files live on the same machine, then having a healthy memory
surplus may mean that the data node can serve the underlying files from the page cache, thus
improving HBase performance. Is this really the case? (I guess the page cache should also help
in the case where the HLog file lives on a different machine, but there the network I/O will
probably drown out the speedup gained from not hitting the disk.)

I'm asking because, if the page cache is useful, then in an HBase setup not giving all the
memory on the machine to the region server may not be that bad. The reason one would not
want to use all the memory for the region server is the long garbage collection pauses that
a large heap may induce. I understand that work has been done to fix the long pauses caused
by memory fragmentation in the old generation, mostly with the concurrent garbage collector, by
using a slab cache allocator for the memstore, but that feature is marked experimental and we're
not ready to take risks yet. So if the page cache is useful in any way on Region Servers,
we could give the RegionServer process less memory, with the understanding that the free memory
on the machine is not going completely to waste. Hence my curiosity about the usefulness of the
OS page cache to HBase performance.

Thanks in Advance,