hbase-issues mailing list archives

From "Yu Sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16287) BlockCache size should not exceed acceptableSize too many
Date Wed, 27 Jul 2016 08:33:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395233#comment-15395233 ]

Yu Sun commented on HBASE-16287:

Why -1g? We calc the BC size by conf xmx value * BC percentage.

Under this JVM configuration, -Xmn4g -XX:SurvivorRatio=2, each survivor space is 4g/(2+1+1) = 1g.
At any time (except during a young GC and some full GCs, not CMS), at least one of the two
survivor spaces is empty and contains no objects. So if we ask the JVM for the max heap size,
it just returns Xmx minus the size of one survivor space.
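
The arithmetic above can be sketched in a few lines of Java. This is only an illustration of the 4g/(2+1+1) split; the class and method names are made up for the example, and it ignores the alignment rounding that the JVM applies.

```java
public class SurvivorLayout {
    public static final long GB = 1L << 30;

    // SurvivorRatio is the ratio of eden to ONE survivor space, so the young
    // generation is divided into (ratio + 2) equal parts: ratio parts for
    // eden and one part for each of the two survivor spaces.
    public static long survivorBytes(long youngBytes, int survivorRatio) {
        return youngBytes / (survivorRatio + 2);
    }

    // Eden is whatever is left after both survivor spaces are taken out.
    public static long edenBytes(long youngBytes, int survivorRatio) {
        return youngBytes - 2 * survivorBytes(youngBytes, survivorRatio);
    }

    public static void main(String[] args) {
        long young = 4 * GB; // -Xmn4g
        int ratio = 2;       // -XX:SurvivorRatio=2
        System.out.println("survivor = " + survivorBytes(young, ratio) / GB + "g");
        System.out.println("eden     = " + edenBytes(young, ratio) / GB + "g");
    }
}
```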

  public static synchronized BlockCache instantiateBlockCache(Configuration conf) {
    if (blockCacheDisabled) return null;
    MemoryUsage mu = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    LruBlockCache l1 = getL1(conf, mu);


  static long getLruCacheSize(final Configuration conf, final MemoryUsage mu) {
    float cachePercentage = conf.getFloat(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY,
    if (cachePercentage <= 0.0001f) {
      blockCacheDisabled = true;
      return -1;
    }
    if (cachePercentage > 1.0) {
      throw new IllegalArgumentException(HConstants.HFILE_BLOCK_CACHE_SIZE_KEY +
        " must be between 0.0 and 1.0, and not > 1.0");
    }

    // Calculate the amount of heap to give the heap.
    return (long) (mu.getMax() * cachePercentage);
  }

The code above is how HBase computes the block cache size, and the key point is how mu.getMax()
is calculated.
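
To see what mu.getMax() reports on a given JVM, a small standalone probe like the one below can be run with the same GC flags as the regionserver. It is a sketch, not HBase code: the class name and the cacheSize helper are hypothetical, and the helper only mirrors the final multiplication in getLruCacheSize.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapMaxProbe {
    // Same arithmetic as the last line of getLruCacheSize:
    // reported max heap multiplied by the configured fraction.
    public static long cacheSize(long maxHeapBytes, float cachePercentage) {
        return (long) (maxHeapBytes * cachePercentage);
    }

    public static void main(String[] args) {
        MemoryUsage mu = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = mu.getMax(); // can be -1 when the maximum is undefined
        System.out.println("Runtime.maxMemory()  = " + Runtime.getRuntime().maxMemory());
        System.out.println("MemoryUsage.getMax() = " + max);
        System.out.println("cache size at 0.3    = " + cacheSize(max, 0.3f));
    }
}
```

Running it with, for example, -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC lets the reported maximum be compared directly against the configured Xmx.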
mu itself is returned by the following JNI call:
JNIEXPORT jobject JNICALL Java_sun_management_MemoryImpl_getMemoryManagers0
  (JNIEnv *env, jclass dummy) {
    return jmm_interface->GetMemoryManagers(env, NULL);
}
GetMemoryManagers(env, NULL) is implemented in jvm in file:
and part of this function's implementation is listed below:

// Returns a java/lang/management/MemoryUsage object representing
// the memory usage for the heap or non-heap memory.
JVM_ENTRY(jobject, jmm_GetMemoryUsage(JNIEnv* env, jboolean heap))
  ResourceMark rm(THREAD);

  // Calculate the memory usage
  size_t total_init = 0;
  size_t total_used = 0;
  size_t total_committed = 0;
  size_t total_max = 0;
  bool   has_undefined_init_size = false;
  bool   has_undefined_max_size = false;


  MemoryUsage usage((heap ? InitialHeapSize : total_init),
                    (heap ? Universe::heap()->max_capacity() : total_max));

  Handle obj = MemoryService::create_MemoryUsage_obj(usage, CHECK_NULL);
  return JNIHandles::make_local(env, obj());

According to the constructor of MemoryUsage, the _maxSize field is initialized from Universe::heap()->max_capacity(),
which is also implemented in the JVM. Take CMS GC as an example (PS and G1 are almost the same):
size_t GenCollectedHeap::max_capacity() const {
  size_t res = 0;
  for (int i = 0; i < _n_gens; i++) {
    res += _gens[i]->max_capacity();
  }
  return res;
}

In the code above, _n_gens is 2, representing the two generations (young and old), and max_capacity()
is a virtual call. For the young generation under CMS GC, max_capacity() is implemented in
size_t DefNewGeneration::max_capacity() const {
  const size_t alignment = GenCollectedHeap::heap()->collector_policy()->min_alignment();
  const size_t reserved_bytes = reserved().byte_size();
  return reserved_bytes - compute_survivor_size(reserved_bytes, alignment);
}

reserved_bytes is just the Xmn we set, so here we can see that the JVM calculates the young gen max_capacity
as Xmn minus the size of one survivor space.
Actually, with CMS GC the adaptive size policy is explicitly disabled in the JVM, so the two survivor spaces are always
the same size.
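
Putting the pieces together, the calculation can be modelled in Java as follows. This is a rough model rather than the HotSpot code: the body of computeSurvivorSize is an assumption about what compute_survivor_size does (divide by SurvivorRatio + 2, then round down to the alignment), the 64 KB alignment is a placeholder value, and all class and method names are invented for the example.

```java
public class GenHeapModel {
    public static final long GB = 1L << 30;

    // Assumed behaviour of compute_survivor_size: one (ratio + 2)-th of the
    // generation, rounded down to the alignment, and never below it.
    public static long computeSurvivorSize(long genSize, long alignment, int survivorRatio) {
        long n = genSize / (survivorRatio + 2);
        return n > alignment ? (n / alignment) * alignment : alignment;
    }

    // Models DefNewGeneration::max_capacity(): reserved size minus one survivor space.
    public static long youngMaxCapacity(long reservedBytes, long alignment, int survivorRatio) {
        return reservedBytes - computeSurvivorSize(reservedBytes, alignment, survivorRatio);
    }

    // Models GenCollectedHeap::max_capacity(): the sum over both generations.
    public static long heapMaxCapacity(long youngReserved, long oldReserved,
                                       long alignment, int survivorRatio) {
        return youngMaxCapacity(youngReserved, alignment, survivorRatio) + oldReserved;
    }

    public static void main(String[] args) {
        long alignment = 64 * 1024; // placeholder alignment, not taken from the JVM
        // -Xmn4g -Xmx32g: 4g young generation, 28g old generation.
        long max = heapMaxCapacity(4 * GB, 28 * GB, alignment, 2);
        System.out.println("modelled max heap = " + max / GB + "g");
    }
}
```

With -Xmn4g, -Xmx32g and SurvivorRatio=2 the model gives 3g for the young generation plus 28g for the old generation, which matches the "Xmx minus one survivor space" figure used for the block cache size.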

> BlockCache size should not exceed acceptableSize too many
> ---------------------------------------------------------
>                 Key: HBASE-16287
>                 URL: https://issues.apache.org/jira/browse/HBASE-16287
>             Project: HBase
>          Issue Type: Improvement
>          Components: BlockCache
>            Reporter: Yu Sun
> Our regionserver has the following configuration:
>   -Xmn4g -Xms32g -Xmx32g -XX:SurvivorRatio=2 -XX:+UseConcMarkSweepGC
> Also, we only use the block cache, and set hfile.block.cache.size = 0.3 in hbase-site.xml, so
under this configuration the LRU block cache size will be (32g-1g)*0.3 = 9.3g. But in some scenarios, some
of the regionservers run continuous full GCs for hours and, most importantly, after a full GC most
of the objects in the old generation are not collected. So we dumped the heap, analysed it with MAT, and observed
an obvious memory leak in LruBlockCache, which occupied about 16g of memory. We then set the log level of class
LruBlockCache to TRACE and observed this in the log:
> {quote}
> 2016-07-22 12:17:58,158 INFO  [LruBlockCacheStatsExecutor] hfile.LruBlockCache: totalSize=15.29
GB, freeSize=-5.99 GB, max=9.30 GB, blockCount=628182, accesses=101799469125, hits=93517800259,
hitRatio=91.86%, , cachingAccesses=99462650031, cachingHits=93468334621, cachingHitsRatio=93.97%,
evictions=238199, evicted=4776350518, evictedPerRun=20051.93359375{quote}
> We can see that the block cache size has exceeded acceptableSize by far too much, which makes the
full GC problem more serious.
> After some investigation, I found this function:
> {code:borderStyle=solid}
>   public void cacheBlock(BlockCacheKey cacheKey, Cacheable buf, boolean inMemory,
>       final boolean cacheDataInL1) {
> {code}
> No matter how much of the block cache has already been used, the block is just put into it. But if the eviction
thread is not fast enough, the block cache size will increase significantly.
> So I think we should have a check here: for example, if the block cache size > 1.2
* acceptableSize(), just return and don't put the block into the cache until the block cache size is under the watermark.
If this is reasonable, I can make a small patch for this.
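
The proposed check could look roughly like the toy cache below. This is a minimal sketch of the idea only, not the LruBlockCache implementation and not an actual patch: the class, its fields, and the hardLimitFactor parameter are all hypothetical, and eviction is reduced to an explicit evict call instead of a background thread.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class BoundedCacheSketch {
    private final ConcurrentHashMap<String, byte[]> map = new ConcurrentHashMap<>();
    private final AtomicLong size = new AtomicLong();
    private final long acceptableSize;
    private final double hardLimitFactor; // e.g. 1.2, as suggested above

    public BoundedCacheSketch(long acceptableSize, double hardLimitFactor) {
        this.acceptableSize = acceptableSize;
        this.hardLimitFactor = hardLimitFactor;
    }

    // Returns false (and caches nothing) while the cache is above the hard
    // limit, so a slow evictor cannot let the size grow without bound.
    public boolean cacheBlock(String key, byte[] block) {
        if (size.get() > (long) (hardLimitFactor * acceptableSize)) {
            return false;
        }
        if (map.putIfAbsent(key, block) == null) {
            size.addAndGet(block.length);
        }
        return true;
    }

    // Stand-in for the eviction thread: removes one entry and releases its size.
    public boolean evict(String key) {
        byte[] removed = map.remove(key);
        if (removed == null) {
            return false;
        }
        size.addAndGet(-removed.length);
        return true;
    }

    public long size() {
        return size.get();
    }
}
```

Because the check happens before the insert, a single block can still push the size past the limit; the guard only bounds how far the cache can overshoot while eviction catches up.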

This message was sent by Atlassian JIRA
