hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1192) LRU-style map for the block cache
Date Mon, 09 Feb 2009 18:48:59 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671957#action_12671957
] 

Jonathan Gray commented on HBASE-1192:
--------------------------------------

My proposal is to build upon the work being done in HBASE-1186 and HBASE-1188 to create our
own LRU-style Map specialized for the block cache.

A few points as to why I think we should move away from SoftReferences and manage everything
ourselves:

- SoftReferences have only loosely defined constraints, and their observed behavior is non-uniform
- We're already "managing" heap usage for Memcache.  Using softrefs for the block cache, we'd
have something that's almost a black box, trying to use all available memory.  This could
cause the memcache to flush itself out because the RS is under heap pressure.  We won't have much
control over fairness between memcaches, indexes, and the block cache if we use softrefs. 
I propose we build something very similar to the MemcacheFlusher thread that would deal with
fairness between the different elements of the RS that use significant heap (memcaches, indexes,
block cache, cell cache, in-memory families, blooms, etc...).  As with the new file format,
there are going to be more parameters in hbase 0.20 in order to optimize for your use case.
 Like the file format, we'll have to come up with reasonable defaults and write more documentation
about the effects of the different settings.  Do we want to divide up the total available
heap between the different memory consumers on startup, or do we want to leave it wide open for
memcaches/indexes/blocks until we're under heap pressure and then decide how
to flush or evict fairly?
- Ability to implement in-memory families, as described in the Bigtable paper, very easily by
adding priority into the eviction algorithm
- Full table scans can thrash the cache (at Streamy, we do full scans only for MR jobs, not user-facing
stuff).  With our own structure, we can use a modified LRU algorithm that is resistant to
table scans (I'm a fan of ARC, but there are license issues; it's fairly simple to implement
something similar if you configure it manually... ARC is cool because it self-tunes).

Those are my main points.  The primary reason not to go in this direction is simplicity. 
However, given what we've learned from OOME hell over the past couple of releases, we must
be (and already are) in the business of heap management.  The Streamy guys have done the research
and development to do memory management in Java as well as it seems it can be done (based on
other open-source Java caching apps), so I'm confident we can be correct, efficient, and accurate
enough to prevent OOME issues and get optimal performance.

Erik will post his findings from his work experimenting with softref behavior.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision
is whether to continue using SoftReferences or to build our own structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

