hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-70) [hbase] memory management
Date Tue, 05 Feb 2008 23:13:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-70?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565944#action_12565944 ]

Jim Kellerman commented on HBASE-70:
------------------------------------

Stack and I discussed the possibility of moving the memcache back to the region level instead
of the store level, because it would make accounting easier. However, this approach has some
serious drawbacks:
- some of the methods that access both the memcache and the store files (such as getKeys)
are more efficient when everything is at the store level.
- we experienced a great deal of pain moving the memcache to the store level from the region
level in the first place, as it forced us to rewrite a lot of the scanner code.
- the reason for moving the memcache from the region to the store level was that it greatly
simplified matters and reduced contention. Before, when the cache filled, we had no idea how
much of it belonged to which family.

So what I would suggest, instead of moving memcache back up to the region level, is to move
the cache size management back up to the region level. Let the region keep track of the total
cache space in use, which store(s) have the largest caches, etc. Contention is reduced to
smaller data structures that manage the accounting, instead of bigger structures like the
caches themselves.
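
To make the suggestion concrete, here is a rough sketch of what region-level accounting
could look like. It is only an illustration: the class name, the method names and the
per-region byte limit are all hypothetical and are not part of the existing HBase code.
The caches stay in the stores; the region only holds counters.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical region-level accounting; the caches themselves stay in the stores. */
class RegionCacheAccounting {
    private final long limitBytes; // assumed per-region cache budget
    private final AtomicLong totalBytes = new AtomicLong();
    private final Map<String, AtomicLong> bytesByStore = new ConcurrentHashMap<>();

    RegionCacheAccounting(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Records an edit added to a store's cache; returns true if the region is over budget. */
    boolean recordEdit(String store, long bytes) {
        bytesByStore.computeIfAbsent(store, s -> new AtomicLong()).addAndGet(bytes);
        return totalBytes.addAndGet(bytes) > limitBytes;
    }

    /** Records that a store flushed its cache; returns the number of bytes released. */
    long recordFlush(String store) {
        AtomicLong size = bytesByStore.get(store);
        if (size == null) {
            return 0;
        }
        long released = size.getAndSet(0);
        totalBytes.addAndGet(-released);
        return released;
    }

    long totalBytes() {
        return totalBytes.get();
    }

    /** Name of the store currently holding the most cached bytes, if any store holds data. */
    Optional<String> largestStore() {
        return bytesByStore.entrySet().stream()
            .filter(e -> e.getValue().get() > 0)
            .max(Comparator.comparingLong(e -> e.getValue().get()))
            .map(Map.Entry::getKey);
    }
}
```

With something like this, a store would call recordEdit on each update, and when it returns
true the region could ask largestStore which cache to flush first. Only the counters are
shared across stores, so the caches themselves are not locked for accounting purposes.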

This way we:
- Gain better control of the overall cache space in use
- Eliminate the need for radical modifications (moving the cache back to the region level
at this point would be much harder than when it was moved to the store level in the first
place, since so much more has been added)

Basically we are able to gain what we need (better memory management), with less contention
on the caches themselves, via a less risky (and less radical) change.


> [hbase] memory management
> -------------------------
>
>                 Key: HBASE-70
>                 URL: https://issues.apache.org/jira/browse/HBASE-70
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: stack
>
> Each Store has a Memcache of edits that is flushed on a fixed period (It used to be flushed
when it grew beyond a limit). A Region can be made up of N Stores.  A regionserver has no
upper bound on the number of regions that can be deployed to it currently.  Add to this that
per mapfile, we have read the index into memory.  We're also talking about adding caching
of blocks and cells.
> We need a means of keeping an account of memory usage, adjusting cache sizes and flush
rates (or sizes) dynamically -- using References where possible -- to accommodate deployment
of added regions.  If memory is strained, we should reject regions proffered by the master
with a resource-constrained, or some such, message.
> The manual sizing we currently do ain't going to cut it for clusters of any decent size.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

