hbase-dev mailing list archives

From "Bryan Duxbury (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-512) Add configuration for global aggregate memcache size
Date Mon, 07 Apr 2008 21:15:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12586544#action_12586544
] 

Bryan Duxbury commented on HBASE-512:
-------------------------------------

The whole point of this issue is to tie up a server thread - in fact all of them - while we
wait for memory usage to go down. The objective is to make it so that in situations like the
sequential performance evaluation, region servers will not take on so many puts that they run
out of memory.
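
Roughly, the accounting I'm picturing looks something like this (a sketch only - the class
and member names are just for illustration, not exactly what's in the patch):

    import java.util.concurrent.atomic.AtomicLong;

    // Sketch only -- tracks the aggregate size of all memcaches on a region
    // server and blocks puts while that total is over the configured limit.
    class MemcacheAccounting {
      private final AtomicLong globalMemcacheSize = new AtomicLong(0);
      private final long globalLimit;              // configured ceiling, in bytes
      private final Object sizeLock = new Object();

      MemcacheAccounting(long globalLimit) {
        this.globalLimit = globalLimit;
      }

      // Called from the put path: ties up the handler thread until flushes
      // have brought aggregate memcache usage back under the limit.
      void blockWhileOverLimit() throws InterruptedException {
        synchronized (sizeLock) {
          while (globalMemcacheSize.get() >= globalLimit) {
            sizeLock.wait(1000);                   // re-check as flushes complete
          }
        }
      }

      // Called when a memcache grows (positive delta) or is flushed (negative).
      void adjust(long delta) {
        globalMemcacheSize.addAndGet(delta);
        synchronized (sizeLock) {
          sizeLock.notifyAll();                    // release any blocked put threads
        }
      }
    }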

If flushImmediately was called on a region that was already being flushed, then the flush
lock would be locked, and flushImmediately would block until it acquired the lock. It would
then find that the region didn't need to be flushed (assuming HRegion.flushcache() has the
smarts to do that). 
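
In code terms, something like this (illustrative only, not the real HRegion internals):

    // Sketch of the flush-lock interaction described above. Assumes the flush
    // path can tell that the memcache was already emptied by an earlier flush.
    class RegionSketch {
      private final Object flushLock = new Object();
      private long memcacheSize = 0;

      // A second caller blocks here while another flush is in progress; once it
      // acquires the lock it finds nothing left to flush and returns.
      void flushImmediately() {
        synchronized (flushLock) {
          if (memcacheSize == 0) {
            return;                     // earlier flush already covered it
          }
          writeMemcacheToMapfile();     // placeholder for the actual flush work
          memcacheSize = 0;
        }
      }

      private void writeMemcacheToMapfile() { /* ... */ }
    }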

Adding a "high priority queue" would make some flushes happen faster, but not immediately,
and not in a fashion that blocks the region server from taking on more puts. 

> Add configuration for global aggregate memcache size
> ----------------------------------------------------
>
>                 Key: HBASE-512
>                 URL: https://issues.apache.org/jira/browse/HBASE-512
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>          Components: regionserver
>            Reporter: Bryan Duxbury
>            Assignee: Bryan Duxbury
>             Fix For: 0.2.0
>
>         Attachments: 512-v2.patch, 512.patch
>
>
> Currently, we have a configuration parameter for the size a Memcache must reach before
> it is flushed. This leads to pretty even-sized mapfiles when flushes run, which is nice.
> However, as noted in the parent issue, we can often get to a point where we run out of
> memory because too much data is hanging around in Memcaches.
> I think that we should add a new configuration parameter that governs the total amount
> of memory that the region server should spend on Memcaches. This would have to be some
> number less than the heap size - we'll have to discover the proper values through
> experimentation. Then, when a put comes in, if the global aggregate size of all the
> Memcaches for all the stores is at the threshold, then we should block the current and
> any subsequent put operations from completing until forced flushes cause the memory
> usage to go back down to a safe level. The existing strategy for triggering flushes will
> still be in play, just augmented with this blocking behavior.
> This approach has the advantage of helping us avoid OOME situations by warning us well
> in advance of overflow. Additionally, it becomes something of a performance tuning knob,
> allowing you to allocate more memory to improve write performance. This is superior to
> the previously suggested PhantomReference approach because that would possibly cause us
> to bump into further OOMEs while we're trying to flush to avoid them.
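
For illustration, reading and sanity-checking such a limit might look like the sketch below;
the property name and default are assumptions, not necessarily what the attached patches use:

    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Hypothetical property name -- the actual key in the patch may differ.
    public class GlobalMemcacheLimitCheck {
      public static void main(String[] args) {
        HBaseConfiguration conf = new HBaseConfiguration();
        long limit = conf.getLong("hbase.regionserver.globalMemcacheLimit",
            64 * 1024 * 1024);                  // e.g. 64MB default
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (limit >= maxHeap) {
          throw new IllegalArgumentException(
              "global memcache limit must leave headroom under the JVM heap");
        }
        System.out.println("global memcache limit: " + limit + " bytes");
      }
    }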

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

