hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs with out considering the max memcache size
Date Wed, 16 Jan 2008 20:53:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559658#action_12559658

Billy Pearson commented on HADOOP-2621:

With out a way to set a limit on memory usage and a limit where we start blocking updates
the region server could reach a point where we run out of memory before it could flush the
memcache causing a crash on the region server.  I thank we should move back to using hbase.hregion.memcache.block.multiplier
and hbase.hregion.memcache.flush.size. Maybe we can have an option flusher the splitter can
call to force a flush on a region so the split can do its work.

> Memcache flush flushing every 60 secs with out considering the max memcache size
> --------------------------------------------------------------------------------
>                 Key: HADOOP-2621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2621
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
> looks like hbase is flushing all memcache to disk every 60 secs causing a lot of work
for the compactor to keep up because column gets its own mapfile and every region is flushed
at one time. This could be a vary large number of mapfiles to write if a region server is
hosting 100 regions all with milti columns.
> Idea memcache flush
> keep all data in memory until memcache get larger then the conf size with hbase.hregion.memcache.flush.size.
> When we reach this size we should flush the regions that are the largest first stopping
once we drop back below the memcache max size maybe 20% below the max. This will to flush
only as needed as each flush takes time to compact when compaction runs on a region. while
we are flushing a region we should also be blocking new updates from happening on that region
so the region server does not get over ran when a high update load hits a region server. By
only blocking on the region we are flushing at that time other regions will still be able
to do updates this.
> We we still want to use the hbase.regionserver.optionalcacheflushinterval we should set
to to run once an hour so something like that so we can recover memory from the memcache on
region that do not have a lot updates in memory. But running at the default set now of 60
secs is not so good for the compactor if it has many regions to handle also not good for a
scanner to have to scan many small files vs a few larger ones
> Example a compactor may take 15 mins to compact a region in that time we will flush 15
times causeing all other regions to get a new mapfile to compact when it becomes it turn to
get compacted if you had many regions getting compacted the last one on the list of say 10
regions would have 10 regions * 15 mins each = 150 mapfiles for each column in the last region
written before the compactor can get to it.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message