hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2621) Memcache flush flushing every 60 secs with out considering the max memcache size
Date Wed, 16 Jan 2008 15:24:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559530#action_12559530 ]

stack commented on HADOOP-2621:

Looks like hbase.hregion.memcache.flush.size is no longer being read/used.  This probably
means that the blocking mechanism -- when memcache > twice the limit, we stop taking on
updates -- no longer works.  Was useful for when regionservers were on occasion overwhelmed;
it gave them a chance to catch their breath.
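
As a rough illustration of the blocking rule described above (stop taking updates once the memcache exceeds twice the configured flush size), here is a minimal Java sketch. The class and method names are hypothetical and the 64 MB value is only an assumed setting for hbase.hregion.memcache.flush.size; this is not the actual HBase code.

```java
public class MemcacheBlockCheck {
    // Hypothetical helper, not an HBase API: returns true when updates
    // should be held back because the memcache has grown past twice the
    // configured flush size.
    static boolean shouldBlockUpdates(long memcacheSizeBytes, long flushSizeBytes) {
        return memcacheSizeBytes > 2L * flushSizeBytes;
    }

    public static void main(String[] args) {
        // Assumed value for hbase.hregion.memcache.flush.size (illustration only).
        long flushSize = 64L * 1024 * 1024;

        // 100 MB is above the flush size but below twice the limit.
        System.out.println(shouldBlockUpdates(100L * 1024 * 1024, flushSize));
        // 200 MB is above twice the limit, so updates would be blocked.
        System.out.println(shouldBlockUpdates(200L * 1024 * 1024, flushSize));
    }
}
```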

From Billy's description up on IRC, compaction was being overrun by the number of files
produced flushing when cluster was under load.  In the past there was an attempt at striking
an equilibrium between flush and compaction rates/sizes.

I'll take a look at this.

> Memcache flush flushing every 60 secs with out considering the max memcache size
> --------------------------------------------------------------------------------
>                 Key: HADOOP-2621
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2621
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
> It looks like HBase is flushing all memcaches to disk every 60 secs, causing a lot of work
> for the compactor to keep up, because each column gets its own mapfile and every region is
> flushed at the same time. This could be a very large number of mapfiles to write if a region
> server is hosting 100 regions, all with multiple columns.
> Idea for memcache flush:
> Keep all data in memory until the memcache gets larger than the size configured with
> hbase.hregion.memcache.flush.size.
> When we reach this size, we should flush the largest regions first, stopping once we drop
> back below the memcache max size, maybe 20% below the max. This will flush only as needed,
> as each flush takes time to compact when compaction runs on a region. While we are flushing
> a region we should also block new updates to that region, so the region server does not get
> overrun when a high update load hits it. By blocking only the region we are flushing at
> that time, other regions will still be able to take updates.
> If we still want to use hbase.regionserver.optionalcacheflushinterval, we should set it
> to run once an hour or something like that, so we can recover memory from the memcache on
> regions that do not have a lot of updates in memory. But running at the current default of
> 60 secs is not good for the compactor if it has many regions to handle, and also not good
> for a scanner, which has to scan many small files instead of a few larger ones.
> Example: a compactor may take 15 mins to compact a region. In that time we will flush 15
> times, causing every other region to get a new mapfile each time, to be compacted when its
> turn comes. If many regions are being compacted, the last one on a list of, say, 10 regions
> would have 10 regions * 15 mins each = 150 mapfiles for each column written before the
> compactor can get to it.
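
The flush policy proposed in the issue description (flush the largest regions first and stop once total memcache usage is about 20% below the max) could be sketched in Java roughly as follows. Everything here is hypothetical: the class, the method, the region names, and the use of a plain map of per-region sizes are assumptions for illustration, not HBase code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LargestFirstFlush {
    // Hypothetical selection routine: given per-region memcache sizes,
    // pick regions to flush, largest first, until the total drops to
    // lowWaterFraction * maxBytes (0.8 models "20% below the max").
    static List<String> selectRegionsToFlush(Map<String, Long> regionSizes,
                                             long maxBytes,
                                             double lowWaterFraction) {
        long total = 0;
        for (long size : regionSizes.values()) {
            total += size;
        }
        List<String> toFlush = new ArrayList<>();
        if (total <= maxBytes) {
            return toFlush; // under the limit: nothing to flush
        }
        long target = (long) (maxBytes * lowWaterFraction);
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(regionSizes.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (Map.Entry<String, Long> entry : sorted) {
            if (total <= target) {
                break; // back below the low-water mark
            }
            toFlush.add(entry.getKey());
            total -= entry.getValue();
        }
        return toFlush;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        sizes.put("regionA", 50L);
        sizes.put("regionB", 30L);
        sizes.put("regionC", 40L);
        // Total is 120 against a max of 100, so flushing is triggered.
        System.out.println(selectRegionsToFlush(sizes, 100L, 0.8));
    }
}
```

Only the selected regions would need to block updates while they flush; the others keep accepting writes, which is the point of the per-region blocking suggested in the description.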

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
