From: "Jonathan Gray (JIRA)"
To: issues@hbase.apache.org
Date: Wed, 15 Dec 2010 16:10:12 -0500 (EST)
Subject: [jira] Commented: (HBASE-3327) For increment workloads, retain memstores in memory after flushing them
Message-ID: <31587741.145311292447412467.JavaMail.jira@thor>
In-Reply-To: <21493110.42741291921803654.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HBASE-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971837#action_12971837 ]

Jonathan Gray commented on HBASE-3327:
--------------------------------------

I actually disagree that the
biggest benefit is 1 memstore plus snapshot. That would cover flushes but not compactions. As stated, flushing w/ cacheOnWrite would be virtually the same but consume 25% of the memory. So for this case, I don't see a clear benefit of retaining the snapshot vs. cacheOnWrite of the flushed file.

This change is significant and would require a good number of modifications to the tracking of aggregate MemStore sizes and to the rules around eviction when under global heap pressure.

I still like this idea in general, but I'm not sure it's the best direction for effort to be spent right now.

> For increment workloads, retain memstores in memory after flushing them
> -----------------------------------------------------------------------
>
>                 Key: HBASE-3327
>                 URL: https://issues.apache.org/jira/browse/HBASE-3327
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Karthik Ranganathan
>
> This is an improvement based on our observation of what happens in an increment workload. The working set is typically small and is contained in the memstores.
> 1. The memstores get flushed because the limit on the number of WAL logs gets hit.
> 2. This in turn triggers compactions, which evict the block cache.
> 3. Flushing the memstores and evicting the block cache cause disk reads for increments that come in afterwards, because the data is no longer in memory.
> We could solve this elegantly by retaining the memstores AFTER they are flushed into files. This would mean we can quickly populate the new memstore with the working set of data from memory itself without having to hit disk. We can throttle the number of such memstores we retain, or the memory allocated to them. In fact, allocating a percentage of the block cache to this would give us a huge boost.
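
To make the proposal in the issue description concrete, here is a rough sketch of "retain flushed memstores under a memory budget". It is not HBase code: the class name RetainedMemstores, the String/long cells, the flat BYTES_PER_CELL estimate, and the in-memory map standing in for flushed files are all assumptions made for illustration. The byte budget is roughly where the aggregate size tracking and heap-pressure eviction mentioned in the comment would have to plug in.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class RetainedMemstores {
    /** A flushed memstore kept in memory, with its approximate size. */
    private static final class Snapshot {
        final Map<String, Long> cells;
        final long sizeBytes;

        Snapshot(Map<String, Long> cells, long sizeBytes) {
            this.cells = cells;
            this.sizeBytes = sizeBytes;
        }
    }

    // Assumed flat per-cell size; a real store would track actual heap use.
    private static final long BYTES_PER_CELL = 64;

    private final long retainedBudgetBytes;
    private long retainedBytes = 0;
    private long diskReads = 0;

    private Map<String, Long> active = new TreeMap<>();
    private final Deque<Snapshot> retained = new ArrayDeque<>(); // newest first
    private final Map<String, Long> disk = new HashMap<>();      // stand-in for flushed files

    RetainedMemstores(long retainedBudgetBytes) {
        this.retainedBudgetBytes = retainedBudgetBytes;
    }

    /** Applies an increment, reading the old value from memory when possible. */
    long increment(String key, long delta) {
        Long current = lookupInMemory(key);
        long base = (current != null) ? current : readFromDisk(key);
        long updated = base + delta;
        active.put(key, updated);
        return updated;
    }

    /** Checks the active memstore, then retained snapshots from newest to oldest. */
    private Long lookupInMemory(String key) {
        Long value = active.get(key);
        if (value != null) {
            return value;
        }
        for (Snapshot snapshot : retained) {
            value = snapshot.cells.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Writes the active memstore out, then retains it instead of dropping it. */
    void flush() {
        if (active.isEmpty()) {
            return;
        }
        disk.putAll(active);
        long size = active.size() * BYTES_PER_CELL;
        retained.addFirst(new Snapshot(active, size));
        retainedBytes += size;
        active = new TreeMap<>();
        // Throttle: evict the oldest snapshots once the budget is exceeded.
        while (retainedBytes > retainedBudgetBytes && !retained.isEmpty()) {
            retainedBytes -= retained.removeLast().sizeBytes;
        }
    }

    private long readFromDisk(String key) {
        diskReads++;
        return disk.getOrDefault(key, 0L);
    }

    long diskReads() {
        return diskReads;
    }
}
```

With a generous budget, an increment right after a flush is served from the retained snapshot and adds no disk read. With a budget of zero, the snapshot is evicted immediately and the same increment falls through to the disk stand-in, which is the behaviour the issue describes today.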