Date: Fri, 10 Feb 2012 19:04:59 +0000 (UTC)
From: "Kannan Muthukkaruppan (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <106695969.25339.1328900699713.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2059322234.68806.1327358140886.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5263) Preserving cached data on compactions through cache-on-write

    [ https://issues.apache.org/jira/browse/HBASE-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205645#comment-13205645 ]

Kannan Muthukkaruppan commented on HBASE-5263:
----------------------------------------------

Promising idea!

In terms of the implementation details, it would be nice to avoid some pathological cases... where cold data (which was in the cache but almost on its way out of the cache) becomes hot again. I am guessing a naive approach could have this pitfall, but something that additionally takes into consideration the hotness of the keys in the block and appropriately places the data in the correct place in the blockcache LRU would not. Haven't thought through much about the implementation details... but wanted to throw out the initial thoughts at least.

See also related idea by Liyin here: HBASE-5263. These could be complementary approaches.

> Preserving cached data on compactions through cache-on-write
> ------------------------------------------------------------
>
>                 Key: HBASE-5263
>                 URL: https://issues.apache.org/jira/browse/HBASE-5263
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>            Priority: Minor
>
> We are tackling HBASE-3976 and HBASE-5230 to make sure we don't trash the block cache on compactions if cache-on-write is enabled. However, it would be ideal to reduce the effect compactions have on the cached data. For every block we are writing for a compacted file we can decide whether it needs to be cached based on whether the original blocks containing the same data were already in cache. More precisely, for every HFile reader in a compaction we can maintain a boolean flag saying whether the current key-value came from a disk IO or the block cache.
> In the HFile writer for the compaction's output we can maintain a flag that is set if any of the key-values in the block being written came from a cached block, use that flag at the end of a block to decide whether to cache-on-write the block, and reset the flag to false on a block boundary. If such an inclusive approach would still trash the cache, we could restrict the total number of blocks to be cached per output HFile, switch to an "and" logic instead of "or" logic for deciding whether to cache an output file block, or only cache a certain percentage of output file blocks that contain some of the previously cached data.
> Thanks to Nicolas for this elegant online algorithm idea!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
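The per-block flag scheme from the issue description can be sketched roughly as below. This is only an illustration under assumptions: the class `CacheAwareBlockWriter`, its methods, and the fixed `kvsPerBlock` limit are hypothetical stand-ins and are not part of HBase's HFile writer API. Each appended key-value carries the reader-side flag (block cache vs. disk IO), and the writer applies the inclusive "or" logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "or" flag per output block; not HBase code. */
class CacheAwareBlockWriter {
    private final int kvsPerBlock;          // assumed stand-in for a block-size limit
    private int kvsInCurrentBlock = 0;
    private boolean sawCachedKv = false;    // set if any key-value came from a cached block
    private final List<Boolean> cacheDecisions = new ArrayList<>();

    CacheAwareBlockWriter(int kvsPerBlock) {
        if (kvsPerBlock <= 0) {
            throw new IllegalArgumentException("kvsPerBlock must be positive");
        }
        this.kvsPerBlock = kvsPerBlock;
    }

    /** Appends one key-value; fromBlockCache is the flag kept by the source reader. */
    void append(boolean fromBlockCache) {
        sawCachedKv |= fromBlockCache;
        kvsInCurrentBlock++;
        if (kvsInCurrentBlock == kvsPerBlock) {
            finishBlock();
        }
    }

    /** Flushes a partially filled last block, if any. */
    void close() {
        if (kvsInCurrentBlock > 0) {
            finishBlock();
        }
    }

    /** Records the cache-on-write decision and resets the flag on the block boundary. */
    private void finishBlock() {
        cacheDecisions.add(sawCachedKv);
        sawCachedKv = false;
        kvsInCurrentBlock = 0;
    }

    /** One entry per output block: true means the block would be cached on write. */
    List<Boolean> cacheDecisions() {
        return cacheDecisions;
    }
}
```

Switching to the "and" variant mentioned in the description would amount to starting the flag at true for each block and combining with `&=` instead of `|=`.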