Date: Thu, 18 Oct 2012 00:48:03 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1936016149.61288.1350521283264.JavaMail.jiratomcat@arcas>
In-Reply-To: <4027823.34685.1342004614722.JavaMail.jiratomcat@issues-vm>
Subject: [jira] [Commented] (HBASE-6371) [89-fb] Tier based compaction

    [ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478537#comment-13478537 ]

Lars Hofhansl commented on HBASE-6371:
--------------------------------------

Is this in the 0.89 branch, and is it finished? Does this only change the compaction selection?
A feature I'd be interested in is separating old and new data, such that it would not typically be compacted into the same file(s), even in a major compaction.
If there is hot new data and a lot of older data that is only occasionally queried, and old and new data end up in the same set of files, scanning is slow: a scan for only the latest version hits many more blocks than necessary, and old versions of KVs need to be skipped (whereas HFiles with only old data could otherwise simply be ignored).
An approach like LevelDB's might be interesting. Here's a description: http://code.google.com/p/leveldb/source/browse/doc/impl.html
Is that what we are trying to do with this?

> [89-fb] Tier based compaction
> -----------------------------
>
>                 Key: HBASE-6371
>                 URL: https://issues.apache.org/jira/browse/HBASE-6371
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Akashnil
>            Assignee: Liyin Tang
>              Labels: noob
>
> Currently, the compaction selection is not very flexible and is not sensitive to the hotness of the data. Very old data is likely to be accessed less, and very recent data is likely to be in the block cache. Both of these considerations make it inefficient to compact these files as aggressively as other files. In some use-cases, the access pattern is particularly obvious, yet there is no way to control the compaction algorithm in those cases.
> In the new compaction selection algorithm, we plan to divide the candidate files into different levels according to the oldness of the data present in those files. For each level, parameters like the compaction ratio and the minimum number of store files in each compaction may be different. The number of levels, the time ranges, and the parameters for each level will be configurable online on a per-column-family basis.
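
For readers following the thread, the tiered selection described in the quoted issue can be sketched roughly as below. This is only an illustration under stated assumptions, not the 89-fb implementation: the names StoreFile, Tier, assign_tiers, select_in_tier and select_compactions are hypothetical, "oldness" is modeled as a single age in hours per file, and the size-ratio rule is a simplified stand-in for whatever rule each tier actually applies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    name: str
    size: int          # bytes
    age_hours: float   # assumed measure of how old the file's data is


@dataclass(frozen=True)
class Tier:
    max_age_hours: float  # files up to this age fall into the tier
    ratio: float          # per-tier compaction ratio
    min_files: int        # per-tier minimum number of store files


def assign_tiers(files, tiers):
    """Put each file into the first tier whose age bound covers it."""
    buckets = [[] for _ in tiers]
    for f in files:
        for i, tier in enumerate(tiers):
            if f.age_hours <= tier.max_age_hours:
                buckets[i].append(f)
                break
    return buckets


def select_in_tier(files, tier):
    """Walk from oldest to newest, skipping files that are larger than
    ratio * (total size of the newer files); keep the rest if there are
    at least min_files of them."""
    ordered = sorted(files, key=lambda f: f.age_hours, reverse=True)
    start = 0
    while start < len(ordered):
        newer_total = sum(f.size for f in ordered[start + 1:])
        if ordered[start].size <= tier.ratio * newer_total:
            break
        start += 1
    candidates = ordered[start:]
    return candidates if len(candidates) >= tier.min_files else []


def select_compactions(files, tiers):
    """Return one (possibly empty) selection per tier."""
    return [select_in_tier(bucket, tier)
            for bucket, tier in zip(assign_tiers(files, tiers), tiers)]


# Sample values, chosen only for illustration.
tiers = [
    Tier(max_age_hours=1, ratio=1.2, min_files=3),
    Tier(max_age_hours=24, ratio=1.0, min_files=2),
    Tier(max_age_hours=math.inf, ratio=0.25, min_files=4),
]
files = [
    StoreFile("a", 10, 0.1),
    StoreFile("b", 12, 0.5),
    StoreFile("c", 11, 0.8),
    StoreFile("d", 100, 10),
    StoreFile("f", 40, 5),
    StoreFile("e", 500, 100),
]
selected = select_compactions(files, tiers)
```

With these sample values only the newest tier yields a selection (c, b, a); the middle tier is skipped because its older file is larger than the ratio allows, and the oldest tier holds a single file, below its minimum. Note that this covers only compaction selection; it does not by itself keep old and new data in separate files after a major compaction, which is the question raised in the comment.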