Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id C44E818F11 for ; Thu, 24 Sep 2015 23:29:04 +0000 (UTC) Received: (qmail 23196 invoked by uid 500); 24 Sep 2015 23:29:04 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 23142 invoked by uid 500); 24 Sep 2015 23:29:04 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 23129 invoked by uid 99); 24 Sep 2015 23:29:04 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 24 Sep 2015 23:29:04 +0000 Date: Thu, 24 Sep 2015 23:29:04 +0000 (UTC) From: "Ted Yu (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14907245#comment-14907245 ] Ted Yu commented on HBASE-13408: -------------------------------- bq. getCellSet() which can be very inefficient. Under which scenario(s) would this method be inefficient ? bq. The compaction pipeline may generate different type of mutable segments. I thought the compaction pipeline would generate immutable segments ? bq. HStoreFileScanner can be refactored to implement StoreSegmentScanner Would the above be done in a future patch or different JIRA ? > HBase In-Memory Memstore Compaction > ----------------------------------- > > Key: HBASE-13408 > URL: https://issues.apache.org/jira/browse/HBASE-13408 > Project: HBase > Issue Type: New Feature > Reporter: Eshcar Hillel > Assignee: Eshcar Hillel > Fix For: 2.0.0 > > Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, HBASE-13408-trunk-v04.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf, StoreSegmentandStoreSegmentScannerClassHierarchies.pdf > > > A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to block-cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. > Generally, the faster the data is accumulated in memory, more flushes are triggered, the data sinks to disk more frequently, slowing down retrieval of data, even if very recent. > In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval. > We suggest a new compacted memstore with the following principles: > 1. The data is kept in memory for as long as possible > 2. Memstore data is either compacted or in process of being compacted > 3. Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore. > We suggest applying this optimization only to in-memory column families. > A design document is attached. > This feature was previously discussed in HBASE-5311. -- This message was sent by Atlassian JIRA (v6.3.4#6332)