hbase-issues mailing list archives

From "Edward Bortnikov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
Date Sun, 30 Aug 2015 13:01:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721512#comment-14721512 ]

Edward Bortnikov commented on HBASE-13408:
------------------------------------------

We did our homework to address [~anoop.hbase]'s and [~mbertozzi]'s concerns. In-memory HFiles
are entirely possible to implement. The challenge is that the current StoreFile implementation
is too tightly coupled to HDFS, whereas it could work over a byte stream just as well. Most of
the code is FS-independent, but some careful refactoring and a couple of new abstractions would
be required. Our concern is that this code is already large, and expanding it further only
increases the risk. Would it make sense to do the following:
(1) start reviewing and checking in the existing code, either in one bulk or piecemeal, and
(2) in parallel, design, implement and evaluate in-memory HFiles, either in a separate jira
or as a subtask of this jira? This could be just one more patch in the series.


> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch,
>                      HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf,
>                      HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf,
>                      InMemoryMemstoreCompactionEvaluationResults.pdf,
>                      InMemoryMemstoreCompactionScansEvaluationResults.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its in-memory component.
> The memstore absorbs all updates to the store; from time to time these updates are flushed
> to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted
> until it is written to the filesystem and, optionally, to the block cache. This may result in
> underutilization of memory due to duplicate entries per row, for example, when hot data is
> continuously updated.
> Generally, the faster data accumulates in memory, the more flushes are triggered and the more
> frequently data sinks to disk, which slows down retrieval even of very recent data.
> In high-churn workloads, compacting the memstore can help keep the data in memory,
> and thereby speed up data retrieval.
> We suggest a new compacted memstore with the following principles:
> 1. The data is kept in memory for as long as possible.
> 2. Memstore data is either compacted or in the process of being compacted.
> 3. A panic mode is allowed, which may interrupt an in-progress compaction and force a flush
>    of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
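
As an illustration of the duplicate-entry problem described above, here is a minimal Python
sketch of what compacting a memstore in memory could mean. It is not HBase code: the Cell
type, the compact_in_memory function and the max_versions parameter are hypothetical names
chosen for this example, and delete markers, TTLs and concurrency are ignored.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    """Hypothetical stand-in for a stored cell; not an HBase class."""
    row: str
    column: str
    timestamp: int
    value: str


def compact_in_memory(cells: List[Cell], max_versions: int = 1) -> List[Cell]:
    """Keep only the newest `max_versions` cells per (row, column).

    Assumption: older versions beyond `max_versions` are redundant and
    can be dropped without ever being flushed to disk.
    """
    by_key: Dict[Tuple[str, str], List[Cell]] = {}
    for cell in cells:
        by_key.setdefault((cell.row, cell.column), []).append(cell)

    compacted: List[Cell] = []
    for key in sorted(by_key):
        # Newest first, then truncate to the retained version count.
        versions = sorted(by_key[key], key=lambda c: c.timestamp, reverse=True)
        compacted.extend(versions[:max_versions])
    return compacted


# A hot row updated three times occupies three entries before compaction.
updates = [
    Cell("r1", "cf:a", 1, "v1"),
    Cell("r1", "cf:a", 2, "v2"),
    Cell("r2", "cf:a", 1, "x"),
    Cell("r1", "cf:a", 3, "v3"),
]
compacted = compact_in_memory(updates)
```

With max_versions=1 the three updates to row r1 collapse into a single entry, which is the
memory saving that the description attributes to high-churn workloads.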



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
