hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
Date Fri, 21 Oct 2016 18:48:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595991#comment-15595991 ]

stack commented on HBASE-13408:
-------------------------------

[~ram_krish], [~anoopsamjohn], [~eshcar], [~ebortnik], and I met last Wednesday morning,
October 19th, to chat about where we all are on in-memory compaction. Here are some rough
notes:

{code}
1. Reiteration that inmemory compaction needs to be on all the time with no associated perf
degradation and with minimal config required to get benefit.
2. Need more testing and with more variety (Zipfian so inmemory compaction gets a chance to
shine). We'll all pitch in here.
3. When to merge (Eshcar "It is just a question of when.."). Back and forth. Concern that the
current default, merge on every flush, will make in-memory compaction look bad because it is
sure to generate loads of garbage. Suggestion to go to the other extreme, where we merge only
at flush-to-disk, letting the pipeline build up in memory.
4. We'll pick up CellChunkMap at the next meeting but in the meantime will revive a rumored
existing umbrella issue. CCM is where we'll get the biggest bang for the buck, so excited to get
this done. Still need to solve the cell-too-big issue. Maybe split its dev between Y! and Intel/St.Ack...
TBD.
5. Ram and Anoop fixed some GC issues in "HBASE-16608 Introducing the ability to merge ImmutableSegments
without copy-compaction or SQM usage" and will put up a new version of the patch w/ fixes. Generally
agreed the patch is close to commit.

The Y! crew are on holiday until next Tuesday.

Did the note to the dev list go up on the state of in-memory compaction? (After Tuesday -- smile.)
...

The back-and-forth about when/how to merge will probably continue into HBASE-16417; we have
to keep looking for the sweet spot.
{code}
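
The trade-off in note 3 can be pictured with a toy model. The Python sketch below is not HBase
code: the function `entries_reindexed`, the segment sizes, and the assumption that every merge
rebuilds one index entry per cell currently in the pipeline are all hypothetical, chosen only
to compare the two extremes discussed at the meeting.

```python
def entries_reindexed(segment_sizes, merge_on_every_flush):
    """Count index entries rebuilt by merges in a toy in-memory pipeline.

    Assumption (not from HBase): a merge rebuilds one index entry for
    every cell in the pipeline, and those rebuilt entries stand in for
    the garbage a merge produces.
    """
    pipeline = []   # sizes of the immutable segments currently in memory
    rebuilt = 0
    for size in segment_sizes:
        pipeline.append(size)  # an in-memory flush adds a new segment
        if merge_on_every_flush and len(pipeline) > 1:
            merged = sum(pipeline)
            rebuilt += merged
            pipeline = [merged]
    # Flush-to-disk: merge whatever is still separate in the pipeline.
    if len(pipeline) > 1:
        rebuilt += sum(pipeline)
    return rebuilt


sizes = [100] * 5  # five in-memory flushes of 100 cells each (made-up numbers)
eager = entries_reindexed(sizes, merge_on_every_flush=True)
lazy = entries_reindexed(sizes, merge_on_every_flush=False)
print(eager, lazy)
```

Under these assumptions the eager policy rebuilds 1400 entries and the lazy one 500. The model
ignores the read-side cost of scanning a longer pipeline, which is the other half of the
trade-off.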

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>         Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch,
HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch,
HBASE-13408-trunk-v08.patch, HBASE-13408-trunk-v09.patch, HBASE-13408-trunk-v10.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf,
HBaseIn-MemoryMemstoreCompactionDesignDocument-ver03.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver04.pdf,
HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf,
InMemoryMemstoreCompactionMasterEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf,
StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its in-memory component.
The memstore absorbs all updates to the store; from time to time these updates are flushed
to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted
until it is written to the filesystem and optionally to block-cache. This may result in underutilization
of the memory due to duplicate entries per row, for example, when hot data is continuously
updated. 
> Generally, the faster data accumulates in memory, the more flushes are triggered and
the more frequently data sinks to disk, slowing down retrieval of even very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data in memory,
and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.	The data is kept in memory for as long as possible
> 2.	Memstore data is either compacted or in the process of being compacted
> 3.	Allow a panic mode, which may interrupt an in-progress compaction and force a flush
of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
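
To illustrate the duplicate-entries point in the description above, here is a minimal Python
sketch. It is not the HBase implementation: the `Cell` tuple, `compact_in_memory`, and the
`max_versions` parameter are hypothetical names, and real cells carry more fields (family,
type, and so on). It only shows how keeping the newest version per (row, qualifier) shrinks
the in-memory footprint of a hot row.

```python
from collections import namedtuple

# Hypothetical, simplified cell used only for this illustration.
Cell = namedtuple("Cell", ["row", "qualifier", "timestamp", "value"])


def compact_in_memory(cells, max_versions=1):
    """Keep only the newest `max_versions` cells per (row, qualifier)."""
    # Sort by key, newest timestamp first within each key.
    newest_first = sorted(cells, key=lambda c: (c.row, c.qualifier, -c.timestamp))
    kept = []
    counts = {}
    for cell in newest_first:
        key = (cell.row, cell.qualifier)
        if counts.get(key, 0) < max_versions:
            kept.append(cell)
            counts[key] = counts.get(key, 0) + 1
    return kept


# A hot row updated five times, plus one cold row written once.
updates = [Cell("hot", "q", ts, f"v{ts}") for ts in range(1, 6)]
updates.append(Cell("cold", "q", 1, "v1"))

compacted = compact_in_memory(updates)
print(len(updates), "cells before,", len(compacted), "after")
```

Here six cells shrink to two, so the same memory budget can absorb more updates before a
flush to disk is needed.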



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
