hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
Date Wed, 24 Feb 2016 17:58:19 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15163424#comment-15163424 ]

stack commented on HBASE-14918:

bq. ....the CellBlocks suggests using an ArrayList of PositionedByteRange as the underlying data structure.

Are you thinking this is still the case, [~anoop.hbase], given where the offheaping of the read path is going?
Should the base type be ByteBuff so we can do both onheap and offheap?
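To make the onheap/offheap question concrete, here is a minimal sketch that uses plain `java.nio.ByteBuffer` as a stand-in for HBase's `ByteBuff` (an assumption for illustration only; `ByteBuff` has its own API). The class and method names are hypothetical. The point is that the same write/read code works whether the backing memory was allocated on the heap or off it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch only: java.nio.ByteBuffer stands in for HBase's ByteBuff here.
public class OnOffHeapSketch {

    // Allocate either an on-heap (array-backed) or an off-heap (direct) buffer.
    static ByteBuffer allocate(int capacity, boolean offHeap) {
        return offHeap ? ByteBuffer.allocateDirect(capacity) : ByteBuffer.allocate(capacity);
    }

    // Write a length-prefixed key; the caller never needs to know where the memory lives.
    static void writeKey(ByteBuffer buf, byte[] key) {
        buf.putInt(key.length);
        buf.put(key);
    }

    // Read the key back with absolute gets, so the buffer position is left untouched.
    static byte[] readKey(ByteBuffer buf, int offset) {
        int len = buf.getInt(offset);
        byte[] out = new byte[len];
        for (int i = 0; i < len; i++) {
            out[i] = buf.get(offset + 4 + i);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "row-1".getBytes(StandardCharsets.UTF_8);
        for (boolean offHeap : new boolean[] {false, true}) {
            ByteBuffer buf = allocate(64, offHeap);
            writeKey(buf, key);
            String back = new String(readKey(buf, 0), StandardCharsets.UTF_8);
            System.out.println((buf.isDirect() ? "offheap" : "onheap") + ": " + back);
        }
    }
}
```

Running `main` prints `onheap: row-1` followed by `offheap: row-1`.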

bq. Thus we suggest CellBlocksSegment, which fits into the new Segments structure of MemStore and inherits from ImmutableSegment.

High-level, sounds good.

bq. Underneath, CellBlocksSegment has the same idea as CellBlock.

One question: what happens when a CellBlocksSegment runs into an HFileBlock? How will the marshalling from CBS to HFB work?

bq. Just striving to use an array of arrays, instead of a list of arrays, in order to enjoy binary search and lower memory overhead.

A noble goal.
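A minimal sketch of the array-of-arrays idea, assuming an immutable segment whose cells are represented only by their serialized keys (`byte[]`), sorted once into a flat `byte[][]` and looked up by binary search. The class and method names are hypothetical, not HBase APIs, and real cells also carry family, qualifier, timestamp and type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: an immutable segment whose cell keys live in one sorted byte[][].
public class FlatCellIndex {
    private final byte[][] keys; // sorted once at construction, never mutated afterwards

    FlatCellIndex(byte[][] unsortedKeys) {
        this.keys = unsortedKeys.clone();
        Arrays.sort(this.keys, FlatCellIndex::compare);
    }

    // Lexicographic comparison treating bytes as unsigned, the usual order for row keys.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Classic binary search over a plain array: O(log n), no List indirection.
    int indexOf(byte[] key) {
        int lo = 0;
        int hi = keys.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = compare(keys[mid], key);
            if (c < 0) {
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1; // key not present in this segment
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        FlatCellIndex idx = new FlatCellIndex(new byte[][] {b("row-c"), b("row-a"), b("row-b")});
        System.out.println(idx.indexOf(b("row-b"))); // 1
        System.out.println(idx.indexOf(b("row-z"))); // -1
    }
}
```

Because the segment is immutable, the sort cost is paid once and every later lookup is a binary search over a plain array.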

So, an array of CellBlocks? You'd allocate CellBlocks with MSLAB?

bq. Since MSLAB currently doesn't support off-heap allocation, the PositionedByteRange can be replaced by the ByteRange/Chunk that MSLAB currently returns. A little more tuning is also required.
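For readers less familiar with MSLAB, the core idea is to copy cell data into large pre-allocated chunks and hand out (chunk, offset, length) references instead of many small objects. The following is a simplified, single-threaded sketch of that idea only; `SlabSketch`, `SliceRef` and `copyIn` are hypothetical names, and this is not the HBase MSLAB implementation. The `offHeap` flag shows where an off-heap allocation path could plug in.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Simplified, single-threaded sketch of slab-style allocation (not HBase's MSLAB code).
public class SlabSketch {
    // Points at a byte range inside one chunk, similar in spirit to a ByteRange.
    static final class SliceRef {
        final int chunkIndex;
        final int offset;
        final int length;

        SliceRef(int chunkIndex, int offset, int length) {
            this.chunkIndex = chunkIndex;
            this.offset = offset;
            this.length = length;
        }
    }

    private final int chunkSize;
    private final boolean offHeap;
    private final List<ByteBuffer> chunks = new ArrayList<>();
    private int nextFree; // write offset inside the current (last) chunk

    SlabSketch(int chunkSize, boolean offHeap) {
        this.chunkSize = chunkSize;
        this.offHeap = offHeap;
        this.nextFree = chunkSize; // forces a chunk to be allocated on the first copy
    }

    // Copy data into the current chunk, starting a new chunk when it does not fit.
    SliceRef copyIn(byte[] data) {
        if (data.length > chunkSize) {
            // A real allocator would likely fall back to a standalone buffer here.
            throw new IllegalArgumentException("cell larger than chunk");
        }
        if (nextFree + data.length > chunkSize) {
            chunks.add(offHeap ? ByteBuffer.allocateDirect(chunkSize) : ByteBuffer.allocate(chunkSize));
            nextFree = 0;
        }
        ByteBuffer chunk = chunks.get(chunks.size() - 1);
        for (int i = 0; i < data.length; i++) {
            chunk.put(nextFree + i, data[i]); // absolute put leaves the position untouched
        }
        SliceRef ref = new SliceRef(chunks.size() - 1, nextFree, data.length);
        nextFree += data.length;
        return ref;
    }

    // Copy the referenced range back out of its chunk.
    byte[] read(SliceRef ref) {
        ByteBuffer chunk = chunks.get(ref.chunkIndex);
        byte[] out = new byte[ref.length];
        for (int i = 0; i < ref.length; i++) {
            out[i] = chunk.get(ref.offset + i);
        }
        return out;
    }

    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        SlabSketch slab = new SlabSketch(8, false);
        slab.copyIn(new byte[] {1, 2, 3, 4, 5});
        SliceRef b = slab.copyIn(new byte[] {6, 7, 8, 9}); // does not fit: starts a second chunk
        System.out.println(slab.chunkCount() + " chunks, b at chunk " + b.chunkIndex + " offset " + b.offset);
    }
}
```

Running `main` prints `2 chunks, b at chunk 1 offset 0`.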

Ok. Sorry for the plethora of types. We seem to be settling on a few now we know more.

There are also the means of allocation: MSLAB and the BucketCache allocator.

We can move BBBP, no problem.

Yeah, let's align what you are doing here with the work on offheaping the write path, @anastasia.

bq. Sorry for this long monologue

Keep going. It is good stuff.

> In-Memory MemStore Flush and Compaction
> ---------------------------------------
>                 Key: HBASE-14918
>                 URL: https://issues.apache.org/jira/browse/HBASE-14918
>             Project: HBase
>          Issue Type: Umbrella
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 0.98.18
>         Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch
> A memstore serves as the in-memory component of a store unit, absorbing all updates to the store. From time to time these updates are flushed to a file on disk, where they are compacted (by eliminating redundancies) and compressed (i.e., written in a compressed format to reduce their storage size).
> We aim to speed up data access, and therefore propose applying an in-memory memstore flush: that is, flushing the active in-memory segment into an intermediate buffer where it can be accessed by the application. Data in the buffer is subject to compaction and can be stored in any format that allows it to take up less space in RAM. The less space the buffer consumes, the longer it can reside in memory before data is flushed to disk, resulting in better performance.
> Specifically, the optimization is beneficial for workloads with medium-to-high key churn, which incur many redundant cells, such as persistent messaging.
> We propose structuring the solution as 4 subtasks (respectively, patches):
> (1) Infrastructure: refactoring the MemStore hierarchy, introducing the segment (StoreSegment) as a first-class citizen, and decoupling the memstore scanner from the memstore implementation;
> (2) Adding a StoreServices facility at the region level to allow memstores to update region counters and access the region-level synchronization mechanism;
> (3) Implementing a new memstore (CompactingMemstore) with a non-optimized immutable segment representation; and
> (4) Memory optimization, including a compressed format representation and off-heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in HBASE-13408.
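The in-memory flush and compaction described in the issue can be pictured with a small sketch. It assumes a single thread, plain string keys, and only the latest value per key (real HBase cells also carry timestamps, types and multiple versions). `InMemoryFlushSketch` and its methods are hypothetical names, not the CompactingMemstore API. The active segment is frozen into an immutable pipeline, and compacting the pipeline drops redundant older values, which is why high key churn benefits most.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, single-threaded sketch of in-memory flush + compaction.
public class InMemoryFlushSketch {
    private NavigableMap<String, String> active = new ConcurrentSkipListMap<>();
    // Immutable segments, newest first.
    private final List<NavigableMap<String, String>> pipeline = new ArrayList<>();

    void put(String key, String value) {
        active.put(key, value);
    }

    // In-memory flush: freeze the active segment and push it to the head of the pipeline.
    void inMemoryFlush() {
        pipeline.add(0, Collections.unmodifiableNavigableMap(active));
        active = new ConcurrentSkipListMap<>();
    }

    // In-memory compaction: merge all pipeline segments, dropping older values per key.
    void compactPipeline() {
        NavigableMap<String, String> merged = new TreeMap<>();
        for (int i = pipeline.size() - 1; i >= 0; i--) { // oldest to newest
            merged.putAll(pipeline.get(i)); // newer values overwrite older ones
        }
        pipeline.clear();
        pipeline.add(Collections.unmodifiableNavigableMap(merged));
    }

    // Reads check the active segment first, then the pipeline from newest to oldest.
    String get(String key) {
        String v = active.get(key);
        if (v != null) {
            return v;
        }
        for (NavigableMap<String, String> seg : pipeline) {
            v = seg.get(key);
            if (v != null) {
                return v;
            }
        }
        return null;
    }

    int cellsInPipeline() {
        int n = 0;
        for (Map<String, String> seg : pipeline) {
            n += seg.size();
        }
        return n;
    }

    public static void main(String[] args) {
        InMemoryFlushSketch store = new InMemoryFlushSketch();
        store.put("queue-1", "msg-a");
        store.inMemoryFlush();
        store.put("queue-1", "msg-b"); // key churn: the same key is rewritten
        store.inMemoryFlush();
        System.out.println(store.cellsInPipeline()); // 2 before compaction
        store.compactPipeline();
        System.out.println(store.cellsInPipeline()); // 1 after compaction
        System.out.println(store.get("queue-1"));    // msg-b
    }
}
```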

This message was sent by Atlassian JIRA
