hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14918) In-Memory MemStore Flush and Compaction
Date Tue, 23 Feb 2016 14:47:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158983#comment-15158983 ]

Anastasia Braginsky commented on HBASE-14918:

Thank you for your immediate attention [~stack]!

Of course, we looked at CellBlock from HBASE-10713.
The code there is very well written and commented, so it can be understood just from
reading the patch. Kudos [~anoop.hbase] :) !
(At least I hope that I understand it :) and [~anoop.hbase], please correct me if I am wrong.)

Along with some restructuring and refactoring (partially addressed by HBASE-14919), the CellBlock
patch suggests using an ArrayList of PositionedByteRange as the underlying data structure.
PositionedByteRange and SimplePositionedByteRange are allocated simply from the JVM heap.
The code handles many details and also provides a very important CellBlockScanner to scan the
new data structure.
In light of the recent MemStore refactoring, the CellBlock patch clearly cannot be used as is.
However, the most important and deep parts of the code are very valuable and definitely can
be reused.

Thus we suggest CellBlocksSegment, which fits into the new Segments structure of MemStore and
inherits from ImmutableSegment.
Underneath, CellBlocksSegment follows the same idea as CellBlock.
It just strives to use an array of arrays instead of a list of arrays, in order to benefit from
binary search and lower memory overhead.
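
To illustrate the array-of-arrays idea, here is a minimal sketch, not the CellBlocksSegment code itself. It assumes the keys are already sorted (as they would be in an immutable segment) and stores them in fixed-size blocks. The class name CellBlocksSketch, the blockSize parameter, and the plain unsigned byte comparison are all hypothetical and chosen only for illustration; a real implementation would use the HBase cell comparator.

```java
import java.util.Arrays;

/** Hypothetical sketch: sorted keys kept in fixed-size blocks (an array of arrays). */
final class CellBlocksSketch {
    private final byte[][][] blocks; // blocks[b][i] is the i-th key of block b
    private final int blockSize;
    private final int size;

    CellBlocksSketch(byte[][] sortedKeys, int blockSize) {
        this.blockSize = blockSize;
        this.size = sortedKeys.length;
        int numBlocks = (size + blockSize - 1) / blockSize;
        this.blocks = new byte[numBlocks][][];
        for (int b = 0; b < numBlocks; b++) {
            int from = b * blockSize;
            int to = Math.min(from + blockSize, size);
            blocks[b] = Arrays.copyOfRange(sortedKeys, from, to);
        }
    }

    /** Random access by global index; this is what a linked list cannot do cheaply. */
    byte[] get(int index) {
        return blocks[index / blockSize][index % blockSize];
    }

    /** Lexicographic comparison of two keys, treating bytes as unsigned. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /** Returns the global index of key, or -(insertionPoint) - 1 if it is absent. */
    int find(byte[] key) {
        int lo = 0;
        int hi = size - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(get(mid), key);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }
}
```

Because every block except possibly the last one has the same length, a global index maps to a block and a slot with one division and one remainder, so the binary search needs no per-entry node objects.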
Taking into consideration [~anoop.hbase]'s earlier comments about MSLAB (and simple common
sense), we suggest using MSLAB for allocating any sequence of bytes.
Please note that MSLAB is also very suitable because it maintains reference counting for
chunk scans and thus handles the deallocation of the chunks per segment.
Since MSLAB does not support off-heap allocation for now, the PositionedByteRange can be
replaced by the ByteRange/Chunk currently returned by MSLAB. A little more tuning is also required.
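
The combination of slab allocation and reference counting can be sketched roughly as follows. This is a simplified illustration under my own assumptions, not the actual MSLAB or Chunk code: the class RefCountedChunk and its methods allocate, retain, and release are hypothetical names. The owning segment is assumed to hold the initial reference, and each open scan takes one more.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a slab chunk with reference counting; not the HBase Chunk class. */
final class RefCountedChunk {
    private final byte[] data;
    private final AtomicInteger nextFree = new AtomicInteger(0);
    // Assumption: the owning segment holds the initial reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedChunk(int capacity) {
        this.data = new byte[capacity];
    }

    /** Copies src into the chunk and returns its offset, or -1 if it does not fit. */
    int allocate(byte[] src) {
        while (true) {
            int old = nextFree.get();
            if (old + src.length > data.length) {
                return -1;
            }
            // Reserve the range first, then copy; concurrent writers get disjoint ranges.
            if (nextFree.compareAndSet(old, old + src.length)) {
                System.arraycopy(src, 0, data, old, src.length);
                return old;
            }
        }
    }

    /** Called when a scan starts; returns false if the chunk was already given up. */
    boolean retain() {
        while (true) {
            int c = refCount.get();
            if (c == 0) {
                return false;
            }
            if (refCount.compareAndSet(c, c + 1)) {
                return true;
            }
        }
    }

    /** Called when a scan or the segment closes; true means the chunk can be recycled. */
    boolean release() {
        return refCount.decrementAndGet() == 0;
    }

    byte byteAt(int offset) {
        return data[offset];
    }
}
```

The point of the count is that a chunk is recycled only after both the segment and every scan over it have let go, which is why the deallocation can be done per segment.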

As a completely orthogonal but related issue, we also see a possibility of enhancing MSLAB
by giving it the ability to allocate its Chunks both on- and off-heap.
It is probably an issue for sub-task number 5 of HBASE-14918 :)
Obviously, this requires some redesign of MemStoreLAB, HeapMemStoreLab.Chunk, and some other
classes around the memory allocation.
In particular, the implementation of HeapMemStoreLab.Chunk with a "byte[]" field, and the usage
of ByteRange, can be replaced with (for example) ByteBuffer.
(ByteBufferArray from hbase-common/org.apache.hadoop.hbase.util also looks very interesting.)
I agree that it is better to pre-allocate the off-heap Chunks; for that we can probably enhance
the MemStoreChunkPool.
I took a look at the BoundedByteBufferPool, which I found only in the hbase-client code. It also
looks very suitable, although it lives in a different component.
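
As a rough illustration of pre-allocating off-heap chunks, here is a minimal sketch built only on java.nio.ByteBuffer.allocateDirect and a bounded queue. It is not MemStoreChunkPool or BoundedByteBufferPool; the class OffHeapChunkPoolSketch and its methods getChunk, putBack, and available are hypothetical names, and the sizing policy (a fixed number of equal-size chunks) is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Hypothetical sketch of a pool of pre-allocated off-heap chunks. */
final class OffHeapChunkPoolSketch {
    private final BlockingQueue<ByteBuffer> free;
    private final int chunkSize;

    OffHeapChunkPoolSketch(int chunkCount, int chunkSize) {
        this.chunkSize = chunkSize;
        this.free = new ArrayBlockingQueue<>(chunkCount);
        // Pay the direct-allocation cost once, up front.
        for (int i = 0; i < chunkCount; i++) {
            free.add(ByteBuffer.allocateDirect(chunkSize));
        }
    }

    /** Returns a pooled chunk reset for writing, or null when the pool is exhausted. */
    ByteBuffer getChunk() {
        ByteBuffer chunk = free.poll();
        if (chunk != null) {
            chunk.clear();
        }
        return chunk;
    }

    /** Returns a chunk to the pool; rejects buffers that are on-heap or of another size. */
    boolean putBack(ByteBuffer chunk) {
        if (!chunk.isDirect() || chunk.capacity() != chunkSize) {
            return false;
        }
        return free.offer(chunk);
    }

    int available() {
        return free.size();
    }
}
```

When getChunk returns null the caller has to decide what to do next, for example fall back to an on-heap chunk; that policy is deliberately left out of the sketch.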

Sorry for this long monologue :)
[~anoop.hbase], [~stack], everybody, what do you think?
I would be thrilled to hear your insightful comments! :))))))

> In-Memory MemStore Flush and Compaction
> ---------------------------------------
>                 Key: HBASE-14918
>                 URL: https://issues.apache.org/jira/browse/HBASE-14918
>             Project: HBase
>          Issue Type: Umbrella
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 0.98.18
>         Attachments: CellBlocksSegmentDesign.pdf, MSLABMove.patch
> A memstore serves as the in-memory component of a store unit, absorbing all updates to
> the store. From time to time these updates are flushed to a file on disk, where they are compacted
> (by eliminating redundancies) and compressed (i.e., written in a compressed format to reduce
> their storage size).
> We aim to speed up data access, and therefore suggest to apply in-memory memstore flush.
> That is to flush the active in-memory segment into an intermediate buffer where it can be
> accessed by the application. Data in the buffer is subject to compaction and can be stored
> in any format that allows it to take up smaller space in RAM. The less space the buffer consumes
> the longer it can reside in memory before data is flushed to disk, resulting in better performance.
> Specifically, the optimization is beneficial for workloads with medium-to-high key churn
> which incur many redundant cells, like persistent messaging.
> We suggest to structure the solution as 4 subtasks (respectively, patches).
> (1) Infrastructure - refactoring of the MemStore hierarchy, introducing segment (StoreSegment)
> as first-class citizen, and decoupling memstore scanner from the memstore implementation;
> (2) Adding StoreServices facility at the region level to allow memstores update region
> counters and access region level synchronization mechanism;
> (3) Implementation of a new memstore (CompactingMemstore) with non-optimized immutable
> segment representation, and
> (4) Memory optimization including compressed format representation and off heap allocations.
> This Jira continues the discussion in HBASE-13408.
> Design documents, evaluation results and previous patches can be found in HBASE-13408.

This message was sent by Atlassian JIRA
