hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14921) Memory optimizations
Date Fri, 01 Apr 2016 12:25:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221626#comment-15221626 ]

Anastasia Braginsky commented on HBASE-14921:

bq. But one question: why has the position of the init() method changed? When a chunk was
got from the pool, init() previously happened after the CAS operation, but now it is
moved into allocateChunk itself? Will it have ramifications?

We wanted to concentrate all chunk creation (allocation and initialization) in the ChunkPool,
so that the ChunkPool can manage the mapping from Chunk to ID. Previously a Chunk was initialized
only just before it was going to be used directly. I see your point, [~ram_krish]: as currently
implemented, Chunks are initialized (memory allocated) even when they are merely
pre-created for the pool. This is not efficient. I’ll fix that.
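
A minimal sketch of the fix described above, under stated assumptions: the pool still creates every Chunk and assigns its ID, but memory is allocated only when the Chunk is handed out. The names LazyChunkPoolSketch, Chunk, getChunk() and initializedInPool() are hypothetical and do not come from the patch, and the sketch is single-threaded for brevity (no CAS).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, single-threaded illustration; not the HBase MemStoreChunkPool.
final class LazyChunkPoolSketch {
    static final class Chunk {
        private final int id;
        private final int size;
        private byte[] data; // stays null until init() is called

        Chunk(int id, int size) {
            this.id = id;
            this.size = size;
        }

        // Allocate the backing memory once, on first use.
        void init() {
            if (data == null) {
                data = new byte[size];
            }
        }

        boolean isInitialized() {
            return data != null;
        }

        int getId() {
            return id;
        }
    }

    private final Deque<Chunk> free = new ArrayDeque<>();
    private final int chunkSize;
    private int nextId = 0;

    // Pre-create chunks (and their IDs) without allocating their memory.
    LazyChunkPoolSketch(int chunkSize, int preCreated) {
        this.chunkSize = chunkSize;
        for (int i = 0; i < preCreated; i++) {
            free.add(new Chunk(nextId++, chunkSize));
        }
    }

    // Hand out a chunk, allocating its memory only now.
    Chunk getChunk() {
        Chunk c = free.poll();
        if (c == null) {
            c = new Chunk(nextId++, chunkSize);
        }
        c.init();
        return c;
    }

    // Number of pooled chunks that already hold memory.
    int initializedInPool() {
        int n = 0;
        for (Chunk c : free) {
            if (c.isInitialized()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        LazyChunkPoolSketch pool = new LazyChunkPoolSketch(2 * 1024 * 1024, 4);
        System.out.println("initialized while pooled: " + pool.initializedInPool());
        Chunk c = pool.getChunk();
        System.out.println("chunk " + c.getId() + " initialized: " + c.isInitialized());
    }
}
```

The pool keeps sole ownership of ID assignment, which is the property the centralization was meant to provide, while pre-created chunks cost only a small object each until they are used.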

bq. I read the new classes in this patch. So in which patch this is being used ? Or it will
come later?

Thank you [~anoop.hbase] for taking a look. The CellFlatMap (renamed from CellBlock with [~stack]'s
help) is going to be part of ImmutableSegment, so after an in-memory flush the CSLM
(ConcurrentSkipListMap) should be replaced by a CellFlatMap. I am currently writing this code and
hope to present it soon. Some intuition for the usage can be found in TestCellBlockSet.

bq. We don't need the last int of Cell length. We have the offset to Cell. See constructor
- KeyValue(final byte [] bytes, final int offset)

This is a very good comment! I didn’t think in that direction, but we can enjoy this “super-compact-representation”.
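
A minimal sketch of why the length int is redundant, assuming a serialized cell that starts with two 4-byte length prefixes (key length, then value length) and carries no tags. The class CellLengthSketch and its methods are hypothetical helpers written for illustration; they are not HBase code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a length-prefixed cell can be located by offset alone.
final class CellLengthSketch {
    // Append one cell as [keyLen:int][valueLen:int][key][value]; return its offset.
    static int writeCell(ByteBuffer chunk, byte[] key, byte[] value) {
        int offset = chunk.position();
        chunk.putInt(key.length).putInt(value.length).put(key).put(value);
        return offset;
    }

    // Recover the total serialized length from the two length prefixes.
    static int cellLength(ByteBuffer chunk, int offset) {
        int keyLen = chunk.getInt(offset);
        int valueLen = chunk.getInt(offset + Integer.BYTES);
        return 2 * Integer.BYTES + keyLen + valueLen;
    }

    private static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer chunk = ByteBuffer.allocate(64);
        int first = writeCell(chunk, bytes("row1"), bytes("v"));
        int second = writeCell(chunk, bytes("row22"), bytes("value"));
        // Only the offsets are stored per cell; the lengths are read back.
        System.out.println(first + " -> " + cellLength(chunk, first));   // 0 -> 13
        System.out.println(second + " -> " + cellLength(chunk, second)); // 13 -> 18
    }
}
```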

bq. If we use the Cell[] way, per Cell we have more overhead.

Of course, Cell[] is expensive. It was implemented because it is very simple, easy to debug,
and easy to compare with plain byte array serialization. But Cell[] can be useful for very
large cells, those bigger than MSLAB Chunks (e.g. > 2MB). If we know we are going to deal
with such very large cells and do not want to allocate un-reusable special-size MSLAB Chunks,
CellArrayMap (the new name for CellBlockObjectArray) is a good solution.

bq. BTW HBASE-15179, under this we are doing some PoC and test with off heap Memstore.

Please note that as part of this jira we change the MSLAB and MemStoreChunkPool files.
We need to align with your code that takes MSLAB off-heap.

bq. The 3 ints per cell also written to chunks we get from same MSLAB. We need this really?
So if we change to 8 bytes per cell, and when chunk size is 2 MB, we can have 262144 cells.
We will have this many really? If not, we may waste that chunk?

Excellent discussion, [~anoop.hbase]! Those were my thoughts as well… Initially, I wrote
CellBlockSerialized (now called CellChunkMap) to accept a byte[] of any size and work with
it. Later, however, I thought that this might need to be taken off-heap, and that it may be
better to centralize all the off-heap handling in the Chunks. Then, if a Chunk is off-heap,
everything implemented on top of it is off-heap as well…

Now, if two ints suffice to represent a Cell (2^3 bytes), we can fit 2^21/2^3=2^18
cells in a Chunk of size 2MB.
Suppose a Cell uses 256=2^8 bytes for all its data, which is not too much. Do we often serve
Cells smaller than that?
With cells of that size or larger, one Chunk can represent at least 2^18*2^8=2^26 bytes = 64MB,
which is already half of what we can hold in one MemStore without flushing to disk.
From here, in 99% of cases we will not need a Chunk[] and a single Chunk is enough.
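
The arithmetic above can be checked with a short sketch. The class and method names are hypothetical; the 2MB chunk size, the 8-byte (two int) and 12-byte (three int) representations, and the 256-byte cell size are the figures from this discussion.

```java
// Hypothetical helper that reproduces the capacity estimate.
final class ChunkCapacitySketch {
    // How many fixed-size cell representations fit in one chunk.
    static long cellsPerChunk(long chunkBytes, long metadataBytesPerCell) {
        return chunkBytes / metadataBytesPerCell;
    }

    // How many bytes of cell data those representations can point at.
    static long dataBytesCovered(long cells, long cellDataBytes) {
        return cells * cellDataBytes;
    }

    public static void main(String[] args) {
        long chunk = 2L * 1024 * 1024; // 2MB = 2^21 bytes
        long cellData = 256;           // 2^8 bytes per cell

        long twoInts = cellsPerChunk(chunk, 2 * Integer.BYTES);
        long threeInts = cellsPerChunk(chunk, 3 * Integer.BYTES);

        System.out.println(twoInts);                             // 262144 = 2^18
        System.out.println(dataBytesCovered(twoInts, cellData)); // 67108864 = 64MB
        System.out.println(threeInts);                           // 174762
    }
}
```

Dropping the third int raises the number of cells one Chunk can index from 174762 to 262144.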

But what if not? What if we have really small cells, such as an integer key and an integer
value? Is that a possible use case? For such small cells the size of the metadata is
actually super-important, as you do not want the metadata to be bigger than the data…
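
To make the concern concrete, here is a rough sketch of the metadata-to-data ratio for a few cell sizes. The 8-byte "tiny cell" counts only a 4-byte key and a 4-byte value and ignores every other serialized field, so it is an illustrative lower bound rather than a real HBase cell size; the class name is hypothetical.

```java
// Hypothetical helper: per-cell metadata bytes divided by per-cell data bytes.
final class MetadataOverheadSketch {
    static double metadataToDataRatio(int metadataBytes, int cellDataBytes) {
        return (double) metadataBytes / cellDataBytes;
    }

    public static void main(String[] args) {
        int[] cellSizes = {8, 64, 256}; // assumed data bytes per cell
        for (int size : cellSizes) {
            System.out.println(size + " bytes of data: 2 ints -> "
                + metadataToDataRatio(2 * Integer.BYTES, size)
                + ", 3 ints -> "
                + metadataToDataRatio(3 * Integer.BYTES, size));
        }
    }
}
```

Under these assumptions, a three-int representation is already larger than an 8-byte cell (ratio 1.5), while for 256-byte cells either representation is a few percent of the data.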

I will continue answering more questions already posted here...

> Memory optimizations
> --------------------
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf,
HBASE-14921-V01.patch, HBASE-14921-V02.patch
> Memory optimizations including compressed format representation and offheap allocations

This message was sent by Atlassian JIRA
