hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14921) Memory optimizations
Date Sat, 16 Apr 2016 09:47:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15244119#comment-15244119 ]

Anastasia Braginsky commented on HBASE-14921:
---------------------------------------------

[~yuzhihong@gmail.com], thanks for taking a look!

bq. Can you explain in bit more detail on the savings ?

Flattening just means replacing the CellSet based on ConcurrentSkipListMap with one based on
CellArrayMap. CellArrayMap is the new name for CellBlockObjectArray, and it has less overhead
(metadata) per cell than ConcurrentSkipListMap. Quoting [~anoop.hbase]:
{quote}
an entry added to CSLM (Cell object) will have ~100 bytes overhead per cell.
The Cell[] way of CellBlock (CellBlockObjectArray) will have per Cell overhead of 48 bytes
{quote}
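Using the figures quoted above (~100 bytes vs 48 bytes per cell), flattening saves roughly 52 bytes of metadata per cell, i.e. about 52 MB per million cells. The sketch below illustrates the idea of flattening under simplifying assumptions: `SimpleCell` and `FlatCellArray` are hypothetical stand-ins, not HBase classes, and the real CellArrayMap works on actual Cell objects and comparators. Flattening copies only the cell references out of the sorted skip list into a plain array and serves lookups by binary search.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical stand-in for an HBase Cell: only a row key and a value.
final class SimpleCell {
    final String row;
    final String value;
    SimpleCell(String row, String value) { this.row = row; this.value = value; }
}

// Hypothetical, simplified analogue of CellArrayMap: an immutable sorted
// array of cell references searched with binary search. It drops the
// per-entry node/index objects that a skip list keeps for every cell.
final class FlatCellArray {
    static final Comparator<SimpleCell> BY_ROW = Comparator.comparing((SimpleCell c) -> c.row);

    private final SimpleCell[] cells;

    private FlatCellArray(SimpleCell[] cells) { this.cells = cells; }

    // "Flattening": copy the references (not the cell data) out of the
    // skip list, which already iterates in ascending order.
    static FlatCellArray flatten(ConcurrentSkipListSet<SimpleCell> skipList) {
        return new FlatCellArray(skipList.toArray(new SimpleCell[0]));
    }

    int size() { return cells.length; }

    // Point lookup by row key; returns null when the row is absent.
    SimpleCell get(String row) {
        int i = Arrays.binarySearch(cells, new SimpleCell(row, null), BY_ROW);
        return i >= 0 ? cells[i] : null;
    }
}
```

The trade-off is that the flat array is immutable, which is acceptable only because flattening is applied to a segment that no longer receives writes.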

\\
bq. What's plan for flattening to CellChunkMap w.r.t. getting the chunk Id ?

The following answer also addresses the question raised by [~stack]:

bq. What is this "...currently impossible to get the chunk ID out of already created cell metadata"

\\
In CellChunkMap we store a kind of reference to each Cell using three integers (two would be
possible, but we use three for now). Assuming that all the data of Cell A is stored on Chunk C,
CellChunkMap saves the following per cell:
1.	Reference to C (some way to access the byte array of Chunk C)
2.	Offset from the beginning of Chunk C
3.	Length of Cell A on C
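The three-integer layout listed above can be sketched as one flat `int[]` in which each cell occupies three consecutive slots, so per-cell metadata is 12 bytes of ints with no per-cell object header. `CellRefIndex` and its methods are hypothetical names chosen for illustration; this is not the actual CellChunkMap code, which stores its index in chunks rather than a Java array.

```java
// Hypothetical sketch of the per-cell metadata layout: each cell occupies
// three consecutive ints (chunk ID, offset, length) in one flat int[]
// instead of being referenced through a separate Java object.
final class CellRefIndex {
    private static final int INTS_PER_CELL = 3;
    private static final int CHUNK_ID = 0, OFFSET = 1, LENGTH = 2;

    private final int[] refs;
    private int count;

    CellRefIndex(int capacity) { refs = new int[capacity * INTS_PER_CELL]; }

    // Cells must be appended in sorted order; the index itself does not sort.
    void append(int chunkId, int offset, int length) {
        if (count * INTS_PER_CELL >= refs.length) {
            throw new IllegalStateException("index is full");
        }
        int base = count * INTS_PER_CELL;
        refs[base + CHUNK_ID] = chunkId;
        refs[base + OFFSET] = offset;
        refs[base + LENGTH] = length;
        count++;
    }

    int size() { return count; }
    int chunkId(int i) { return refs[check(i) * INTS_PER_CELL + CHUNK_ID]; }
    int offset(int i)  { return refs[check(i) * INTS_PER_CELL + OFFSET]; }
    int length(int i)  { return refs[check(i) * INTS_PER_CELL + LENGTH]; }

    private int check(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException("cell " + i);
        return i;
    }
}
```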
The problem is in item 1. In Java we cannot take the address of an object and store it as a plain
integer. To resolve that, we added an ID for each Chunk, which is created in the MemStoreChunkPool.
In addition, we added a mapping from Chunk IDs to Chunk references. So in item 1 we save the Chunk
ID and translate it to a Chunk reference when we need to access the Cell data. This works when
we create a CellChunkMap from scratch.
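A minimal sketch of the ID-to-Chunk translation described above, assuming a pool that assigns each chunk a unique integer ID and keeps an ID-to-chunk map. `SketchChunk`, `SketchChunkPool`, `resolve` and `readCell` are hypothetical names; this is not the real MemStoreChunkPool API. `readCell` copies the bytes only to keep the example short; a real implementation would read the cell in place.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical chunk: a fixed-size byte array tagged with a pool-assigned ID.
final class SketchChunk {
    final int id;
    final byte[] data;
    SketchChunk(int id, int size) { this.id = id; this.data = new byte[size]; }
}

// Hypothetical pool that hands out chunks with unique IDs and keeps an
// ID -> chunk map, so an int stored in an index can be turned back into
// a chunk reference.
final class SketchChunkPool {
    private final AtomicInteger nextId = new AtomicInteger(1);
    private final ConcurrentHashMap<Integer, SketchChunk> byId = new ConcurrentHashMap<>();
    private final int chunkSize;

    SketchChunkPool(int chunkSize) { this.chunkSize = chunkSize; }

    // Allocate a chunk and register it under a fresh ID.
    SketchChunk allocate() {
        SketchChunk c = new SketchChunk(nextId.getAndIncrement(), chunkSize);
        byId.put(c.id, c);
        return c;
    }

    // Translate a stored chunk ID back into the chunk reference.
    SketchChunk resolve(int chunkId) {
        SketchChunk c = byId.get(chunkId);
        if (c == null) throw new IllegalArgumentException("unknown chunk " + chunkId);
        return c;
    }

    // Turn a (chunkId, offset, length) reference into the cell's bytes.
    byte[] readCell(int chunkId, int offset, int length) {
        byte[] data = resolve(chunkId).data;
        return Arrays.copyOfRange(data, offset, offset + length);
    }
}
```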

But in the case of flattening, we have an existing segment with an MSLAB and a ConcurrentSkipListMap,
and we do not want to copy the data in the MSLAB. So, as things stand, we cannot simply translate the
ConcurrentSkipListMap to a CellChunkMap, because we do not know the Chunk IDs of the Cells.
We can, however, translate the ConcurrentSkipListMap to a CellArrayMap, which already reduces some
of the metadata overhead.

To allow translation to CellChunkMap, the Cells need to know where they are stored, including their
Chunk IDs. This is quite a big change, and it is planned for after the performance evaluation
phase.
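To illustrate why Cells that know their Chunk IDs would make CellChunkMap flattening possible, here is a hedged sketch: `LocatedCell` and `ChunkMapFlattener` are hypothetical classes, and the row key is a `String` only for simplicity. When each cell carries (chunk ID, offset, length), flattening the skip list becomes a pure metadata copy into the three-ints-per-cell layout, and the data in the MSLAB chunks is never touched. The sketch assumes no concurrent writers while flattening.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical cell that records where its bytes live in the MSLAB:
// the ID of the chunk plus the offset and length inside that chunk.
final class LocatedCell {
    static final Comparator<LocatedCell> BY_ROW = Comparator.comparing((LocatedCell c) -> c.row);

    final String row;      // sort key, a String only to keep the sketch short
    final int chunkId;
    final int offset;
    final int length;

    LocatedCell(String row, int chunkId, int offset, int length) {
        this.row = row; this.chunkId = chunkId; this.offset = offset; this.length = length;
    }
}

final class ChunkMapFlattener {
    // Copies only metadata: (chunkId, offset, length) per cell, in sorted order.
    // Assumes the segment is no longer being written to while this runs.
    static int[] flatten(ConcurrentSkipListSet<LocatedCell> skipList) {
        int[] refs = new int[skipList.size() * 3];
        int base = 0;
        for (LocatedCell c : skipList) {   // iterates in ascending row order
            refs[base] = c.chunkId;
            refs[base + 1] = c.offset;
            refs[base + 2] = c.length;
            base += 3;
        }
        return refs;
    }
}
```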


> Memory optimizations
> --------------------
>
>                 Key: HBASE-14921
>                 URL: https://issues.apache.org/jira/browse/HBASE-14921
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: Eshcar Hillel
>            Assignee: Anastasia Braginsky
>         Attachments: CellBlocksSegmentInMemStore.pdf, CellBlocksSegmentinthecontextofMemStore(1).pdf,
>                      HBASE-14921-V01.patch, HBASE-14921-V02.patch, HBASE-14921-V03.patch, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Memory optimizations including compressed format representation and offheap allocations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)