hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
Date Mon, 19 Dec 2016 07:13:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760395#comment-15760395 ]

Anoop Sam John commented on HBASE-16421:

There is a difference between the read-path off-heaping work and this one. For read-path
off-heaping, the starting point is the off-heap-backed BucketCache: the block data bytes are
in off-heap ByteBuffers. Even before the HBASE-11425 work, the read path always started with
block bytes (64 KB default size). The data can be read from HDFS or from the L1 or L2 cache;
in all cases it is plain bytes. The read path first reads these bytes and creates Cells out
of them. Please remember that we do not copy any bytes for this (especially after the 11425
work). The only thing that happens is that Cell POJOs are created, wrapping the data bytes in
the on-heap or off-heap area. So this is not avoidable at all. Before and after the
off-heaping read-path work, this part was/is the same.
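
To make the "wrap, do not copy" point concrete, here is a minimal sketch in Java. It is not
the HBase implementation: WrappedCell and ReadPathSketch are hypothetical names used only for
illustration, and the real Cell interface is much richer. The sketch only shows a small object
that holds a reference to the block buffer plus an offset and a length, so that creating it
copies no data bytes, whether the buffer is on heap or off heap.

```java
import java.nio.ByteBuffer;

/** Hypothetical stand-in for a Cell; NOT the HBase Cell interface. */
final class WrappedCell {
    private final ByteBuffer block; // shared block buffer (on heap or off heap)
    private final int offset;       // where this cell's bytes start in the block
    private final int length;       // how many bytes belong to this cell

    WrappedCell(ByteBuffer block, int offset, int length) {
        this.block = block;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    /** Reads one byte straight from the shared block using an absolute get. */
    byte byteAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return block.get(offset + i);
    }

    /** True if this cell wraps exactly the given buffer object (no copy made). */
    boolean isBackedBy(ByteBuffer candidate) {
        return block == candidate;
    }
}

final class ReadPathSketch {
    /** Creates the wrapper object; only a reference and two ints are stored. */
    static WrappedCell wrap(ByteBuffer block, int offset, int length) {
        return new WrappedCell(block, offset, length);
    }

    public static void main(String[] args) {
        // Assumption for the demo: a 64 KB off-heap block holding three bytes.
        ByteBuffer block = ByteBuffer.allocateDirect(64 * 1024);
        block.put(0, (byte) 'a');
        block.put(1, (byte) 'b');
        block.put(2, (byte) 'c');

        WrappedCell cell = wrap(block, 1, 2);
        System.out.println((char) cell.byteAt(0));  // prints "b"
        System.out.println(cell.isBackedBy(block)); // prints "true"
    }
}
```

The per-cell cost here is one small wrapper object; the data bytes stay where they are.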

Here the difference is this. As of now, we always have Cell objects in Segments. The segments
may or may not be flattened; depending on that, the cells are in a CSLM (ConcurrentSkipListMap)
or in a Cell[]. Either way, the Cell POJOs are there. When a scan/read comes, we serve back
those Cell objects. When there is a need for in-memory compaction or an on-disk flush, a
scanner is associated with it and reads out the Cells. You can see that this is just retrieval
or iteration of the existing POJOs.
With CellChunkMap it will be different: we get rid of the Cell objects as such. What we have
instead is some index data (ChunkId + offset + length), and for every Cell we have to convert
this index into a POJO Cell object. For the actual Scan/Get working on this Segment, this is
unavoidable; we must work with Cells there. But doing the same even for in-memory
compaction/on-disk flush will be too much. We got rid of many Java objects (Cells) by
flattening to the ChunkMap, and then, in between, we create those objects again!
This will create a lot of garbage and affect GC.
I hope I am explaining the difference and its impact more clearly now. :-) Sorry if I was not
clear earlier.
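
The concern above can be sketched in Java as follows. This is only an illustration under
stated assumptions, not the CellChunkMap design from the attached documents: ChunkCell and
CellChunkMapSketch are hypothetical names, and the index is modelled as a plain int[] with
three ints per cell (chunk id, offset, length). The counter exists only to show that every
pass over the index allocates one fresh object per cell.

```java
import java.nio.ByteBuffer;

/** Hypothetical Cell-like object created from an index entry. */
final class ChunkCell {
    final ByteBuffer chunk;
    final int offset;
    final int length;

    ChunkCell(ByteBuffer chunk, int offset, int length) {
        this.chunk = chunk;
        this.offset = offset;
        this.length = length;
    }
}

/** Hypothetical flat index: no Cell objects are stored, only ints. */
final class CellChunkMapSketch {
    private static final int INTS_PER_ENTRY = 3; // chunkId, offset, length

    private final ByteBuffer[] chunks;
    private final int[] index;
    int materialized = 0; // illustration only: counts objects created on access

    CellChunkMapSketch(ByteBuffer[] chunks, int[] index) {
        this.chunks = chunks;
        this.index = index;
    }

    int size() {
        return index.length / INTS_PER_ENTRY;
    }

    /** Converts the i-th index entry into a new object on every call. */
    ChunkCell cellAt(int i) {
        int base = i * INTS_PER_ENTRY;
        materialized++;
        return new ChunkCell(chunks[index[base]], index[base + 1], index[base + 2]);
    }

    public static void main(String[] args) {
        ByteBuffer[] chunks = { ByteBuffer.allocate(16), ByteBuffer.allocate(16) };
        int[] index = { 0, 0, 4,   0, 4, 4,   1, 0, 8 }; // three cells
        CellChunkMapSketch map = new CellChunkMapSketch(chunks, index);

        // Two passes, e.g. one scan plus one compaction or flush pass.
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < map.size(); i++) {
                map.cellAt(i);
            }
        }
        System.out.println(map.materialized); // prints 6
    }
}
```

With a Cell[] or CSLM, the same two passes would hand back the already existing objects; here
each pass produces short-lived objects, which is the extra garbage the comment refers to.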

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> ------------------------------------------------------------------------------
>                 Key: HBASE-16421
>                 URL: https://issues.apache.org/jira/browse/HBASE-16421
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Anastasia Braginsky
>         Attachments: CellChunkMapRevived.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include all the parts
of integration of the CellChunkMap to the MemStore.

This message was sent by Atlassian JIRA
