hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16421) Introducing the CellChunkMap as a new additional index variant in the MemStore
Date Tue, 20 Dec 2016 12:01:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764042#comment-15764042 ]

Anastasia Braginsky commented on HBASE-16421:
---------------------------------------------

A collective answer to some of the issues raised after the road map was published:
--------------------------------------------------------------------------------------------------------------
[~ram_krish]:

bq. We may need the new type of Cell which has the chunk id in it?

This is a possibility. We may have ChunkCell and HeapCell derived from Cell. What about storing
the Chunk ID as the first integer in each chunk’s byte buffer? Then each cell that knows its offset
and byte buffer can just read the ID from there and return it. A Cell that has no underlying MSLAB
chunk can return -1 as its Chunk ID. What do you think?
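
To make the proposal concrete, here is a minimal sketch of the idea, not actual HBase code. All
names in it (ChunkIdSketch, SketchCell, ChunkBackedCell, HeapBackedCell, NO_CHUNK_ID) are
hypothetical, and it assumes the ID occupies the first 4 bytes of the chunk so that cell data
starts right after it.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch only: none of these types exist in HBase under these names.
final class ChunkIdSketch {
    static final int NO_CHUNK_ID = -1;            // returned by cells with no MSLAB chunk
    static final int CHUNK_ID_OFFSET = 0;         // assumption: ID is the first integer of the chunk
    static final int DATA_START = Integer.BYTES;  // cell data begins after the ID

    /** Allocates an off-heap chunk and writes its ID as the first integer. */
    static ByteBuffer newChunk(int chunkId, int capacity) {
        ByteBuffer buf = ByteBuffer.allocateDirect(capacity);
        buf.putInt(CHUNK_ID_OFFSET, chunkId);     // absolute put, does not move the position
        return buf;
    }

    interface SketchCell {
        int getChunkId();
    }

    /** A cell that lives inside a chunk: it knows its byte buffer and its offset. */
    static final class ChunkBackedCell implements SketchCell {
        private final ByteBuffer chunk;
        private final int offset;                 // where this cell's bytes start in the chunk

        ChunkBackedCell(ByteBuffer chunk, int offset) {
            this.chunk = chunk;
            this.offset = offset;
        }

        int getOffset() {
            return offset;
        }

        @Override
        public int getChunkId() {
            // Read the ID back from the head of the chunk; nothing is stored per cell.
            return chunk.getInt(CHUNK_ID_OFFSET);
        }
    }

    /** A cell allocated on the Java heap, with no underlying chunk. */
    static final class HeapBackedCell implements SketchCell {
        @Override
        public int getChunkId() {
            return NO_CHUNK_ID;
        }
    }
}
```

With this layout a cell needs no extra field for the ID; the cost is one 4-byte read from the
chunk head on each lookup.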

bq. We have an internal branch which was doing the Pipeline flushing and creating n number
of segments per snapshot. I could use that for now to test this. But if you need to test in
latest trunk - can you prepare a patch with CellChunkMap and integrate it with the current
trunk? I can give some patches on the #2 subtask for creating chunk id and having a cell with
chunk id.
Atleast from our earlier reports one thing is sure that we do create garbage during flush
for the cell creation but the overall impact of GC was much better. So I think we are benefited
there, but with the scan perf I think we have not done any tests. For now I can do it with
our internal branch but not on latest trunk.

It is OK that your evaluation will not be on the latest trunk; what is important is that the chunks
are off-heap. Integrating CellChunkMap into the current trunk is everything that needs to be
done in task number 2. It is not a small issue, so better not to make it a prerequisite for the
prerequisite. I think your patch should be good enough if it uses off-heap. When you say
“I can give some patches on the #2 subtask for creating chunk id and having a cell with
chunk id”, do you mean #2 among the prerequisites or #2 among the road-map tasks? I should really
number them differently :)

--------------------------------------------------------------------------------------------------------------
[~stack]:

bq. Sorry... prob. w/ upserted cells is? Why would they not be allocated on MSLAB?

At our last meeting we talked about cells upserted/updated by the append/increment operations,
which are not allocated on MSLAB. More generally, this covers any cell (small enough to fit in a
regular chunk) that is not allocated on the MSLAB even though MSLAB is enabled.

bq. Do we think these allocations long-lived? That they will migrate to permanent heap?

The lifetime of those chunks depends on the lifetime of the cell for which the variable-size
chunk is allocated. By “permanent heap” do you mean the JVM’s non-heap Permanent
Generation area? If so, then I do not think anything allocated dynamically can ever move
to the permanent heap. It should hold only the JVM’s metadata and statics. But maybe I am missing
something.

--------------------------------------------------------------------------------------------------------------
[~anoop.hbase]:

bq. A way to flush (to disk) chunk mapped segment directly with NO need to again make on heap
Cell objects.. This is going to a big change I guess. The entire flush path work based on
a scanner and that path need Cells.

Generally I agree it would be better to flush without creating Cell objects. But if this is
a critical item, then what about the performance of all the other scans? After all, flush
uses the same scan as the others. All those paths need Cells, and the flush scan is the less
frequent one, I think. If we generally think we need “A way to *scan* chunk mapped segment directly
with NO need to again make on heap Cell objects”, then this is a big issue indeed. This
is why we need the scan evaluation, and if the impact is big, we need to rethink the entire issue.

bq. Same way as above for the in memory compaction of 1+ chunk mapped segments.

Please note that we do not plan to do in-memory compaction (the EAGER one) when CellChunkMap
segments are used. CellChunkMap must go with MSLAB and In-Memory-Compaction must go without
MSLAB...

> Introducing the CellChunkMap as a new additional index variant in the MemStore
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-16421
>                 URL: https://issues.apache.org/jira/browse/HBASE-16421
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Anastasia Braginsky
>         Attachments: CellChunkMapRevived.pdf, IntroductiontoNewFlatandCompactMemStore.pdf
>
>
> Follow up for HBASE-14921. This is going to be the umbrella JIRA to include all the parts
> of integration of the CellChunkMap to the MemStore.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
