hbase-issues mailing list archives

From "Anastasia Braginsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16438) Create a cell type so that chunk id is embedded in it
Date Mon, 03 Apr 2017 09:22:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953177#comment-15953177 ]

Anastasia Braginsky commented on HBASE-16438:

bq. What specific question in RB are you looking out for? 

OK. Here are the questions that still bother me and that I haven't seen responses to:
1. In ByteBufferChunkCell: please explain why we need to add this new class. Why can't the existing
BBKV (ByteBufferKeyValue) just have a new method, getChunkId(), that returns the chunk id stored at offset 0 of
the backing BB?
2. In ByteBufferKeyValue, in MSLAB, or anywhere else, please add a constant giving the size in bytes
of the ChunkCell (what I call the cell-representation: chunkId + offset + length + seqId), so that
I can use it later.
I will review the existing patch once again.
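A minimal sketch of what questions 1 and 2 ask for, under stated assumptions: the chunk id is written as a 4-byte int at offset 0 of the chunk's backing ByteBuffer, and the cell-representation is chunkId (int) + offset (int) + length (int) + seqId (long). The class name `ChunkBackedCellSketch` and the constant `CELL_REPRESENTATION_SIZE` are hypothetical; this is not the actual HBase ByteBufferKeyValue API.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not the HBase API: a cell view over a chunk's backing
// ByteBuffer, where the chunk id is assumed to be stored as an int at offset 0.
class ChunkBackedCellSketch {
  // Assumed layout of the per-cell "cell-representation":
  // chunkId (int) + offset (int) + length (int) + seqId (long) = 20 bytes.
  static final int CELL_REPRESENTATION_SIZE =
      Integer.BYTES + Integer.BYTES + Integer.BYTES + Long.BYTES;

  private final ByteBuffer buf; // backing buffer of the whole chunk
  private final int offset;     // where this cell's bytes start in buf
  private final int length;     // number of bytes belonging to this cell

  ChunkBackedCellSketch(ByteBuffer buf, int offset, int length) {
    this.buf = buf;
    this.offset = offset;
    this.length = length;
  }

  // Absolute read at index 0; the buffer's position is left untouched.
  int getChunkId() {
    return buf.getInt(0);
  }

  int getOffset() {
    return offset;
  }

  int getLength() {
    return length;
  }
}
```

Because the id lives in the chunk's buffer, an extra method on an existing cell class could read it the same way, which is the point of question 1.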

bq. ChunkId is per ByteBuffer backing the chunk. I can change the chunkId to be an int.

You got it yourself; I also thought so for a moment. I am talking about the ChunkID of the chunk where each
cell is located, which is saved per cell.
Please do change chunkID to an int, but check for overflow (at least log an error).
I believe we should strive to decrease the number of bytes the cell representation takes,
because that is the reason we are doing the CellChunkMap in the first place.
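One way to switch chunk ids to int while still detecting overflow, sketched under assumptions: ids come from a hypothetical `ChunkIdGeneratorSketch` that counts in an `AtomicLong`, so going past `Integer.MAX_VALUE` is visible instead of silently wrapping to a negative int. It logs the error with `java.util.logging` and then refuses to hand out the id.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Hypothetical sketch: hands out int chunk ids and refuses to wrap around.
class ChunkIdGeneratorSketch {
  private static final Logger LOG =
      Logger.getLogger(ChunkIdGeneratorSketch.class.getName());

  // Counted in a long so that exceeding Integer.MAX_VALUE is detectable.
  private final AtomicLong lastId;

  // The first id returned is startAfter + 1.
  ChunkIdGeneratorSketch(int startAfter) {
    this.lastId = new AtomicLong(startAfter);
  }

  int nextChunkId() {
    long next = lastId.incrementAndGet();
    if (next > Integer.MAX_VALUE) {
      LOG.severe("Chunk id overflow: " + next + " does not fit in an int");
      throw new IllegalStateException("chunk id overflow");
    }
    return (int) next;
  }
}
```

Throwing is a design choice in this sketch; a variant that only logs, as the comment minimally asks, would still have to decide which id to return once the int range is exhausted.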

bq. My question was: we also planned to write this Cell meta data (ChunkId, offset, length) to chunks.
So what is the difference? In this chunk or that chunk?

Do you mean that the seqID is going to be written only in the index-chunk, and not in the main-chunk
that holds the key, value, etc.? So there is no duplication? Are you sure? If so, that is already
a little better, but I would still like to keep the Cell meta data smaller.
The smaller the Cell meta data is (ideally only chunkId, offset and length, i.e. only 12 bytes),
the lower the meta-data overhead per cell and the more cells we can squeeze into a single index-chunk
(CellChunkMap). The smaller the CellChunkMap, the better the locality for scans, and the more easily
the binary search stays in the processor cache.
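To illustrate why a small fixed-size entry helps, here is a sketch of an index chunk whose entries are exactly the 12-byte (chunkId, offset, length) triple the comment hopes for. Entry i starts at byte i * 12, so binary search is plain index arithmetic over one contiguous buffer. `IndexChunkSketch` is hypothetical, not HBase's CellChunkMap; data chunks are assumed to be addressable by chunk id through an array, and for simplicity the whole cell's bytes are treated as its key and compared as unsigned bytes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, not HBase's CellChunkMap: an index chunk made of fixed-size
// 12-byte entries (chunkId, offset, length), each pointing at a cell in a data chunk.
class IndexChunkSketch {
  static final int ENTRY_SIZE = 3 * Integer.BYTES; // 12 bytes per cell

  private final ByteBuffer index;        // the index chunk itself
  private final ByteBuffer[] dataChunks; // assumption: data chunks looked up by chunk id
  private int count;                     // number of entries written so far

  IndexChunkSketch(int indexChunkBytes, ByteBuffer[] dataChunks) {
    this.index = ByteBuffer.allocate(indexChunkBytes);
    this.dataChunks = dataChunks;
  }

  // Entries must be appended in sorted key order; writing past the end of the
  // index chunk throws IndexOutOfBoundsException.
  void append(int chunkId, int offset, int length) {
    int pos = count * ENTRY_SIZE;
    index.putInt(pos, chunkId).putInt(pos + 4, offset).putInt(pos + 8, length);
    count++;
  }

  // Follows entry i into its data chunk and copies out the cell's bytes,
  // which this simplified sketch treats as the cell's key.
  private byte[] keyAt(int i) {
    int pos = i * ENTRY_SIZE;
    ByteBuffer data = dataChunks[index.getInt(pos)];
    int offset = index.getInt(pos + 4);
    byte[] key = new byte[index.getInt(pos + 8)];
    for (int k = 0; k < key.length; k++) {
      key[k] = data.get(offset + k);
    }
    return key;
  }

  // Binary search: entry i always starts at i * ENTRY_SIZE, so finding the
  // middle entry is arithmetic on one contiguous buffer.
  // Returns the entry index, or -(insertionPoint + 1) if the key is absent.
  int find(byte[] key) {
    int lo = 0;
    int hi = count - 1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      int cmp = Arrays.compareUnsigned(keyAt(mid), key);
      if (cmp < 0) {
        lo = mid + 1;
      } else if (cmp > 0) {
        hi = mid - 1;
      } else {
        return mid;
      }
    }
    return -(lo + 1);
  }
}
```

The smaller `ENTRY_SIZE` is, the more entries share each cache line touched by the search, which is the locality argument in the paragraph.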

bq. The only thing is we should go with fixed 8 bytes for that. 

This is not desirable. We would be going from 12 bytes to 20 bytes per cell, roughly 1.7 times as much.
We should not do it unless it is really necessary.
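The 12 vs 20 byte difference translates directly into how many cells one index chunk can describe. A worked version, assuming a 2 MB index chunk (an illustrative figure, not something fixed by this issue); `EntrySizeComparison` is a hypothetical helper:

```java
// Hypothetical sketch: how many fixed-size index entries fit into one index chunk.
// The 2 MB chunk size is an assumption made for illustration only.
class EntrySizeComparison {
  static final int CHUNK_BYTES = 2 * 1024 * 1024;
  // chunkId + offset + length, all ints: 12 bytes
  static final int ENTRY_WITHOUT_SEQID = 3 * Integer.BYTES;
  // the same plus a fixed 8-byte seqId: 20 bytes
  static final int ENTRY_WITH_SEQID = ENTRY_WITHOUT_SEQID + Long.BYTES;

  // Whole entries only; leftover bytes at the end of the chunk stay unused.
  static int entriesPerChunk(int entryBytes) {
    return CHUNK_BYTES / entryBytes;
  }
}
```

With these assumptions, `entriesPerChunk(12)` is 174762 and `entriesPerChunk(20)` is 104857, so adding an 8-byte seqId leaves room for 40% fewer cells per index chunk.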

bq. So now if you are going to write the seqId in the BB backing every cell, then the seqId
as the state variable is not needed at all and hence you may need a new cell representation
for it. 

OK. So let's have a new cell representation.

bq. Otherwise we should still go with it and use the seqID as a caching value in addition
to having it in the BB. 

Why duplicate the same value?
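If the seqId lives in the buffer backing each cell, the cell object does not need its own copy. A sketch of that variant, assuming (hypothetically) that the 8-byte seqId occupies the last 8 bytes of the cell's region in the buffer; `SeqIdInBufferCellSketch` is an illustrative name, not an HBase class, and the real on-buffer layout may differ:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: a cell view whose seqId is stored once, inside the
// backing buffer, at the (assumed) end of the cell's bytes. There is no
// separate seqId field, so the value cannot get out of sync with the buffer.
class SeqIdInBufferCellSketch {
  private final ByteBuffer buf;
  private final int offset;
  private final int length; // includes the trailing 8-byte seqId

  SeqIdInBufferCellSketch(ByteBuffer buf, int offset, int length) {
    if (length < Long.BYTES) {
      throw new IllegalArgumentException("cell too short to hold a seqId");
    }
    this.buf = buf;
    this.offset = offset;
    this.length = length;
  }

  // Absolute reads and writes; the buffer's position is never changed.
  long getSequenceId() {
    return buf.getLong(offset + length - Long.BYTES);
  }

  void setSequenceId(long seqId) {
    buf.putLong(offset + length - Long.BYTES, seqId);
  }
}
```

The quoted alternative keeps a cached field to save a buffer read per access; this sketch takes the other side of the trade-off and keeps a single copy of the value.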

> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>                 Key: HBASE-16438
>                 URL: https://issues.apache.org/jira/browse/HBASE-16438
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: HBASE-16438_1.patch, HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch,
HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, HBASE-16438_8_ChunkCreatorwrappingChunkPool_withchunkRef.patch,
HBASE-16438_9_ChunkCreatorwrappingChunkPool_withchunkRef.patch, HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch,
> For CellChunkMap we may need a cell type in which the id of the chunk the cell was created from
is embedded, so that when flattening we can use the chunk id as metadata. More details will follow
once the initial tasks are completed.
> Why we need to embed the chunk id in the Cell is described by [~anastas] in this remark
on the parent issue: https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119

This message was sent by Atlassian JIRA
