hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-16438) Create a cell type so that chunk id is embedded in it
Date Fri, 24 Mar 2017 08:54:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15940016#comment-15940016 ]

ramkrishna.s.vasudevan edited comment on HBASE-16438 at 3/24/17 8:54 AM:
-------------------------------------------------------------------------

I have a way to solve this problem. Let's discuss it before I put up the patch. Most of the other
RB comments are fixed.
-> Since we need to know whether a chunk is from the pool or not, the Chunk will have a boolean
indicating whether the chunk was created for the pool. Say we have isFromPool(), which returns
true for those chunks.
-> Every chunk will have an AtomicInteger ref count.
-> When the MSLAB does a copyToChunkCell, where we know that the cell has to have a chunk (it
comes out of the chunkCreator), we increment the refCount.
-> In the MemstoreImpl, when we do getCellSet().add(), we need a new API in CellSet that returns
the cell that was already there in the CSLM, i.e. what CSLM.put() returns. Right now we only have
cellSet#add(), which returns a boolean.
-> On this returned cell (which is the actual duplicate cell) we get the chunkId from the
Cell. Remember we now have a BbChunkCell which can give the chunkId from the 0th offset.
-> Use this chunkId to decrement the reference count of that chunk. For this we need a
decrementChunkRefCount in the MSLAB interface. I think that is valid because the MSLAB impl is
nothing but Chunks.
-> On doing this decrementChunkRefCount, we can check whether the result is now 0 and, if so,
remove that chunk from the chunkCreator map. This way we make sure that the reference to the
chunk is released immediately.
-> Note that if the chunk is from the pool, this increment/decrement will not have any impact.
It matters only when we have on-demand chunks.
-> There is now an atomic ref count operation, which may add to the write path overhead. We may
need to measure the impact. But remember this is going to happen only if there are a lot of
duplicates, as in HBASE-16195. In the normal case this should not be a problem because
CSLM#put() returns null when there is no duplicate, so there are no such problems. In fact, in
such a case the GC issue mentioned in HBASE-16195 will not happen, as all the chunks are needed
until the MSLAB is closed.
Thoughts?
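
The steps listed above could be sketched roughly as follows. This is only a minimal Java illustration, not the patch and not HBase code: the nested classes Chunk, ChunkCreator, Cell and Memstore are simplified, hypothetical stand-ins, String keys replace real cells, and a plain ConcurrentSkipListMap plays the role of the CSLM. Only isFromPool, decrementChunkRefCount and the use of what put() returns are taken from the comment; every other name is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustration only: every class here is a simplified, hypothetical stand-in,
// not the real HBase Chunk, ChunkCreator, Cell or MemStore.
class ChunkRefCountSketch {

  static final class Chunk {
    final int id;
    final boolean fromPool;
    final AtomicInteger refCount = new AtomicInteger();

    Chunk(int id, boolean fromPool) {
      this.id = id;
      this.fromPool = fromPool;
    }

    boolean isFromPool() {
      return fromPool;
    }
  }

  // Stand-in for the chunkCreator and its chunkId -> chunk map.
  static final class ChunkCreator {
    private final ConcurrentHashMap<Integer, Chunk> chunks = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    Chunk newChunk(boolean fromPool) {
      Chunk chunk = new Chunk(nextId.getAndIncrement(), fromPool);
      chunks.put(chunk.id, chunk);
      return chunk;
    }

    // Called when a cell is copied into a chunk; pooled chunks are skipped.
    void incrementChunkRefCount(int chunkId) {
      Chunk chunk = chunks.get(chunkId);
      if (chunk != null && !chunk.isFromPool()) {
        chunk.refCount.incrementAndGet();
      }
    }

    // Called for a displaced duplicate; drops the chunk once its count reaches 0.
    void decrementChunkRefCount(int chunkId) {
      Chunk chunk = chunks.get(chunkId);
      if (chunk == null || chunk.isFromPool()) {
        return; // pooled chunks are not ref counted
      }
      if (chunk.refCount.decrementAndGet() == 0) {
        chunks.remove(chunkId);
      }
    }

    boolean isTracked(int chunkId) {
      return chunks.containsKey(chunkId);
    }
  }

  // A cell that remembers the id of the chunk it lives in.
  static final class Cell {
    final String key;
    final int chunkId;

    Cell(String key, int chunkId) {
      this.key = key;
      this.chunkId = chunkId;
    }
  }

  static final class Memstore {
    private final ChunkCreator creator;
    private final ConcurrentSkipListMap<String, Cell> cellSet = new ConcurrentSkipListMap<>();

    Memstore(ChunkCreator creator) {
      this.creator = creator;
    }

    void add(String key, Chunk chunk) {
      creator.incrementChunkRefCount(chunk.id);                  // copyToChunkCell step
      Cell previous = cellSet.put(key, new Cell(key, chunk.id)); // null if no duplicate
      if (previous != null) {
        creator.decrementChunkRefCount(previous.chunkId);        // release the duplicate
      }
    }
  }

  public static void main(String[] args) {
    ChunkCreator creator = new ChunkCreator();
    Memstore memstore = new Memstore(creator);
    Chunk first = creator.newChunk(false);
    Chunk second = creator.newChunk(false);
    memstore.add("row1", first);
    memstore.add("row1", second); // duplicate key displaces the cell in 'first'
    System.out.println(creator.isTracked(first.id));  // false
    System.out.println(creator.isTracked(second.id)); // true
  }
}
```

The sketch does not handle a chunk whose count drops to 0 while cells are still being allocated from it; a real implementation would have to deal with that case.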


> Create a cell type so that chunk id is embedded in it
> -----------------------------------------------------
>
>                 Key: HBASE-16438
>                 URL: https://issues.apache.org/jira/browse/HBASE-16438
>             Project: HBase
>          Issue Type: Sub-task
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>         Attachments: HBASE-16438_1.patch, HBASE-16438_3_ChunkCreatorwrappingChunkPool.patch,
HBASE-16438_4_ChunkCreatorwrappingChunkPool.patch, HBASE-16438.patch, MemstoreChunkCell_memstoreChunkCreator_oldversion.patch,
MemstoreChunkCell_trunk.patch
>
>
> For CellChunkMap we may need a cell type that has the id of the chunk it was created from
embedded in it, so that flattening can use the chunk id as metadata. More details will follow
once the initial tasks are completed.
> Why we need to embed the chunk id in the Cell is described by [~anastas] in this remark
on the parent issue: https://issues.apache.org/jira/browse/HBASE-14921?focusedCommentId=15244119&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15244119
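
As a rough illustration of what "chunk id embedded in the cell" could mean, here is a minimal Java sketch. It assumes the chunk is a ByteBuffer whose first 4 bytes hold the chunk id as an int, and that a cell is just an (offset, length) view into that buffer. The class and method names (ChunkIdSketch, ChunkCell, newChunk, copyToChunk) and the 4-byte id width are assumptions made for the example, not the actual HBase types.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustration only: hypothetical names, not the real HBase chunk or cell classes.
class ChunkIdSketch {

  // Assumption: the chunk id is stored as a 4-byte int at offset 0 of the chunk.
  static final int CHUNK_ID_SIZE = Integer.BYTES;

  // Allocates a chunk and writes its id at the 0th offset.
  static ByteBuffer newChunk(int chunkId, int capacity) {
    ByteBuffer chunk = ByteBuffer.allocate(capacity);
    chunk.putInt(0, chunkId);
    return chunk;
  }

  // A cell is a view into its chunk; the chunk id is read back from offset 0.
  static final class ChunkCell {
    final ByteBuffer chunk;
    final int offset;
    final int length;

    ChunkCell(ByteBuffer chunk, int offset, int length) {
      this.chunk = chunk;
      this.offset = offset;
      this.length = length;
    }

    int getChunkId() {
      return chunk.getInt(0);
    }
  }

  // Copies the cell bytes into the chunk at the given offset and returns the view.
  static ChunkCell copyToChunk(ByteBuffer chunk, int offset, byte[] data) {
    for (int i = 0; i < data.length; i++) {
      chunk.put(offset + i, data[i]);
    }
    return new ChunkCell(chunk, offset, data.length);
  }

  public static void main(String[] args) {
    ByteBuffer chunk = newChunk(42, 64);
    byte[] value = "v1".getBytes(StandardCharsets.UTF_8);
    ChunkCell cell = copyToChunk(chunk, CHUNK_ID_SIZE, value);
    System.out.println(cell.getChunkId()); // 42
  }
}
```

With this layout any cell can report its chunk id without holding a separate reference to chunk metadata, which is what makes the id usable during flattening.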



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
