hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16372) References to previous cell in read path should be avoided
Date Fri, 12 Aug 2016 13:17:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418798#comment-15418798 ]

ramkrishna.s.vasudevan commented on HBASE-16372:
------------------------------------------------

Will comment on the findings so far and upload the patch later. These are my findings; I will
prepare a patch early next week (it's a long weekend here) and
see whether I am missing any other areas too.
StoreScanner
============
The 'prevCell' is used to ensure we have moved on to the next Cell lexicographically.
Although this check could arguably be removed, the check using prevCell is needed at least for the
reverse scan cases. So it is better not to remove 'prevCell'.
The problem of 'prevCell' referring to the last cell in the previous block while those blocks
get evicted happens when
a) the size limit is reached
b) the batch limit is reached
c) the store offset limit is reached.
In other words, code-wise, wherever we break out of the while loop 'LOOP'.
Why is this a problem only in the above 3 cases? Because in all other cases, whether we INCLUDE the
current cell, SEEK_TO_NEXT_ROW, SEEK_TO_NEXT_COL or SKIP,
we always do a heap.peek() after either heap.next() or heap.seek().
Internally, StoreFileScanner returns the cached 'cur' cell for every peek() call.
If a heap.seek() has happened, the seek would have fetched a 'cur' cell either in the
current block or in a new block. So when StoreScanner does a
heap.peek(), this 'cur' cell is fetched and that is what we set as the prevCell. So as
next() continues like this to either INCLUDE, SKIP or again SEEK,
we are sure that the current block is in use, and when the scan completes we can safely return
the previous blocks thus accumulated.
The same holds when StoreScanner calls heap.next(), because StoreFileScanner.next() updates
the 'cur' ref with the next cell (which can be from a new block).
So when StoreScanner.peek() is called, this new cell from the new block is fetched and
set as the prevCell. So when StoreScanner.next()
completes its iteration, we are safe to return all the previous blocks.
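The advance-then-peek invariant above can be illustrated with a toy mock (plain Java, not HBase code; MockScanner is a hypothetical stand-in for StoreFileScanner and its cached 'cur' cell):

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the safe flow described above: every heap.next() is
// immediately followed by a heap.peek(), so 'prevCell' always points at
// the scanner's freshly cached 'cur' cell, i.e. into the block in use.
public class PrevCellFlowSketch {
    // Stand-in for StoreFileScanner: peek() returns the cached current cell.
    public static class MockScanner {
        private final List<String> cells;
        private int idx = 0;
        public MockScanner(List<String> cells) { this.cells = cells; }
        public String peek() { return idx < cells.size() ? cells.get(idx) : null; }
        public void next() { idx++; }   // 'cur' may now come from a new block
    }

    // Records the value prevCell holds after each advance-then-peek step.
    public static List<String> prevCells(List<String> cells) {
        MockScanner heap = new MockScanner(cells);
        List<String> seen = new ArrayList<>();
        while (heap.peek() != null) {
            heap.next();                      // INCLUDE / SKIP / SEEK advance
            String prevCell = heap.peek();    // peek() refreshes prevCell
            if (prevCell != null) {
                seen.add(prevCell);
            }
        }
        return seen;
    }
}
```

Because the peek always runs after the advance, prevCell is never left pointing behind the scanner's current position in this flow.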

But when one of the 3 conditions mentioned above is reached, we have already done either
a heap.seek() or a heap.next(), yet on seeing those conditions
we abruptly break the loop and return the results. Take, for example, a break that
happens after heap.next(). StoreFileScanner.next() has fetched
a new cell from a new block, so the previous blocks are added to the return block list,
and when the scanned results are returned for this current call
to next() we return all the previous blocks. At this point the 'prevCell' ref is still pointing
to a cell that was part of those previous blocks. This is where
the issue could happen.

So in all the above conditions, before breaking the loop, we could safely do a copy (clone) of
'prevCell' and update the 'prevCell' ref to point at the newly created cell.
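A minimal sketch of this clone-before-break idea, using a toy Cell backed by a shared byte[] block (the real patch would use HBase's own cell-copy utilities, not this class):

```java
// Sketch of the proposed fix: before breaking out of the scan loop on a
// size/batch/store-offset limit, deep-copy prevCell so the reference no
// longer points into a block that may be evicted or returned to the pool.
// 'Cell' here is a toy stand-in, not the HBase Cell interface.
public class PrevCellCloneSketch {
    public static class Cell {
        final byte[] block;       // backing block buffer (may be shared/pooled)
        final int offset, length;
        public Cell(byte[] block, int offset, int length) {
            this.block = block; this.offset = offset; this.length = length;
        }
        public String key() { return new String(block, offset, length); }
    }

    // Deep copy onto a fresh on-heap array so the clone outlives the block.
    public static Cell cloneCell(Cell c) {
        byte[] copy = new byte[c.length];
        System.arraycopy(c.block, c.offset, copy, 0, c.length);
        return new Cell(copy, 0, c.length);
    }
}
```

After the clone, evicting or reusing the original block cannot corrupt what prevCell refers to, which is exactly the safety property needed at the break points.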

Compaction write flow
====================
In the compaction write flow we read from the underlying files and write those cells one
by one to a new file. In the case of SHARED memory we return the
blocks once a certain threshold of bytes has been written to the new files, in order to avoid
OOME and to mimic the normal scan behaviour.


Because we try to return the blocks in between the compaction writes, the following areas need
to be addressed:
a) the writer's state variables
b) the bloom filter's state variables

Writer state variables
=====================
Writer state variables are the ones like 'lastCellInPreviousBlock', 'firstCellinBlock', 'lastCell'
etc., which need to be handled so that returning the blocks does
not corrupt the data behind these state variables.

Bloom filter state variables
===========================
The bloom filter variable 'lastCell' should not be cached, for the above reason. The 'lastCell'
is needed in the bloom area to compare whether a new key has arrived,
as per the BloomType.
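To illustrate why the bloom writer retains 'lastCell', here is a toy ROW-bloom sketch (not HBase's actual bloom writer): a new bloom entry is added only when the key changes, so each append compares against the previously seen cell, and that retained reference is exactly what must be cloned before blocks are returned.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of a ROW-type bloom writer's state: 'lastRow' is the retained
// reference (the hazard), consulted on every append to decide whether a
// new key has arrived.
public class BloomLastCellSketch {
    private String lastRow;                 // retained reference into a block
    private final List<String> bloomKeys = new ArrayList<>();

    public void append(String row) {
        if (lastRow == null || !lastRow.equals(row)) {
            bloomKeys.add(row);             // a new key arrived per BloomType
        }
        lastRow = row;                      // must be cloned before blocks return
    }

    public List<String> bloomKeys() { return bloomKeys; }
}
```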

Currently the Writer that is created in the compaction path is created via the CellSink. This
Writer creates all types of writers: for the default compaction,
the stripe compaction, the date-tiered compaction and the Mob compaction.
So, since the compactor framework is the same for all of these and we make the intermediate shipped()
call for any of the above compactions, we could add an API
to CellSink, say updateWriterState() (I am sure we need a better name). This will call
similar APIs on all the underlying writers.

When the call reaches the StoreFileWriter (this will be present in all the compaction writers
because this is where the actual cell is written), we have
3 writers per store file writer:
a) the HFileWriter
b) the general bloom writer
c) the delete bloom writer

So this new API updateWriterState() will also be on StoreFileWriter, because it also implements
CellSink. This call will actually allow us to
clone all such cells that we hold as references.
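The proposed fan-out could look roughly like this sketch (updateWriterState() is only the name suggested above, not an existing HBase API; the writer classes here are toy stand-ins for the HFile writer and the two bloom writers):

```java
import java.util.List;

// Rough sketch of the proposed hook: CellSink gains updateWriterState(),
// each leaf writer clones the cell references it retains, and
// StoreFileWriter fans the call out to its underlying writers.
public class UpdateWriterStateSketch {
    public interface CellSink {
        void append(String cell);
        default void updateWriterState() {}   // clone retained cell refs
    }

    public static class LeafWriter implements CellSink {
        String lastCell;                      // reference into a block
        boolean cloned;
        public void append(String cell) { lastCell = cell; cloned = false; }
        public void updateWriterState() {
            if (lastCell != null) {           // copy so it outlives the block
                lastCell = new String(lastCell.toCharArray());
                cloned = true;
            }
        }
    }

    public static class StoreFileWriter implements CellSink {
        // e.g. the HFileWriter, the general bloom writer, the delete bloom writer
        final List<LeafWriter> writers;
        public StoreFileWriter(List<LeafWriter> writers) { this.writers = writers; }
        public void append(String cell) { writers.forEach(w -> w.append(cell)); }
        public void updateWriterState() { writers.forEach(LeafWriter::updateWriterState); }
    }
}
```

The default method keeps writers that hold no cell references from having to implement anything, while StoreFileWriter's override guarantees all three underlying writers are covered by one call at shipped() time.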

Now the point here is that in future, if one of the cells that comes in the write path
is ByteBuffer backed, there is a chance that the references
the code path holds get cloned to on-heap cells. This should be fine as we are going to
do this only in compaction, but the point is that in future
no code should assume that if one cell is ByteBuffer backed then all cells
in the code path will be ByteBuffer backed.


> References to previous cell in read path should be avoided
> ----------------------------------------------------------
>
>                 Key: HBASE-16372
>                 URL: https://issues.apache.org/jira/browse/HBASE-16372
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> Came as part of review discussion in HBASE-15554. If references to previous
> cells are kept in the read path, then with the ref-count based eviction mechanism in trunk, a
> block backing the previous cell can be evicted while the read path still performs operations
> on that garbage-collected previous cell, leading to incorrect results.
> Areas to target
> -> StoreScanner
> -> Bloom filters (particularly in compaction path)
> Thanks to [~anoop.hbase] for pointing this out in the bloomfilter path. But we found it could
> be in other areas also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
