hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-16372) References to previous cell in read path should be avoided
Date Mon, 22 Aug 2016 12:28:20 GMT

     [ https://issues.apache.org/jira/browse/HBASE-16372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-16372:
-------------------------------------------
    Attachment: HBASE-16372_1.patch

Patch that solves the problem in both the read and write paths.
In the write path:
-> All writers have been made implementations of CellSink.
-> CellSink has append() and updateState() APIs. A better name may be needed. updateState()
allows all writers to update their state, so a call from the compactor trickles down to
the StoreFileWriter, HFileWriter and bloom writers. Each of them can then take a copy of the
lastCell and any other cell references it holds.
-> The check in the compactor that sees whether the shippedCallLimit has been reached has been
moved outside the for loop over the retrieved cells.
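
A minimal sketch of the append()/updateState() idea described above. It is not the
code from the patch: the Cell class, the SketchWriter class and the method signatures
here are simplified, hypothetical stand-ins, and a plain byte[] plays the role of the
block backing a cell.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CellSinkSketch {
    /** Hypothetical stand-in for a Cell: a view over bytes owned by a cache block. */
    static final class Cell {
        final byte[] backing;
        Cell(byte[] backing) { this.backing = backing; }
        String value() { return new String(backing, StandardCharsets.UTF_8); }
        /** Deep copy, so the result no longer depends on the block's bytes. */
        Cell copy() { return new Cell(Arrays.copyOf(backing, backing.length)); }
    }

    /** The two operations named in the comment above; signatures are assumed. */
    interface CellSink {
        void append(Cell cell);
        void updateState();
    }

    /** A writer that remembers the last appended cell, as the real writers do. */
    static final class SketchWriter implements CellSink {
        private Cell lastCell;

        @Override public void append(Cell cell) {
            lastCell = cell; // still a reference into the block
        }

        @Override public void updateState() {
            // Replace the block-backed reference with an independent copy.
            if (lastCell != null) lastCell = lastCell.copy();
        }

        Cell lastCell() { return lastCell; }
    }

    public static void main(String[] args) {
        byte[] block = "row1".getBytes(StandardCharsets.UTF_8);
        SketchWriter writer = new SketchWriter();
        writer.append(new Cell(block));
        writer.updateState();           // writer now owns its own copy
        Arrays.fill(block, (byte) '?'); // simulate the block being evicted and reused
        System.out.println(writer.lastCell().value()); // prints "row1"
    }
}
```

Without the updateState() call, the writer in this sketch would read "????" from the
scrubbed block.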
The check was moved mainly to solve the problem that arises in the following case:
-> Suppose scanner.next() has retrieved 'row1' and the scanner has moved on to the
new block for 'row2', but the current next() has retrieved 10 cells for row1.
-> If we check whether the shippedCallLimit has been reached while iterating over these 10 cells
and call shipped(), that means it would have evicted the blocks containing row1.
-> For the next scanner.next() call, the lastCell in the write path should be the
last cell of row1. But when row2's first cell comes in, there is a chance that the block containing
row1's last cell has been evicted (because it is the previous block), and we end up
corrupting the data referred to by lastCell.
-> So if we move the shipped() call to after the current batch from next() is done, we are sure
that updateState() has taken a copy of the writer's state, and even if we move across blocks
the state of every writer is independent of the blocks backing its cells.
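
The ordering described above can be sketched roughly as follows. This is an
illustration under assumptions, not the compactor code from the patch: BlockScanner,
Writer, Cell and compact() are hypothetical names, each "block" holds a single cell,
and shipped() scrubs every block already handed out in order to simulate eviction.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ShippedAfterBatchSketch {
    /** Hypothetical cell: a view over the bytes of one block. */
    static final class Cell {
        final byte[] block;
        Cell(byte[] block) { this.block = block; }
        String row() { return new String(block, StandardCharsets.US_ASCII); }
        Cell copy() { return new Cell(Arrays.copyOf(block, block.length)); }
    }

    /** Hypothetical writer that keeps a reference to the last appended cell. */
    static final class Writer {
        Cell lastCell;
        void append(Cell c) { lastCell = c; }
        void updateState() { if (lastCell != null) lastCell = lastCell.copy(); }
    }

    /** Hypothetical scanner: one cell per block. */
    static final class BlockScanner {
        private final List<byte[]> blocks;
        private int pos = 0;
        BlockScanner(List<byte[]> blocks) { this.blocks = blocks; }

        /** Adds the next batch to out; returns true if more batches remain. */
        boolean next(List<Cell> out) {
            if (pos < blocks.size()) out.add(new Cell(blocks.get(pos++)));
            return pos < blocks.size();
        }

        /** Simulates eviction by scrubbing every block handed out so far. */
        void shipped() {
            for (int i = 0; i < pos; i++) Arrays.fill(blocks.get(i), (byte) '?');
        }
    }

    static String compact(BlockScanner scanner, Writer writer, long shippedCallLimit) {
        List<Cell> cells = new ArrayList<>();
        long bytesSinceShipped = 0;
        boolean more;
        do {
            more = scanner.next(cells);
            for (Cell c : cells) {
                writer.append(c);
                bytesSinceShipped += c.block.length;
            }
            // The limit check sits after the per-cell loop, once the batch is done.
            if (bytesSinceShipped > shippedCallLimit) {
                writer.updateState(); // copy state before blocks can go away
                scanner.shipped();
                bytesSinceShipped = 0;
            }
            cells.clear();
        } while (more);
        return writer.lastCell == null ? null : writer.lastCell.row();
    }

    public static void main(String[] args) {
        List<byte[]> blocks = Arrays.asList(
            "row1".getBytes(StandardCharsets.US_ASCII),
            "row2".getBytes(StandardCharsets.US_ASCII));
        // A limit of 0 forces shipped() after every batch.
        System.out.println(compact(new BlockScanner(blocks), new Writer(), 0)); // prints "row2"
    }
}
```

Because updateState() runs before shipped() and only after the whole batch has been
appended, the writer's lastCell stays readable in this sketch even though every
block it came from has been scrubbed.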

> References to previous cell in read path should be avoided
> ----------------------------------------------------------
>
>                 Key: HBASE-16372
>                 URL: https://issues.apache.org/jira/browse/HBASE-16372
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Scanners
>    Affects Versions: 2.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16372_1.patch, HBASE-16372_testcase.patch, HBASE-16372_testcase_1.patch
>
>
> Came up as part of the review discussion in HBASE-15554. If references to previous
cells are kept in the read path, then with the ref-count-based eviction mechanism in trunk
there is a chance that a block backing the previous cell is evicted while the read path still
performs operations on that previous cell, leading to incorrect results.
> Areas to target
> -> StoreScanner
> -> Bloom filters (particularly in the compaction path)
> Thanks to [~anoop.hbase] for pointing this out in the bloom filter path. But we found it could
occur in other areas also.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
