hbase-issues mailing list archives

From "zhangduo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12078) Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING
Date Wed, 24 Sep 2014 08:51:34 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146080#comment-14146080 ]

zhangduo commented on HBASE-12078:

Thank you, ramkrishna.s.vasudevan. Sorry for my poor English; I will try my best to explain
the fix.

In RowNodeReader.whichFanNode, a negative return value from Bytes.unsignedBinarySearch means
-(insertPosition + 1), where insertPosition is relative to the whole block. So
-fanIndexInBlock - 1 - fanOffset is the insert position within the fan. Adding 1 and negating
gives the value this function should return:
-(-fanIndexInBlock - 1 - fanOffset + 1) = fanIndexInBlock + fanOffset.

Returning fanIndexInBlock + fanOffset + 1 is wrong because the following two situations both
return 0:
 1. a fan is found at position 0
 2. no fan is found and the insert position is 0
This leads to a wrong followFan operation.
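
To make the arithmetic concrete, here is a minimal Java sketch. It is not the actual
RowNodeReader code: the class name, the method names whichFanNodeOld and whichFanNodeFixed,
and the sample block are made up for illustration, and the hit branch
(fanIndexInBlock - fanOffset) is an assumption about how a found position is made relative
to the fan. The binary search is re-implemented locally with the return convention described
in this comment.

```java
class FanSearchSketch {

    // Local stand-in with the convention described in this comment: absolute index in the
    // block on a hit, -(insertPosition + 1) on a miss, insertPosition relative to the block.
    static int unsignedBinarySearch(byte[] a, int fromIndex, int toIndex, byte key) {
        int unsignedKey = key & 0xff;
        int low = fromIndex;
        int high = toIndex - 1;
        while (low <= high) {
            int mid = (low + high) >>> 1;
            int midVal = a[mid] & 0xff;
            if (midVal < unsignedKey) {
                low = mid + 1;
            } else if (midVal > unsignedKey) {
                high = mid - 1;
            } else {
                return mid;
            }
        }
        return -(low + 1);
    }

    // Miss branch as it was before the patch.
    static int whichFanNodeOld(byte[] block, int fanOffset, int fanOut, byte b) {
        int fanIndexInBlock = unsignedBinarySearch(block, fanOffset, fanOffset + fanOut, b);
        if (fanIndexInBlock >= 0) {
            return fanIndexInBlock - fanOffset; // assumed hit branch: position within the fan
        }
        return fanIndexInBlock + fanOffset + 1;
    }

    // Miss branch as proposed: result is -(insertPositionInFan + 1).
    static int whichFanNodeFixed(byte[] block, int fanOffset, int fanOut, byte b) {
        int fanIndexInBlock = unsignedBinarySearch(block, fanOffset, fanOffset + fanOut, b);
        if (fanIndexInBlock >= 0) {
            return fanIndexInBlock - fanOffset; // assumed hit branch: position within the fan
        }
        return fanIndexInBlock + fanOffset;
    }

    public static void main(String[] args) {
        // Hypothetical block: three unrelated bytes, then a fan of 0x10, 0x20, 0x30 at offset 3.
        byte[] block = {9, 9, 9, 0x10, 0x20, 0x30};
        System.out.println(whichFanNodeOld(block, 3, 3, (byte) 0x10));   // 0  (hit at position 0)
        System.out.println(whichFanNodeOld(block, 3, 3, (byte) 0x05));   // 0  (miss, ambiguous)
        System.out.println(whichFanNodeFixed(block, 3, 3, (byte) 0x05)); // -1 (miss, insert at 0)
        System.out.println(whichFanNodeFixed(block, 3, 3, (byte) 0x25)); // -3 (miss, insert at 2)
    }
}
```

With the old miss branch, a caller cannot tell "found at position 0" from "not found, insert
at 0". With the proposed one, every miss is negative and decodes as -(result + 1).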

In PrefixTreeArraySearcher.compareToCurrentToken, the problem is that some nodes may have
token length 0 if they are single-byte nodes (I think the byte is stored in the parent's fan?),
so sometimes we do not get a chance to run the i >= key.getRowLength() check even if we
have reached the end of the row key part of the key we want to search for.
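
The general pattern can be shown with a simplified model: a length check that lives inside a
loop over the token bytes is skipped entirely when the token is empty. The sketch below is
not the HBase code and not the attached patch. The class, the method names, the plain byte
arrays, and the guard in compareWithGuard are all hypothetical and only illustrate one way
to run the check regardless of token length.

```java
class TokenCompareSketch {

    // The key-length check only runs while iterating over token bytes, so a token of
    // length 0 never reaches it.
    static int compareInLoop(byte[] rowBuffer, int rowLength, int tokenLength, byte[] keyRow) {
        int startIndex = rowLength - tokenLength;
        int endIndexExclusive = startIndex + tokenLength;
        for (int i = startIndex; i < endIndexExclusive; ++i) {
            if (i >= keyRow.length) {
                return -1; // key ran out first, so it sorts before the current row
            }
            int keyByte = keyRow[i] & 0xff;
            int thisByte = rowBuffer[i] & 0xff;
            if (keyByte != thisByte) {
                return Integer.compare(keyByte, thisByte);
            }
        }
        return 0;
    }

    // Hypothetical guard: test the key length before the loop so it also covers empty tokens.
    static int compareWithGuard(byte[] rowBuffer, int rowLength, int tokenLength, byte[] keyRow) {
        if (keyRow.length < rowLength) {
            return -1;
        }
        return compareInLoop(rowBuffer, rowLength, tokenLength, keyRow);
    }

    public static void main(String[] args) {
        byte[] rowBuffer = {1, 2, 3}; // current row so far, rowLength = 3
        byte[] keyRow = {1, 2};       // search key is already exhausted
        System.out.println(compareInLoop(rowBuffer, 3, 1, keyRow));    // -1 (check runs)
        System.out.println(compareInLoop(rowBuffer, 3, 0, keyRow));    // 0  (check skipped)
        System.out.println(compareWithGuard(rowBuffer, 3, 0, keyRow)); // -1
    }
}
```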

My testcase reproduces these two problems, but after patching them we found that some of the
existing prefix-tree testcases failed, which led us to another bug.

In PrefixTreeArraySearcher.fixRowFanMissReverse, the original implementation always moves to
the previous row when insertPosition is 0. But if currentRowNode represents a row key
(hasOccurrences), then the current row is itself the first row that is less than the row we
are searching for, not the row before the current row.
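
A small model of this case, using a sorted set of strings in place of the prefix tree. The
class, the method names, and the sample rows are hypothetical; only the reasoning about which
row is "the first row less than the search key" comes from the description above.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

class ReverseMissSketch {

    // Original behaviour as described: always step back to the row before the current one.
    static String alwaysPreviousRow(NavigableSet<String> rows, String currentRow) {
        return rows.lower(currentRow);
    }

    // Behaviour argued for here: if the current node is itself a row, stay on it.
    static String respectOccurrences(NavigableSet<String> rows, String currentRow,
            boolean hasOccurrences) {
        return hasOccurrences ? currentRow : rows.lower(currentRow);
    }

    public static void main(String[] args) {
        NavigableSet<String> rows = new TreeSet<>(Arrays.asList("a", "ab", "abd", "abe"));
        // Searching for "abc": the node for "ab" is a row, its fan holds 'd' and 'e',
        // and 'c' would be inserted at fan position 0.
        System.out.println(rows.lower("abc"));                      // ab (expected answer)
        System.out.println(alwaysPreviousRow(rows, "ab"));          // a  (one row too far back)
        System.out.println(respectOccurrences(rows, "ab", true));   // ab
    }
}
```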

> Missing Data when scanning using PREFIX_TREE DATA-BLOCK-ENCODING
> ----------------------------------------------------------------
>                 Key: HBASE-12078
>                 URL: https://issues.apache.org/jira/browse/HBASE-12078
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions:
>         Environment: CentOS 6.3
> hadoop 2.5.0(hdfs)
> hadoop 2.2.0(hbase)
> hbase
> sun-jdk 1.7.0_67-b01
>            Reporter: zhangduo
>         Attachments: prefix_tree_error.patch
> Our row key is composed of two ints, and we found that sometimes, when we scan using only
> the first int part, the returned result may be missing some rows. But when we dump the
> whole hfile, the rows are still there.
> We have written a testcase to reproduce the bug. It works like this:
> put 1-12345
> put 12345-0x01000000
> put 12345-0x01010000
> put 12345-0x02000000
> put 12345-0x02020000
> put 12345-0x03000000
> put 12345-0x03030000
> put 12345-0x04000000
> put 12345-0x04040000
> flush memstore
> then scan using 12345, the returned row key will be 12345-0x20000000(12345-0x10000000
