hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Harsh J (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HBASE-15235) Inconsistent cell reads over multiple bulk-loaded HFiles
Date Tue, 09 Feb 2016 03:33:18 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Harsh J resolved HBASE-15235.
    Resolution: Duplicate

Dupe of HBASE-15236

> Inconsistent cell reads over multiple bulk-loaded HFiles
> --------------------------------------------------------
>                 Key: HBASE-15235
>                 URL: https://issues.apache.org/jira/browse/HBASE-15235
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
> If there are two bulkloaded hfiles in a region with same seqID and duplicate keys*, get
and scan may return different values for a key.
> More details:
> - one of the rows had 200k+ columns. say row is 'r', column family is 'cf' and column
qualifiers are 1 to 1000.
> - hfiles were split somewhere along that row, but there were a range of columns in both
hfiles. For eg, something like - hfile1: ["",  r:cf:70) and hfile2: [r:cf:40, ....).
> - Between columns 40 to 70, some (not all) columns were in both the files with different
values. Whereas other were only in one of the files.
> In such a case, depending on file size (because we take it into account when sorting
hfiles internally), we may get different values for the same cell (say "r", "cf:50") depending
on what we call: get "r" "cf:50" or get "r" "cf:".
> I have been able to replicate this issue, will post the instructions shortly.
> * not sure how this would happen. These files are small ~50M, nor could i find any setting
for max file size that could lead to splits. Need to investigate more.

This message was sent by Atlassian JIRA

View raw message