Date: Wed, 30 Jul 2014 23:51:40 +0000 (UTC)
From: "ramkrishna.s.vasudevan (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

     [ https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-11591:
-------------------------------------------
    Attachment: HBASE-11591.patch

Attaching a patch to get feedback. Checking on some more corner cases.

> Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-11591
>                 URL: https://issues.apache.org/jira/browse/HBASE-11591
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: HBASE-11591.patch, TestBulkload.java
>
>
> See discussion in HBASE-11339.
> Consider the case where the same KVs exist in two files, one produced by flush/compaction and the other through bulk load.
> Both files contain some identical KVs, matching even in timestamp.
> Steps:
> Add some rows with a specific timestamp and flush them.
> Bulk load a file with the same data. Ensure that the "assign seqnum" property is set.
> The bulk load should use HFileOutputFormat2 (or ensure that we write the bulk_time_output key).
> This ensures that the bulk loaded file has the highest seq num.
> Assume the cell in the flushed/compacted store file is
> row1,cf,cq,ts1,value1 and the cell in the bulk loaded file is
> row1,cf,cq,ts1,value2
> (There are no parallel scans).
> Issue a scan on the table in 0.96. The retrieved value is row1,cf,cq,ts1,value2
> But the same scan in 0.98 will retrieve row1,cf,cq,ts1,value1.
> This is a behaviour change, and it comes from this code:
> {code}
>     public int compare(KeyValueScanner left, KeyValueScanner right) {
>       int comparison = compare(left.peek(), right.peek());
>       if (comparison != 0) {
>         return comparison;
>       } else {
>         // Since both the keys are exactly the same, we break the tie in favor
>         // of the key which came latest.
>         long leftSequenceID = left.getSequenceID();
>         long rightSequenceID = right.getSequenceID();
>         if (leftSequenceID > rightSequenceID) {
>           return -1;
>         } else if (leftSequenceID < rightSequenceID) {
>           return 1;
>         } else {
>           return 0;
>         }
>       }
>     }
> {code}
> In the 0.96 case the mvcc of the cell is 0 in both files, so the cell comparison returns 0 and the else branch decides. There the seq id of the bulk loaded file is greater, so its scanner sorts first and the scan reads from the bulk loaded file.
> In 0.98+ we retain the mvcc+seqid, so the mvcc of the cell in the flushed/compacted file is not reset to 0 (it remains a non-zero positive value). Hence compare() already returns a non-zero result in favour of the cell in the flushed/compacted file, and the seq id tie-break is never reached. This means that although we know the latest file is the bulk loaded file, we don't scan its data.
> This seems to be a behaviour change. Will check on other corner cases also, but we are trying to understand the behaviour of bulk load because we are evaluating whether it can be used for the MOB design.

--
This message was sent by Atlassian JIRA (v6.2#6252)
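
For anyone following the ordering argument in the quoted description, here is a minimal self-contained sketch of it. It is not the HBase code: the class ScannerTieBreakSketch, the Head type, the flattened key string, and the seq id and mvcc numbers are all made up for illustration. It only assumes what the description states, namely that the cell comparison prefers a higher mvcc when keys are equal and that the file sequence id is consulted only when that comparison returns 0.

```java
import java.util.Comparator;

public class ScannerTieBreakSketch {

    // Hypothetical stand-in for the cell at the head of one store file scanner.
    static final class Head {
        final String key;      // row/cf/cq/ts flattened into one string (simplification)
        final long mvcc;       // mvcc carried by the cell itself
        final long fileSeqId;  // sequence id of the file the scanner reads
        final String value;

        Head(String key, long mvcc, long fileSeqId, String value) {
            this.key = key;
            this.mvcc = mvcc;
            this.fileSeqId = fileSeqId;
            this.value = value;
        }
    }

    // Assumed ordering: key first, then higher mvcc first,
    // and only on a full tie the higher file sequence id first.
    static final Comparator<Head> SCANNER_ORDER = (l, r) -> {
        int c = l.key.compareTo(r.key);
        if (c != 0) {
            return c;
        }
        c = Long.compare(r.mvcc, l.mvcc);
        if (c != 0) {
            return c; // decided by mvcc, file sequence id is never consulted
        }
        return Long.compare(r.fileSeqId, l.fileSeqId);
    };

    // Value of whichever head sorts first, i.e. the one a scan would return.
    static String firstValue(Head a, Head b) {
        return SCANNER_ORDER.compare(a, b) <= 0 ? a.value : b.value;
    }

    public static void main(String[] args) {
        Head bulkLoaded = new Head("row1/cf/cq/ts1", 0, 20, "value2");

        // 0.96-like case: mvcc is 0 in both files, the file sequence id decides.
        Head flushed096 = new Head("row1/cf/cq/ts1", 0, 10, "value1");
        System.out.println(firstValue(flushed096, bulkLoaded)); // prints value2

        // 0.98-like case: the flushed cell keeps a non-zero mvcc and wins earlier.
        Head flushed098 = new Head("row1/cf/cq/ts1", 5, 10, "value1");
        System.out.println(firstValue(flushed098, bulkLoaded)); // prints value1
    }
}
```

With mvcc 0 on both sides the bulk loaded file wins through its higher sequence id. Once the flushed cell keeps a non-zero mvcc, the result flips, which matches the 0.96 versus 0.98 difference reported above.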