Return-Path: X-Original-To: apmail-hbase-issues-archive@www.apache.org Delivered-To: apmail-hbase-issues-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id AE48C18B6B for ; Thu, 27 Aug 2015 06:44:46 +0000 (UTC) Received: (qmail 65281 invoked by uid 500); 27 Aug 2015 06:44:46 -0000 Delivered-To: apmail-hbase-issues-archive@hbase.apache.org Received: (qmail 65234 invoked by uid 500); 27 Aug 2015 06:44:46 -0000 Mailing-List: contact issues-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@hbase.apache.org Received: (qmail 65218 invoked by uid 99); 27 Aug 2015 06:44:46 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 27 Aug 2015 06:44:46 +0000 Date: Thu, 27 Aug 2015 06:44:46 +0000 (UTC) From: "Hadoop QA (JIRA)" To: issues@hbase.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HBASE-14269) FuzzyRowFilter omits certain rows when multiple fuzzy keys exist MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HBASE-14269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716172#comment-14716172 ] Hadoop QA commented on HBASE-14269: ----------------------------------- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752648/HBASE-14269-0.98.patch against 0.98 branch at commit 56890d9fe148dd192520fab349a66aa3f688e232. ATTACHMENT ID: 12752648 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 23 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 3869 checkstyle errors (more than the master's current 3868 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15284//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15284//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15284//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15284//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15284//console This message is automatically generated. > FuzzyRowFilter omits certain rows when multiple fuzzy keys exist > ---------------------------------------------------------------- > > Key: HBASE-14269 > URL: https://issues.apache.org/jira/browse/HBASE-14269 > Project: HBase > Issue Type: Bug > Components: Filters > Reporter: hongbin ma > Assignee: hongbin ma > Fix For: 2.0.0, 1.2.0, 1.3.0, 0.98.15, 1.0.3, 1.1.3 > > Attachments: HBASE-14269-0.98.patch, HBASE-14269-branch-1.patch, HBASE-14269-v1.patch, HBASE-14269-v2.patch, HBASE-14269.patch > > > https://issues.apache.org/jira/browse/HBASE-13761 introduced a RowTracker in FuzzyRowFilter to avoid performing getNextForFuzzyRule() for each fuzzy key on each getNextCellHint() by maintaining a list of possible row matches for each fuzzy key. The implementation assumes that the prepared rows will be matched one by one, so it removes the first row in the list as soon as it is used. However, this approach may lead to omitting rows in some cases: > Consider a case where we have two fuzzy keys: > 1?1 > 2?2 > and the data is like: > 000 > 111 > 112 > 121 > 122 > 211 > 212 > when the first row 000 fails to match, RowTracker will update possible row matches with cell 000 and fuzzy keys 1?1,2?2. This will populate RowTracker with 101 and 202. Then 101 is popped out of RowTracker, hint the scanner to go to row 101. The scanner will get 111 and find it is a match, and continued to find that 112 is not a match, getNextCellHint will be called again. Then comes the bug: Row 101 has been removed out of RowTracker, so RowTracker will jump to 202. As you see row 121 will be omitted, but it is actually a match for fuzzy key 1?1. > I will illustrate the bug by adding a new test case in TestFuzzyRowFilterEndToEnd. Also I will provide the bug fix in my patch. The idea of the new solution is to maintain a priority queue for all the possible match rows for each fuzzy key, and whenever getNextCellHint is called, the elements in the queue that are smaller than the parameter currentCell will be updated(and re-insert into the queue). The head of queue will always be the "Next cell hint". -- This message was sent by Atlassian JIRA (v6.3.4#6332)