Date: Sun, 26 Apr 2015 03:18:38 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-3646) Duplicate entries when iterator emits entries past seek() range

    [ https://issues.apache.org/jira/browse/ACCUMULO-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512850#comment-14512850 ]

ASF GitHub Bot commented on ACCUMULO-3646:
------------------------------------------

Github user ctubbsii commented on the pull request:

    https://github.com/apache/accumulo/pull/33#issuecomment-96316635

    If you wish, you could close and re-submit, but it doesn't really matter. Whoever handles the merge can make sure it gets cherry-picked onto the proper branch.

> Duplicate entries when iterator emits entries past seek() range
> ---------------------------------------------------------------
>
> Key: ACCUMULO-3646
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3646
> Project: Accumulo
> Issue Type: Bug
> Components: client, mini, tserver
> Affects Versions: 1.6.1
> Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, Zookeeper 3.4.6
> Reporter: Dylan Hutchison
> Assignee: Dylan Hutchison
> Priority: Minor
> Fix For: 1.7.0
>
>
> The SortedKeyValueIterator's seek() method documents that an iterator may return keys past the range passed to seek(). However, an iterator set at scan-time that returns values past the range passed to seek() will return those keys multiple times if the client uses a BatchScanner. This does not occur when the client uses a Scanner. This has nothing to do with the VersioningIterator. This has nothing to do with the entries actually in the table. Also affects MiniAccumulo.
> If this is intended, we should update the SortedKeyValueIterator seek() documentation with a warning that returning keys past the seek() range may result in a client seeing duplicate keys. If this is not intended, then it is a bug.
> Test code: See [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java]
> * method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner
> * method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner
> In these methods, the [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java] emits entries that go beyond the seek() range. We confirm what is going on by placing a [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html] right after.
> Logs when using the BatchScanner:
> notice that the "m1" row is returned twice:
> {noformat}
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03)
> 2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F seek((-inf,f%00; : [] 9223372036854775807 false), [], false)
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA seek([f%00; : [] 9223372036854775807 false,+inf), [], false)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopValue() --> 1
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next()
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
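
The duplication in the logs can be modelled in a few lines of plain Java, without any Accumulo classes. This is only a sketch under stated assumptions: `SeekRangeDemo`, `RowRange`, `seek` and `batchScan` are hypothetical names invented for illustration, rows stand in for full keys, and the two ranges are the ones visible in the logs, (-inf, f%00) and [f%00, +inf). It assumes the injecting iterator skips rows before the start of the seek range but does not stop at the end of it. The `clip` flag shows one possible way an iterator could avoid the duplicate by staying inside its seek range; it is not necessarily the fix adopted for this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the two seeks in the logs; no Accumulo classes are used.
class SeekRangeDemo {

    // Hypothetical stand-in for a seek range over rows: [start, end),
    // where null means unbounded on that side.
    static final class RowRange {
        final String start; // inclusive, or null for -inf
        final String end;   // exclusive, or null for +inf

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(String row) {
            return (start == null || row.compareTo(start) >= 0)
                && (end == null || row.compareTo(end) < 0);
        }
    }

    // The rows that the injecting iterator emits, as seen in the logs.
    static final List<String> INJECTED = List.of("a1", "c1", "m1");

    // Models one seek(): emits every injected row at or after the range start.
    // With clip == false it keeps emitting past the range end (assumed
    // behaviour of the iterator in the report). With clip == true it emits
    // only rows that fall inside the range.
    static List<String> seek(RowRange range, boolean clip) {
        List<String> out = new ArrayList<>();
        for (String row : INJECTED) {
            boolean beforeStart = range.start != null && row.compareTo(range.start) < 0;
            if (beforeStart) {
                continue;
            }
            if (clip && !range.contains(row)) {
                continue;
            }
            out.add(row);
        }
        return out;
    }

    // Models the batch scan in the logs: two seeks, split at row "f\0",
    // whose results are simply concatenated on the client side.
    static List<String> batchScan(boolean clip) {
        String split = "f\u0000";
        List<String> out = new ArrayList<>();
        out.addAll(seek(new RowRange(null, split), clip));
        out.addAll(seek(new RowRange(split, null), clip));
        return out;
    }

    public static void main(String[] args) {
        System.out.println("unclipped: " + batchScan(false));
        System.out.println("clipped:   " + batchScan(true));
    }
}
```

In this model the unclipped scan yields a1, c1, m1, m1, which matches the sequence of getTopKey() results across the two DebugIterator instances above, while the clipped scan yields each row once.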