Return-Path: X-Original-To: apmail-accumulo-notifications-archive@minotaur.apache.org Delivered-To: apmail-accumulo-notifications-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7695917B95 for ; Mon, 27 Apr 2015 17:32:39 +0000 (UTC) Received: (qmail 4061 invoked by uid 500); 27 Apr 2015 17:32:39 -0000 Delivered-To: apmail-accumulo-notifications-archive@accumulo.apache.org Received: (qmail 4020 invoked by uid 500); 27 Apr 2015 17:32:39 -0000 Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: jira@apache.org Delivered-To: mailing list notifications@accumulo.apache.org Received: (qmail 4004 invoked by uid 99); 27 Apr 2015 17:32:39 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Apr 2015 17:32:39 +0000 Date: Mon, 27 Apr 2015 17:32:39 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: notifications@accumulo.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (ACCUMULO-3646) Duplicate entries when iterator emits entries past seek() range MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/ACCUMULO-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514493#comment-14514493 ] ASF GitHub Bot commented on ACCUMULO-3646: ------------------------------------------ Github user denine99 commented on a diff in the pull request: https://github.com/apache/accumulo/pull/33#discussion_r29169461 --- Diff: docs/src/main/asciidoc/chapters/iterator_design.txt --- @@ -145,8 +146,16 @@ alter the internal state of the Iterator. These methods simply return the current Key-Value pair for this iterator. If `hasTop` returns true, both of these methods should return non-null objects. If `hasTop` returns false, it is undefined -what these methods should return. Multiple calls to these methods should not alter the state -of the Iterator like `hasTop`. +what these methods should return. Like `hasTop`, multiple calls to these methods should not alter +the state of the Iterator. + +When saving a Key or Value from a source iterator's `getTopKey` or `getTopValue` methods +for use after calling `next` on the source iterator (e.g., when cacheing keys or values +from the source iterator), it is important to copy the Key or Value into a new object +because the source iterator may reuse the Key or Value objects for performance reasons. --- End diff -- The statement > "An iterator which doesn't modify the Key from its source doesn't need to copy it" does not sound safe because the base iterators may change the data that the Key object points to in order to reuse the same underlying char[] and the same object. How about this: "An iterator that does not use a Key obtained from `getTopKey()` after calling `next()` on its source iterator (and does not use the underlying char[] of the Key), does not need to copy the data from the Key and may use the original object reference. Ditto for Value." It's a wordy statement but we need all the clauses ~.~ > Duplicate entries when iterator emits entries past seek() range > --------------------------------------------------------------- > > Key: ACCUMULO-3646 > URL: https://issues.apache.org/jira/browse/ACCUMULO-3646 > Project: Accumulo > Issue Type: Bug > Components: client, mini, tserver > Affects Versions: 1.6.1 > Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, Zookeeper 3.4.6 > Reporter: Dylan Hutchison > Assignee: Dylan Hutchison > Priority: Minor > Fix For: 1.7.0 > > > The SortedKeyValueIterator's seek() method documents that an iterator may return keys past the range passed to seek(). However, an iterator set at scan-time that returns values past the range passed to seek() will return those keys multiple times if the client uses a BatchScanner. This does not occur when the client uses a Scanner. This has nothing to do with the VersioningIterator. This has nothing to do with the entries actually in the table. Also affects MiniAccumulo. > If this is intended, we should update the SortedKeyValueIterator seek() documentation with a warning that returning keys past the seek() range may result in a client seeing duplicate keys. If this is not intended, then it is a bug. > Test code: See [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java] > * method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner > * method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner > In these methods, the [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java] emits entries that go beyond the seek() range. We confirm what is going on by placing a [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html] right after. > Logs when using the BatchScanner: > notice that the "m1" row is returned twice: > {noformat} > 015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan > 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan > 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03) > 2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F seek((-inf,f%00; : [] 9223372036854775807 false), [], false) > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1 > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next() > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1 > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next() > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1 > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next() > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false > 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false > 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a) > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA seek([f%00; : [] 9223372036854775807 false,+inf), [], false) > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopValue() --> 1 > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next() > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false > 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)