accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3646) Duplicate entries when iterator emits entries past seek() range
Date Mon, 27 Apr 2015 16:14:39 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514372#comment-14514372 ]

ASF GitHub Bot commented on ACCUMULO-3646:
------------------------------------------

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/33#discussion_r29161718
  
    --- Diff: docs/src/main/asciidoc/chapters/iterator_design.txt ---
    @@ -145,8 +146,16 @@ alter the internal state of the Iterator.
     
     These methods simply return the current Key-Value pair for this iterator. If `hasTop` returns true,
     both of these methods should return non-null objects. If `hasTop` returns false, it is undefined
    -what these methods should return. Multiple calls to these methods should not alter the state
    -of the Iterator like `hasTop`.
    +what these methods should return. Like `hasTop`, multiple calls to these methods should not alter
    +the state of the Iterator.
    +
    +When saving a Key or Value from a source iterator's `getTopKey` or `getTopValue` methods
    +for use after calling `next` on the source iterator (e.g., when caching keys or values
    +from the source iterator), it is important to copy the Key or Value into a new object
    +because the source iterator may reuse the Key or Value objects for performance reasons.
    --- End diff --
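To make the reuse hazard in the proposed paragraph concrete, here is a minimal, Accumulo-free Java sketch. `MutableKey` and `ReusingSource` are hypothetical stand-ins, not Accumulo classes: the source overwrites a single key object on every step, as a real source iterator may. Caching the returned references without copying leaves every cached entry pointing at the same object; copying first (in Accumulo, something like `new Key(source.getTopKey())`) keeps each entry intact.

```java
import java.util.ArrayList;
import java.util.List;

public class KeyReuseDemo {

    // Hypothetical mutable key standing in for Accumulo's Key.
    static final class MutableKey {
        String row;

        MutableKey(String row) { this.row = row; }

        // Analogous to copying with new Key(source.getTopKey()) in Accumulo.
        MutableKey copy() { return new MutableKey(row); }
    }

    // Hypothetical source that overwrites ONE shared key object for every entry,
    // as a real source iterator may do for performance.
    static final class ReusingSource {
        private final String[] rows;
        private final MutableKey shared = new MutableKey(null);
        private int pos = 0;

        ReusingSource(String... rows) { this.rows = rows; }

        boolean hasTop() { return pos < rows.length; }
        MutableKey getTopKey() { shared.row = rows[pos]; return shared; }
        void next() { pos++; }
    }

    // Caches every key from the source for use after next(), optionally copying it first.
    static List<String> cacheRows(ReusingSource source, boolean copy) {
        List<MutableKey> cache = new ArrayList<>();
        while (source.hasTop()) {
            MutableKey k = source.getTopKey();
            cache.add(copy ? k.copy() : k); // without a copy, every entry aliases the same object
            source.next();
        }
        List<String> rows = new ArrayList<>();
        for (MutableKey k : cache) {
            rows.add(k.row);
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(cacheRows(new ReusingSource("a1", "c1", "m1"), false)); // [m1, m1, m1]
        System.out.println(cacheRows(new ReusingSource("a1", "c1", "m1"), true));  // [a1, c1, m1]
    }
}
```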
    
    Great, thanks Dylan. My memory isn't what it should be :).
    
    > I think some iterators do this frequently
    
    I took a gander at the "user" iterators, and I was a little surprised to see so many iterators doing a copy in some form or another. Perhaps I'm being overly sensitive about the performance concerns. Maybe my concern could be reworded: are there cases we could outline in which a user doesn't need to copy the Key?
    
    "An iterator which doesn't modify the Key from its source doesn't need to copy it"? This would catch cases like value transformations. What do you think? Is that a safe clarification to make?
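A minimal sketch of the case the proposed clarification would cover, assuming the iterator never holds on to the source's key across a call to `next()`. `DoublingIterator`, `ReusingSource`, and `MutableKey` are hypothetical, not Accumulo API: the iterator hands the source's key straight through and only transforms the value, so even though the source overwrites one shared key object, a consumer that reads each entry before advancing still sees correct results. Whether the rule is safe in general is the open question above; this only illustrates the pass-through case.

```java
import java.util.ArrayList;
import java.util.List;

public class PassThroughDemo {

    // Hypothetical mutable key standing in for Accumulo's Key.
    static final class MutableKey {
        String row;
    }

    // Hypothetical source that overwrites ONE shared key object for every entry.
    static final class ReusingSource {
        private final String[] rows;
        private final long[] values;
        private final MutableKey shared = new MutableKey();
        private int pos = 0;

        ReusingSource(String[] rows, long[] values) {
            this.rows = rows;
            this.values = values;
        }

        boolean hasTop() { return pos < rows.length; }
        MutableKey getTopKey() { shared.row = rows[pos]; return shared; }
        long getTopValue() { return values[pos]; }
        void next() { pos++; }
    }

    // Hypothetical value-transforming iterator. It never keeps a reference to the
    // source's key across source.next(), so it returns that key without copying.
    static final class DoublingIterator {
        private final ReusingSource source;

        DoublingIterator(ReusingSource source) { this.source = source; }

        boolean hasTop() { return source.hasTop(); }
        MutableKey getTopKey() { return source.getTopKey(); }   // passed through, no copy
        long getTopValue() { return source.getTopValue() * 2; } // only the value changes
        void next() { source.next(); }
    }

    // Consumes each entry fully before advancing the iterator.
    static List<String> drain(DoublingIterator it) {
        List<String> out = new ArrayList<>();
        while (it.hasTop()) {
            out.add(it.getTopKey().row + "=" + it.getTopValue());
            it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        ReusingSource src = new ReusingSource(new String[] {"a1", "c1", "m1"}, new long[] {1, 2, 3});
        System.out.println(drain(new DoublingIterator(src))); // [a1=2, c1=4, m1=6]
    }
}
```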


> Duplicate entries when iterator emits entries past seek() range
> ---------------------------------------------------------------
>
>                 Key: ACCUMULO-3646
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3646
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, mini, tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, Zookeeper 3.4.6
>            Reporter: Dylan Hutchison
>            Assignee: Dylan Hutchison
>            Priority: Minor
>             Fix For: 1.7.0
>
>
> The SortedKeyValueIterator's seek() method documents that an iterator may return keys past the range passed to seek(). However, an iterator set at scan time that returns keys past the range passed to seek() will return those keys multiple times if the client uses a BatchScanner. This does not occur when the client uses a Scanner. This has nothing to do with the VersioningIterator, nor with the entries actually in the table. It also affects MiniAccumulo.
> If this is intended, we should update the SortedKeyValueIterator seek() documentation with a warning that returning keys past the seek() range may result in a client seeing duplicate keys. If this is not intended, then it is a bug.
> Test code: See [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java]
> * method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner
> * method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner
> In these methods, the [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java] emits entries that go beyond the seek() range. We confirmed what is going on by placing a [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html] right after it.
> Logs when using the BatchScanner (notice that the "m1" row is returned twice):
> {noformat}
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03)
> 2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F seek((-inf,f%00; : [] 9223372036854775807 false), [], false)
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA seek([f%00; : [] 9223372036854775807 false,+inf), [], false)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopValue() --> 1
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next()
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> {noformat}
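A simplified, Accumulo-free model of what the log shows: the scan is served as two seeks split near row `f` (the log shows `f%00`), and the injecting iterator skips rows before the seek start but ignores the seek end, so `m1` is emitted by both seeks. `RowRange`, `seekAndDrain`, and `scanAll` are hypothetical names for this sketch, not Accumulo API. Clipping emitted rows to the end of the seek range, as the `clipToEnd` flag does, removes the duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SeekRangeDemo {

    // Hypothetical row range: start inclusive, end exclusive; null means unbounded.
    static final class RowRange {
        final String start;
        final String end;

        RowRange(String start, String end) {
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical injecting iterator after a seek: emits fixed rows in sorted order,
    // skipping rows before the seek start. With clipToEnd == false it also ignores the
    // seek end, matching the behavior in the log above.
    static List<String> seekAndDrain(List<String> injected, RowRange range, boolean clipToEnd) {
        List<String> out = new ArrayList<>();
        for (String row : new TreeSet<>(injected)) {
            if (range.start != null && row.compareTo(range.start) < 0) {
                continue; // before the seek range
            }
            if (clipToEnd && range.end != null && row.compareTo(range.end) >= 0) {
                break; // rows are sorted, so nothing later is in range either
            }
            out.add(row);
        }
        return out;
    }

    // Models a client that issues one seek per sub-range and concatenates the results.
    static List<String> scanAll(List<String> injected, List<RowRange> subRanges, boolean clipToEnd) {
        List<String> all = new ArrayList<>();
        for (RowRange r : subRanges) {
            all.addAll(seekAndDrain(injected, r, clipToEnd));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> injected = Arrays.asList("a1", "c1", "m1");
        // Two sub-ranges split at "f", approximating the f%00 boundary in the log.
        List<RowRange> subRanges = Arrays.asList(new RowRange(null, "f"), new RowRange("f", null));
        System.out.println(scanAll(injected, subRanges, false)); // [a1, c1, m1, m1]
        System.out.println(scanAll(injected, subRanges, true));  // [a1, c1, m1]
    }
}
```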



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
