accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2232) Combiners can cause deleted data to come back
Date Tue, 22 Sep 2015 21:33:05 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903477#comment-14903477 ]

ASF GitHub Bot commented on ACCUMULO-2232:
------------------------------------------

Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40147270
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    --- End diff --
    
    Instead of offering the options `IGNORE` and `LOG_ERROR`, we should probably just always log a warning (if `REDUCE_ON_FULL_COMPACTION_ONLY` is not set) and let users disable it in the log4j config, rather than in code.
    
    `REDUCE_ON_FULL_COMPACTION_ONLY` should be treated as a distinct boolean option which can be set to `true` or `false` like any normal iterator option (similar to `Filter`'s `negate` option), with a default value of `false` to preserve existing behavior.
    
    That would also minimize API changes, especially for 1.6.x and 1.7.x. While this isn't technically public API, it'd still be good to minimize API impact on users using those versions.
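
    A minimal sketch of the boolean-option idea above, in plain Java with no Accumulo dependency. The class name, the option key string `reduceOnFullCompactionOnly`, and the helper methods are hypothetical names chosen for illustration; they are not taken from the pull request. Iterator options arrive as a string map, so an absent key falls back to `false`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the combiner's option handling; not Accumulo's actual code.
final class ReduceOptionSketch {
  // Assumed option key; the real key is whatever the pull request settles on.
  static final String REDUCE_ON_FULL_COMPACTION_ONLY_OPTION = "reduceOnFullCompactionOnly";

  // Returns the configured value, or false when the option is absent,
  // which preserves the existing behavior by default.
  static boolean reduceOnFullCompactionOnly(Map<String,String> options) {
    // Boolean.parseBoolean(null) is false, so a missing key needs no special case.
    return Boolean.parseBoolean(options.get(REDUCE_ON_FULL_COMPACTION_ONLY_OPTION));
  }

  // A warning about a delete seen by the combiner is only relevant
  // when the option is not set.
  static boolean shouldWarnOnDelete(Map<String,String> options) {
    return !reduceOnFullCompactionOnly(options);
  }

  public static void main(String[] args) {
    Map<String,String> options = new HashMap<>();
    System.out.println(reduceOnFullCompactionOnly(options)); // false (default)
    options.put(REDUCE_ON_FULL_COMPACTION_ONLY_OPTION, "true");
    System.out.println(reduceOnFullCompactionOnly(options)); // true
    System.out.println(shouldWarnOnDelete(options));         // false
  }
}
```

    With a plain boolean like this, whether the warning is actually emitted is left to the logging configuration rather than to an extra option in code.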


> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that, depending on how the major compactions play out, differing values will result. If all 3 files compact together, the correct value of 2 will result. However, if files 1 & 3 compact first, they will aggregate to 5. The delete will then sort after the combined value, so the value 5 persists.
> First and foremost, this should be documented. I think to remedy this, combiners should only be applied on full MajC, not on partial ones. This may necessitate a special flag or a new combiner that implements the proper semantics.
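
The scenario in the issue description can be reproduced with a small, self-contained simulation. This is a simplified model and not Accumulo code: the class `CombinerDeleteSketch`, its `Entry` type, and the `compact` method are hypothetical. It assumes that a delete masks entries with the same or an older timestamp, that a summing combiner collapses the surviving values into one entry carrying the newest timestamp, and that delete markers are dropped only by a full compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of one key k across several files; not Accumulo code.
final class CombinerDeleteSketch {
  // One version of key k: a timestamp plus either a value or a delete marker.
  static final class Entry {
    final long timestamp;
    final boolean delete;
    final long value;

    Entry(long timestamp, boolean delete, long value) {
      this.timestamp = timestamp;
      this.delete = delete;
      this.value = value;
    }

    static Entry put(long timestamp, long value) {
      return new Entry(timestamp, false, value);
    }

    static Entry delete(long timestamp) {
      return new Entry(timestamp, true, 0);
    }
  }

  // Compacts the given files with a summing combiner applied.
  static List<Entry> compact(boolean full, List<List<Entry>> files) {
    long newestDelete = Long.MIN_VALUE;
    boolean sawDelete = false;
    for (List<Entry> file : files) {
      for (Entry e : file) {
        if (e.delete) {
          sawDelete = true;
          newestDelete = Math.max(newestDelete, e.timestamp);
        }
      }
    }
    long sum = 0;
    long newestPut = Long.MIN_VALUE;
    boolean sawPut = false;
    for (List<Entry> file : files) {
      for (Entry e : file) {
        // Only values strictly newer than the newest delete survive.
        if (!e.delete && (!sawDelete || e.timestamp > newestDelete)) {
          sawPut = true;
          sum += e.value;
          newestPut = Math.max(newestPut, e.timestamp);
        }
      }
    }
    List<Entry> out = new ArrayList<>();
    if (sawPut) {
      out.add(Entry.put(newestPut, sum)); // combined value keeps the newest timestamp
    }
    if (sawDelete && !full) {
      out.add(Entry.delete(newestDelete)); // partial compactions keep the delete marker
    }
    return out;
  }

  // Visible value of k in a compacted file, or 0 if nothing survived.
  static long visibleValue(List<Entry> file) {
    for (Entry e : file) {
      if (!e.delete) {
        return e.value;
      }
    }
    return 0;
  }

  static List<Entry> file1() { return Arrays.asList(Entry.put(0, 3)); }
  static List<Entry> file2() { return Arrays.asList(Entry.delete(1)); }
  static List<Entry> file3() { return Arrays.asList(Entry.put(2, 2)); }

  // All three files compacted together.
  static long fullCompactionResult() {
    return visibleValue(compact(true, Arrays.asList(file1(), file2(), file3())));
  }

  // Files 1 and 3 compacted first, then the result compacted with file 2.
  static long partialThenFullResult() {
    List<Entry> partial = compact(false, Arrays.asList(file1(), file3()));
    return visibleValue(compact(true, Arrays.asList(partial, file2())));
  }

  public static void main(String[] args) {
    System.out.println(fullCompactionResult());  // 2
    System.out.println(partialThenFullResult()); // 5
  }
}
```

In the second ordering, the combined entry carries timestamp 2, so the delete at timestamp 1 no longer masks the value 3 that was folded into it.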



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
