accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2232) Combiners can cause deleted data to come back
Date Tue, 22 Sep 2015 04:37:04 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901921#comment-14901921 ]

ASF GitHub Bot commented on ACCUMULO-2232:
------------------------------------------

Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40052540
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +378,46 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  public static enum DeleteHandlingAction {
    +    /**
    +     * Do nothing when a delete is observed by a combiner during a partial major compaction.
    +     */
    +    IGNORE,
    +
    +    /**
    +     * Log an error when a delete is observed by a combiner during a partial major compaction. An error is not logged for each delete entry seen. Once a
    +     * combiner has seen a delete during a partial compaction and logged an error, it will not do so again for at least an hour.
    +     */
    +    LOG_ERROR,
    +
    +    /**
    +     * Throw an exception when a delete is observed by a combiner during a partial major compaction.
    +     */
    +    THROW_EXCEPTION,
    +
    +    /**
    +     * Pass all data through during partial major compactions, no reducing is done. With this option reducing is only done during scan and full major
    +     * compactions, when deletes can be correctly handled.
    +     */
    +    REDUCE_ON_FULL_COMPACTION_ONLY
    +  }
    +
    +  /**
    +   * Combiners may not work correctly with deletes. Sometimes when Accumulo compacts the files in a tablet, it only compacts a subset of the files. If a delete
    --- End diff --
    
    Great writeup.


> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that, depending on
> how the major compactions play out, differing values will result. If all 3 files compact
> together, the correct value of 2 will result. However, if files 1 & 3 compact first, they
> will aggregate to 5, and the combined entry keeps the newer timestamp, 2. The delete at
> timestamp 1 then falls after the combined value, so the incorrect result 5 persists.
> First and foremost, this should be documented. I think to remedy this, combiners should
> only be used on full MajC, not partial ones. This may necessitate a special flag or a new
> combiner that implements the proper semantics.
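
The scenario above can be traced with a small stand-alone simulation. This is only a sketch under stated assumptions, not Accumulo code: the class CombinerDeleteSketch, its Entry type, and the compact/read helpers are hypothetical names invented for illustration. It models a single key k, assumes a delete masks every older version, assumes a partial compaction keeps the delete marker, and assumes the combiner emits the sum under the newest timestamp it saw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model of one key k; none of these names exist in Accumulo.
class CombinerDeleteSketch {
  /** One version of k: a timestamp plus either a value or a delete marker. */
  static final class Entry {
    final long ts;
    final long value;
    final boolean delete;

    Entry(long ts, long value, boolean delete) {
      this.ts = ts;
      this.value = value;
      this.delete = delete;
    }
  }

  // The three files from the issue description.
  static final List<Entry> FILE1 = Collections.singletonList(new Entry(0, 3, false));
  static final List<Entry> FILE2 = Collections.singletonList(new Entry(1, 0, true));
  static final List<Entry> FILE3 = Collections.singletonList(new Entry(2, 2, false));

  /**
   * Compact the given files. Versions older than the newest delete are dropped.
   * A partial compaction (full == false) keeps the delete marker; a full one drops it.
   * If combine is true, the surviving values are summed under the newest timestamp.
   */
  static List<Entry> compact(List<List<Entry>> files, boolean full, boolean combine) {
    List<Entry> merged = new ArrayList<>();
    for (List<Entry> f : files) {
      merged.addAll(f);
    }
    merged.sort((a, b) -> Long.compare(b.ts, a.ts)); // newest first

    List<Entry> live = new ArrayList<>();
    Entry marker = null;
    for (Entry e : merged) {
      if (e.delete) {
        marker = e; // everything older than this is masked
        break;
      }
      live.add(e);
    }

    List<Entry> out = new ArrayList<>();
    if (combine && !live.isEmpty()) {
      long sum = 0;
      for (Entry e : live) {
        sum += e.value;
      }
      out.add(new Entry(live.get(0).ts, sum, false));
    } else {
      out.addAll(live); // pass-through: no reducing
    }
    if (marker != null && !full) {
      out.add(marker);
    }
    return out;
  }

  /** A scan sees the same view as a full compaction with combining; 0 means "no value". */
  static long read(List<List<Entry>> files) {
    List<Entry> r = compact(files, true, true);
    return r.isEmpty() ? 0 : r.get(0).value;
  }

  /** All three files compact together. */
  static long allThreeAtOnce() {
    return read(Arrays.asList(FILE1, FILE2, FILE3));
  }

  /** Files 1 and 3 are combined in a partial compaction before the delete is seen. */
  static long partialCombineThenRead() {
    List<Entry> partial = compact(Arrays.asList(FILE1, FILE3), false, true);
    return read(Arrays.asList(partial, FILE2));
  }

  /** Proposed remedy: the partial compaction passes data through, reducing happens later. */
  static long passThroughThenRead() {
    List<Entry> partial = compact(Arrays.asList(FILE1, FILE3), false, false);
    return read(Arrays.asList(partial, FILE2));
  }

  public static void main(String[] args) {
    System.out.println(allThreeAtOnce());         // prints 2
    System.out.println(partialCombineThenRead()); // prints 5
    System.out.println(passThroughThenRead());    // prints 2
  }
}
```

In this model the wrong answer appears only when the partial compaction reduces: the sum 5 is written at timestamp 2, so the delete at timestamp 1 can no longer mask the old value 3 that was folded into it. Passing the entries through unreduced keeps the original timestamps, and a later scan or full compaction yields 2 again.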



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
