accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2232) Combiners can cause deleted data to come back
Date Tue, 22 Sep 2015 21:39:07 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903493#comment-14903493 ]

ASF GitHub Bot commented on ACCUMULO-2232:
------------------------------------------

Github user ctubbsii commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40147937
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    +   */
    +  public static enum DeleteHandlingAction {
    +    /**
    +     * Do nothing when a delete is observed by a combiner during a major compaction.
    +     */
    +    IGNORE,
    +
    +    /**
    +     * Log an error when a delete is observed by a combiner during a major compaction. An error is not logged for each delete entry seen. Once a
    +     * combiner has seen a delete during a major compaction and logged an error, it will not do so again for at least an hour.
    +     */
    +    LOG_ERROR,
    +
    +    /**
    +     * Pass all data through during partial major compactions, no reducing is done. With this option reducing is only done during scan and full major
    +     * compactions, when deletes can be correctly handled.
    +     */
    +    REDUCE_ON_FULL_COMPACTION_ONLY
    +  }
    +
    +  /**
    +   * Combiners may not work correctly with deletes. Sometimes when Accumulo compacts the files in a tablet, it only compacts a subset of the files. If a delete
    +   * marker exists in one of the files that is not being compacted, then data that should be deleted may be combined. See
    +   * <a href="https://issues.apache.org/jira/browse/ACCUMULO-2232">ACCUMULO-2232</a> for more information.
    +   *
    +   * <p>
    +   * This method allows users to configure how they want to handle the combination of delete markers, combiners, and major compactions. The default behavior is
    +   * {@link DeleteHandlingAction#LOG_ERROR}. See the javadoc on each {@link DeleteHandlingAction} enum for a description of each option.
    +   *
    +   * <p>
    +   * For correctness deletes should not be used with columns that are combined OR the {@link DeleteHandlingAction#REDUCE_ON_FULL_COMPACTION_ONLY} option should
    +   * be used. Only reducing on full major compactions may have negative performance implications.
    +   *
    --- End diff --
    
    Another option to give users the ability to "delete" without using problematic deletes
is to let them return a `null` value, with the semantics of `null` being "drop
them all". This, combined with the ability to detect which iterator scope and major compaction
mode is being run, could be used to drop data in a sensible way, suitable to the user's schema,
without inserting problematic delete markers.
    
    If we did that, it'd be good to add that here, with the recommendation to avoid using
deletes entirely.
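
    A rough sketch of what I mean, in plain Java rather than against the real `Combiner` API. Everything here is hypothetical: the `reduce` signature, the `fullMajorCompaction` flag, and the schema rule that a sum of zero means "this counter is gone" are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.Iterator;

public class NullMeansDropSketch {

  /**
   * Hypothetical reduce contract: returning null means "emit nothing for this key".
   * The fullMajorCompaction flag stands in for scope/compaction-mode detection.
   */
  static Long reduce(Iterator<Long> values, boolean fullMajorCompaction) {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next();
    }
    // Assumed schema rule: a total of zero means the entry is logically removed.
    // Dropping is only safe when every file is visible, i.e. on a full major compaction.
    if (fullMajorCompaction && sum == 0) {
      return null;
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(reduce(Arrays.asList(3L, -3L).iterator(), true));  // null
    System.out.println(reduce(Arrays.asList(3L, -3L).iterator(), false)); // 0
  }
}
```

    With something like this, the user "deletes" by writing a value their own reducer understands, and no delete marker is ever inserted.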


> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that depending on
how the major compactions play out, differing values will result. If all 3 files compact,
the correct value of 2 will result. However, if files 1 & 3 compact first, they will aggregate
to 5. The delete will then sort after the combined value, so the result 5
persists.
> First and foremost, this should be documented. I think to remedy this, combiners should
only be used on full MajC, not partial ones. This may necessitate a special flag or a new
combiner that implements the proper semantics.
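
The three-file case above can be reproduced with a small standalone model. This is a simplified sketch, not Accumulo code: the `Entry` class, the `compact` method, and the `reduceOnFullOnly` flag are illustrative names, and the model assumes a single key, distinct timestamps, a delete that masks every older entry in the files being compacted, and a combined value that takes the newest timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PartialCompactionDemo {

  /** One version of key k: either a put with a value or a delete marker. */
  static final class Entry {
    final long timestamp;
    final boolean delete;
    final long value;

    Entry(long timestamp, boolean delete, long value) {
      this.timestamp = timestamp;
      this.delete = delete;
      this.value = value;
    }

    static Entry put(long ts, long value) { return new Entry(ts, false, value); }
    static Entry del(long ts) { return new Entry(ts, true, 0); }
  }

  /**
   * Simplified major compaction with a summing combiner.
   * full: all files are included, so delete markers can be discarded.
   * reduceOnFullOnly: if true, partial compactions pass puts through unsummed.
   */
  static List<Entry> compact(List<List<Entry>> files, boolean full, boolean reduceOnFullOnly) {
    List<Entry> merged = new ArrayList<>();
    for (List<Entry> file : files) {
      merged.addAll(file);
    }
    // Newest first; timestamps are assumed distinct in this model.
    merged.sort(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());

    boolean reduce = full || !reduceOnFullOnly;
    List<Entry> out = new ArrayList<>();
    long sum = 0;
    long sumTs = -1;
    boolean havePut = false;
    for (Entry e : merged) {
      if (e.delete) {
        if (havePut) {
          out.add(Entry.put(sumTs, sum));
          havePut = false;
        }
        if (!full) {
          out.add(e); // partial compactions must keep the marker
        }
        break; // everything older than the delete is masked
      }
      if (!reduce) {
        out.add(e);
        continue;
      }
      if (!havePut) {
        sumTs = e.timestamp;
        havePut = true;
      }
      sum += e.value;
    }
    if (havePut) {
      out.add(Entry.put(sumTs, sum));
    }
    return out;
  }

  static final List<Entry> F1 = Arrays.asList(Entry.put(0, 3));
  static final List<Entry> F2 = Arrays.asList(Entry.del(1));
  static final List<Entry> F3 = Arrays.asList(Entry.put(2, 2));

  /** All three files compact together. */
  static long allAtOnce() {
    return compact(Arrays.asList(F1, F2, F3), true, false).get(0).value;
  }

  /** Files 1 and 3 compact first, then the result compacts with file 2. */
  static long partialThenFull(boolean reduceOnFullOnly) {
    List<Entry> partial = compact(Arrays.asList(F1, F3), false, reduceOnFullOnly);
    return compact(Arrays.asList(partial, F2), true, reduceOnFullOnly).get(0).value;
  }

  public static void main(String[] args) {
    System.out.println(allAtOnce());            // 2
    System.out.println(partialThenFull(false)); // 5
    System.out.println(partialThenFull(true));  // 2
  }
}
```

Under these assumptions, summing during the partial compaction folds the pre-delete value 3 into an entry with timestamp 2, which the delete at timestamp 1 can no longer mask. Deferring the reduction to the full compaction keeps the original timestamps, so the delete still removes the older value.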



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
