accumulo-notifications mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2232) Combiners can cause deleted data to come back
Date Fri, 25 Sep 2015 15:36:04 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908167#comment-14908167 ]

ASF GitHub Bot commented on ACCUMULO-2232:
------------------------------------------

Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/accumulo/pull/47#discussion_r40442327
  
    --- Diff: core/src/main/java/org/apache/accumulo/core/iterators/Combiner.java ---
    @@ -313,4 +392,48 @@ public static void setColumns(IteratorSetting is, List<IteratorSetting.Column> c
       public static void setCombineAllColumns(IteratorSetting is, boolean combineAllColumns) {
         is.addOption(ALL_OPTION, Boolean.toString(combineAllColumns));
       }
    +
    +  /**
    +   * @since 1.6.4 1.7.1 1.8.0
    +   */
    +  public static enum DeleteHandlingAction {
    +    /**
    +     * Do nothing when a a delete is observed by a combiner during a major compaction.
    +     */
    +    IGNORE,
    +
    +    /**
    +     * Log an error when a a delete is observed by a combiner during a major compaction. An error is not logged for each delete entry seen. Once a
    +     * combiner has seen a delete during a major compaction and logged an error, it will not do so again for at least an hour.
    +     */
    +    LOG_ERROR,
    +
    +    /**
    +     * Pass all data through during partial major compactions, no reducing is done. With this option reducing is only done during scan and full major
    +     * compactions, when deletes can be correctly handled.
    +     */
    +    REDUCE_ON_FULL_COMPACTION_ONLY
    +  }
    +
    +  /**
    +   * Combiners may not work correctly with deletes. Sometimes when Accumulo compacts the files in a tablet, it only compacts a subset of the files. If a delete
    +   * marker exists in one of the files that is not being compacted, then data that should be deleted may be combined. See
    +   * <a href="https://issues.apache.org/jira/browse/ACCUMULO-2232">ACCUMULO-2232</a> for more information.
    +   *
    +   * <p>
    +   * This method allows users to configure how they want to handle the combination of delete markers, combiners, and major compactions. The default behavior is
    +   * {@link DeleteHandlingAction#LOG_ERROR}. See the javadoc on each {@link DeleteHandlingAction} enum for a description of each option.
    +   *
    +   * <p>
    +   * For correctness deletes should not be used with columns that are combined OR the {@link DeleteHandlingAction#REDUCE_ON_FULL_COMPACTION_ONLY} option should
    +   * be used. Only reducing on full major compactions may have negative performance implications.
    +   *
    --- End diff --
    
    I don't think this should be coupled with this fix. Also not sure about doing this in 1.6. I opened a new issue: [ACCUMULO-4011](https://issues.apache.org/jira/browse/ACCUMULO-4011).
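
    As a reading aid for the three options documented in the diff above, here is a minimal sketch of the behavior the javadoc describes. It is not the code from the pull request: the enum is a local copy of the names in the diff, and `Scope`, `reduces`, and `shouldLogDelete` are hypothetical names introduced only for illustration. Minor compactions are left out of the model.

```java
import java.util.concurrent.TimeUnit;

public class DeleteHandlingSketch {

  /** Local copy of the enum names from the diff; not the Accumulo class itself. */
  public enum DeleteHandlingAction { IGNORE, LOG_ERROR, REDUCE_ON_FULL_COMPACTION_ONLY }

  /** Hypothetical scopes in which a combiner may run (minor compactions omitted). */
  public enum Scope { SCAN, PARTIAL_MAJC, FULL_MAJC }

  /**
   * Hypothetical helper: does the combiner reduce values in this scope?
   * Per the javadoc, REDUCE_ON_FULL_COMPACTION_ONLY passes data through
   * unreduced during partial major compactions; the other actions always reduce.
   */
  public static boolean reduces(DeleteHandlingAction action, Scope scope) {
    return !(action == DeleteHandlingAction.REDUCE_ON_FULL_COMPACTION_ONLY
        && scope == Scope.PARTIAL_MAJC);
  }

  private static final long LOG_INTERVAL_MS = TimeUnit.HOURS.toMillis(1);

  /** Time of the last logged error, or null if nothing has been logged yet. */
  private Long lastLogMs = null;

  /**
   * Hypothetical helper, called when a delete is observed during a major compaction.
   * Under LOG_ERROR it returns true at most once per hour, matching the javadoc's
   * "will not do so again for at least an hour".
   */
  public boolean shouldLogDelete(DeleteHandlingAction action, long nowMs) {
    if (action != DeleteHandlingAction.LOG_ERROR) {
      return false;
    }
    if (lastLogMs != null && nowMs - lastLogMs < LOG_INTERVAL_MS) {
      return false;
    }
    lastLogMs = nowMs;
    return true;
  }
}
```

    The sketch takes the clock value as a parameter so the one-hour window can be exercised without waiting; how the actual patch tracks time is not shown in the diff.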


> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>            Assignee: Keith Turner
>             Fix For: 1.6.4, 1.7.1, 1.8.0
>
>
> The case:
> 3 files:
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that, depending on how the major compactions play out, differing values will result. If all 3 files compact, the correct value of 2 will result. However, if files 1 and 3 compact first, they will aggregate to 5. The delete will then fall after the combined value, causing the result 5 to persist.
> First and foremost, this should be documented. I think to remedy this, combiners should only be applied on full MajC, not partial ones. This may necessitate a special flag or a new combiner that implements the proper semantics.
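
The three-file case in the issue description can be reproduced with a small toy model. This is a sketch under stated assumptions, not Accumulo's iterator stack: `Entry`, `compact`, and the scenario methods are hypothetical names, a delete marker is assumed to hide every entry with an older timestamp, the combined entry is assumed to take the newest input timestamp, and delete markers are assumed to be dropped only on a full compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CombinerDeleteDemo {

  /** One version of key k: a timestamp plus either a value or a delete marker. */
  static final class Entry {
    final long ts;
    final long value;
    final boolean delete;

    Entry(long ts, long value, boolean delete) {
      this.ts = ts;
      this.value = value;
      this.delete = delete;
    }
  }

  /** Toy major compaction of the given files with a summing combiner on k. */
  static List<Entry> compact(List<List<Entry>> files, boolean full) {
    List<Entry> all = new ArrayList<>();
    for (List<Entry> f : files) {
      all.addAll(f);
    }
    // Newest timestamp first, as versions of one key are ordered.
    all.sort(Comparator.comparingLong((Entry e) -> e.ts).reversed());

    long sum = 0;
    long newestTs = 0;
    boolean sawValue = false;
    Entry deleteMarker = null;
    for (Entry e : all) {
      if (e.delete) {
        deleteMarker = e; // everything older is hidden, so stop summing
        break;
      }
      if (!sawValue) {
        newestTs = e.ts;
        sawValue = true;
      }
      sum += e.value;
    }

    List<Entry> out = new ArrayList<>();
    if (sawValue) {
      out.add(new Entry(newestTs, sum, false));
    }
    if (deleteMarker != null && !full) {
      out.add(deleteMarker); // a partial compaction must keep the marker
    }
    return out;
  }

  static final List<Entry> FILE_1 = Arrays.asList(new Entry(0, 3, false));
  static final List<Entry> FILE_2 = Arrays.asList(new Entry(1, 0, true));
  static final List<Entry> FILE_3 = Arrays.asList(new Entry(2, 2, false));

  /** All three files in one full compaction. */
  public static long allThreeAtOnce() {
    return compact(Arrays.asList(FILE_1, FILE_2, FILE_3), true).get(0).value;
  }

  /** Files 1 and 3 in a partial compaction, then the result with file 2. */
  public static long oneAndThreeFirst() {
    List<Entry> partial = compact(Arrays.asList(FILE_1, FILE_3), false);
    return compact(Arrays.asList(partial, FILE_2), true).get(0).value;
  }

  public static void main(String[] args) {
    System.out.println("all three at once: " + allThreeAtOnce()); // prints 2
    System.out.println("1 and 3 first:     " + oneAndThreeFirst()); // prints 5
  }
}
```

In the second ordering the partial compaction emits a single entry with timestamp 2 and value 5. The delete at timestamp 1 is older than that entry, so it can no longer hide the value 3 that was folded into the sum.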



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
