Date: Tue, 1 Sep 2015 16:57:46 +0000 (UTC)
From: "Keith Turner (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Comment Edited] (ACCUMULO-2232) Combiners can cause deleted data to come back

    [ https://issues.apache.org/jira/browse/ACCUMULO-2232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725686#comment-14725686 ]

Keith Turner edited comment on ACCUMULO-2232 at 9/1/15 4:57 PM:
----------------------------------------------------------------

bq. the performance implications will be huge and this is enough rope for people to hang themselves with. However, I think a lot of people use combiners on tables that are append-only and never delete

Thinking about this case where people want to use combiners and do not delete, there is the exception option to consider.
Make combiners throw an exception if a delete marker is seen during a partial major compaction. However, this only makes the user aware of the problem; it does not prevent it. By the time a delete marker is seen, data that was supposed to have been deleted could already have been combined by a previous partial compaction that did not see any delete markers.

> Combiners can cause deleted data to come back
> ---------------------------------------------
>
>                 Key: ACCUMULO-2232
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2232
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, tserver
>            Reporter: John Vines
>
> The case-
> 3 files with-
> * 1 with a key, k, with timestamp 0, value 3
> * 1 with a delete of k with timestamp 1
> * 1 with k with timestamp 2, value 2
> The column of k has a summing combiner set on it. The issue here is that depending on how the major compactions play out, differing values will result. If all 3 files compact, the correct value of 2 will result. However, if 1 & 3 compact first, they will aggregate to 5. The delete will then fall after the combined value, so the value 5 persists.
> First and foremost, this should be documented. I think to remedy this, combiners should only be used on full MajC, not partial ones. This may necessitate a special flag or a new combiner that implements the proper semantics.
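
The three-file case from the issue description, and the limit of the exception option raised in the comment, can be sketched with a small toy model. This is not Accumulo code: `Entry`, `compact`, `scan`, and the `fail_on_delete` flag are hypothetical names invented for illustration. The model assumes a delete marker hides every older entry for the same key, that partial compactions keep delete markers while full ones drop them, and that the summing combiner emits its result at the newest timestamp it saw.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    key: str
    timestamp: int
    value: Optional[int]  # None stands for a delete marker (toy convention)


def compact(files: List[List[Entry]], full: bool,
            fail_on_delete: bool = False) -> List[Entry]:
    """Merge files into one, applying deletes and a summing combiner."""
    # Sort by key, then newest timestamp first.
    merged = sorted((e for f in files for e in f),
                    key=lambda e: (e.key, -e.timestamp))
    by_key = {}
    for e in merged:
        by_key.setdefault(e.key, []).append(e)

    out: List[Entry] = []
    for key, entries in by_key.items():
        total, newest_ts, delete = 0, None, None
        for e in entries:
            if e.value is None:
                if fail_on_delete and not full:
                    raise RuntimeError(
                        f"delete marker for {key!r} seen in partial compaction")
                delete = e  # hides everything older than itself
                break
            total += e.value
            if newest_ts is None:
                newest_ts = e.timestamp
        if newest_ts is not None:
            out.append(Entry(key, newest_ts, total))
        if delete is not None and not full:
            out.append(delete)  # partial compactions must keep the marker
    return out


def scan(files: List[List[Entry]], key: str) -> Optional[int]:
    """Value a reader would see for key across the given files."""
    for e in compact(files, full=True):
        if e.key == key:
            return e.value
    return None


f1 = [Entry("k", 0, 3)]
f2 = [Entry("k", 1, None)]  # delete of k at timestamp 1
f3 = [Entry("k", 2, 2)]

# All three files compact together: the delete hides the old value 3.
value_full = scan([f1, f2, f3], "k")

# Files 1 and 3 compact first: no delete marker is visible, so nothing fails,
# and the deleted value 3 is folded into the sum at timestamp 2.
partial = compact([f1, f3], full=False, fail_on_delete=True)
value_after_partial = scan([partial, f2], "k")

# The exception only fires on a later partial compaction that includes the
# delete, by which point the combined value already contains deleted data.
try:
    compact([partial, f2], full=False, fail_on_delete=True)
    raised = False
except RuntimeError:
    raised = True
```

In this model `value_full` is 2 and `value_after_partial` is 5, which matches the two outcomes described in the issue. The exception is raised only on the second partial compaction, after the damage is done.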