Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id CFF2318824 for ; Thu, 11 Feb 2016 16:24:18 +0000 (UTC)
Received: (qmail 30491 invoked by uid 500); 11 Feb 2016 16:24:18 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 30459 invoked by uid 500); 11 Feb 2016 16:24:18 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 30448 invoked by uid 99); 11 Feb 2016 16:24:18 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 11 Feb 2016 16:24:18 +0000
Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 776D12C1F6F for ; Thu, 11 Feb 2016 16:24:18 +0000 (UTC)
Date: Thu, 11 Feb 2016 16:24:18 +0000 (UTC)
From: "Jim Witschey (JIRA)"
To: commits@cassandra.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (CASSANDRA-10342) Read defragmentation can cause unnecessary repairs
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142973#comment-15142973 ]

Jim Witschey commented on CASSANDRA-10342:
------------------------------------------

[~slebresne] Sorry, I pinged someone privately and should have mentioned it here. We're focused on dtest work and won't have time to do a good benchmark here for a while, so I think the thing to do is either what you proposed, or find a dev to benchmark it.

> Read defragmentation can cause unnecessary repairs
> --------------------------------------------------
>
>                 Key: CASSANDRA-10342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10342
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Olsson
>            Assignee: Marcus Eriksson
>            Priority: Minor
>
> After applying the fix from CASSANDRA-10299 to the cluster, we started seeing ~20k small sstables appear for the table with static data when running incremental repair.
> In the logs there were several messages about flushes for that table, one for each repaired range. The flushed sstables were 0.000kb in size with < 100 ops in each. When checking cfstats there were several writes to that table, even though we were only reading from it and read repair did not repair anything.
> After digging around in the codebase I noticed that defragmentation of data can occur while reading, depending on the query and some other conditions. This causes the read data to be inserted again so that it ends up in a more recent sstable, which can be a problem if that data was repaired using incremental repair. The defragmentation is done in [CollationController.java|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/CollationController.java#L151].
> I guess this wasn't a problem with full repairs, since I assume that the digest should be the same even if you have two copies of the same data. But with incremental repair this will most probably cause a mismatch between nodes if that data was already repaired, since the other nodes probably won't have that data in their unrepaired set.
> ------
> I can add that the problems on our cluster were probably due to the fact that CASSANDRA-10299 caused the same data to be streamed multiple times and end up in several sstables. One of the conditions for the defragmentation is that the number of sstables read during a read request has to be more than the minimum number of sstables needed for a compaction (> 4 in our case). So normally I don't think this would cause ~20k sstables to appear; we probably hit an extreme.
> One workaround for this is to use a compaction strategy other than STCS (it seems to be the only affected strategy, at least in 2.1), but the solution might be to either make defragmentation configurable per table or avoid reinserting the data if any of the sstables involved in the read are repaired.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
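
The condition and the proposed guard described in the issue above can be illustrated with a small sketch. This is not Cassandra's Java code from CollationController.java; it is a simplified Python model, and the names `SSTable`, `should_defragment`, `min_compaction_threshold`, and `skip_if_repaired` are hypothetical. It covers only the two points the report states (more sstables read than the minimum compaction threshold, and skipping reinsertion when any sstable involved is repaired) and ignores the other conditions the report alludes to, such as the query shape and the compaction strategy.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable touched by a single read."""
    name: str
    repaired: bool  # True if incremental repair has marked this sstable as repaired


def should_defragment(sstables_read: Sequence[SSTable],
                      min_compaction_threshold: int = 4,
                      skip_if_repaired: bool = True) -> bool:
    """Decide whether the data just read should be written back (defragmented)."""
    # Condition from the report: the read must touch more sstables than
    # the minimum number needed for a compaction (> 4 in the reporter's case).
    if len(sstables_read) <= min_compaction_threshold:
        return False
    # Proposed guard (an assumption about how the fix could look): do not
    # reinsert data if any sstable involved in the read is already repaired,
    # so repaired data does not reappear in the unrepaired set.
    if skip_if_repaired and any(s.repaired for s in sstables_read):
        return False
    return True


if __name__ == "__main__":
    unrepaired = [SSTable(f"sstable-{i}", repaired=False) for i in range(5)]
    mixed = unrepaired[:4] + [SSTable("sstable-4", repaired=True)]
    print(should_defragment(unrepaired))  # read touched 5 unrepaired sstables
    print(should_defragment(mixed))       # one of the 5 sstables is repaired
```

Setting `skip_if_repaired=False` models the behaviour the report describes as problematic, where the read is written back regardless of repair status.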