Date: Tue, 14 Feb 2012 17:07:59 +0000 (UTC)
From: "Jeremy Hanna (Commented) (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Message-ID: <743264992.37049.1329239279783.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1284193528.7072.1328269313945.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207832#comment-13207832 ]

Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------

I patched with v2. After more testing today, writes still appear to be occurring, but there is a definite reduction. It could be a valid repair thing. I'll do some more testing and hopefully repair every node and compact every node, and then do a scan across a large column family and see what happens.

> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
>                 Key: CASSANDRA-3843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Philip Andronov
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.8
>
>         Attachments: 3843-v2.txt, 3843.txt
>
>
> When reading at Quorum level with a replication factor greater than 2, Cassandra sends at least one ReadRepair, even if there is no need to do that.
> Because read requests wait until the ReadRepair finishes, it slows down requests a lot, up to the Timeout :(
> It seems that the problem was introduced by CASSANDRA-2494. Unfortunately, I do not have enough knowledge of Cassandra internals to fix the problem without breaking the CASSANDRA-2494 functionality, so I am reporting this without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ....
>     private class Reducer extends MergeIterator.Reducer, Row>
>     {
>         // ....
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>
>                         // [PA] Here we are adding a null ColumnFamily.
>                         // Later it will be compared with the "desired"
>                         // version and will give us a "fake" difference, which
>                         // forces Cassandra to send a ReadRepair to the given source
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ....
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ....
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ....
>     public static List scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey key, List versions, List endpoints)
>     {
>         List results = new ArrayList(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we have to compare null and resolved, which are obviously
>             // not equal, so it will fire a ReadRepair request; however, it is not needed here
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ....
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2
> NodeC has X.? // Unknown version, but because the write was with Quorum it is 1 or 2
> During the Quorum read from nodes A and B, Cassandra creates version 12 and sends a ReadRepair, so the nodes now have the following content:
> NodeA has X.12
> NodeB has X.12
> which is correct. However, Cassandra will also fire a ReadRepair to NodeC. There is no need to do that: the next consistent read has a chance to be served by nodes {A, B} (no ReadRepair) or by the pair {?, C}, and in that case a ReadRepair will be fired and bring NodeC to the consistent state.
> Right now we are reading from the Index a lot, and starting from some point in time we get TimeOutException because the cluster is overloaded by ReadRepair requests *even* if all nodes have the same data :(

--
This message is automatically generated by JIRA.
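
To make the "fake" difference from the quoted report concrete, here is a minimal, self-contained sketch. It is not Cassandra code and not necessarily what the attached patches do: RepairSketch, diff, countRepairs and the skipNonResponders flag are all hypothetical names, and a row version is modelled simply as a map of column name to timestamp. It only shows that a null placeholder always differs from the resolved row, so a repair is counted even when every replica that answered already agrees.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (assumption): a row version is a map of
// column name -> timestamp. None of these names are Cassandra APIs.
final class RepairSketch {

    // Returns the columns of `resolved` that `version` lacks or holds with an
    // older timestamp, or null when there is nothing to repair. A null
    // `version` is treated as an empty row, so the whole resolved row is
    // returned as the difference.
    static Map<String, Long> diff(Map<String, Long> version, Map<String, Long> resolved) {
        Map<String, Long> missing = new HashMap<>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            Long have = (version == null) ? null : version.get(e.getKey());
            if (have == null || have < e.getValue()) {
                missing.put(e.getKey(), e.getValue());
            }
        }
        return missing.isEmpty() ? null : missing;
    }

    // Counts the repairs that would be scheduled. When skipNonResponders is
    // true, a null placeholder version is not compared at all.
    static int countRepairs(Map<String, Long> resolved,
                            List<Map<String, Long>> versions,
                            boolean skipNonResponders) {
        int repairs = 0;
        for (Map<String, Long> version : versions) {
            if (version == null && skipNonResponders) {
                continue;
            }
            if (diff(version, resolved) != null) {
                repairs++;
            }
        }
        return repairs;
    }

    public static void main(String[] args) {
        Map<String, Long> row = new HashMap<>();
        row.put("col", 2L);

        // Two sources returned identical data; the third slot is the null
        // placeholder added for a source with no version of the row.
        List<Map<String, Long>> versions = new ArrayList<>();
        versions.add(row);
        versions.add(new HashMap<>(row));
        versions.add(null);

        System.out.println(countRepairs(row, versions, false)); // placeholder compared
        System.out.println(countRepairs(row, versions, true));  // placeholder skipped
    }
}
```

In this toy model the first call counts one repair and the second counts none. Whether skipping the placeholder is safe in Cassandra itself depends on the CASSANDRA-2494 behaviour mentioned in the report, which the sketch does not model.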