Return-Path:
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: (qmail 77468 invoked from network); 28 Feb 2011 23:52:01 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
    by minotaur.apache.org with SMTP; 28 Feb 2011 23:52:01 -0000
Received: (qmail 46083 invoked by uid 500); 28 Feb 2011 23:52:01 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 46053 invoked by uid 500); 28 Feb 2011 23:52:00 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 46045 invoked by uid 99); 28 Feb 2011 23:52:00 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 28 Feb 2011 23:52:00 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0
    tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 28 Feb 2011 23:51:58 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id EDD2B42BFE
    for ; Mon, 28 Feb 2011 23:51:36 +0000 (UTC)
Date: Mon, 28 Feb 2011 23:51:36 +0000 (UTC)
From: "Jonathan Ellis (JIRA)"
To: commits@cassandra.apache.org
Message-ID: <621603318.3456.1298937096970.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <853662540.109.1298822679766.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Updated: (CASSANDRA-2253) Gossiper Starvation
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

[
https://issues.apache.org/jira/browse/CASSANDRA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2253:
--------------------------------------

    Affects Version/s:     (was: 0.7.2)
                           (was: 0.7.1)
        Fix Version/s:     (was: 0.7.0)
                       0.7.4

> Gossiper Starvation
> -------------------
>
>                 Key: CASSANDRA-2253
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2253
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.0
>         Environment: linux, windows
>            Reporter: Mikael Sitruk
>             Fix For: 0.7.4
>
>         Attachments: CASSANDRA-0.7-2253.txt
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The Gossiper periodic task is starved when large sstable files need to be deleted.
> SSTableDeletingReference uses the same scheduledTasks pool (from StorageService) as the Gossiper and other periodic tasks, but the Gossiper task should run every second to ensure correct cluster status (liveness of nodes). When large sstable files (several GB) are deleted, the delete operation can take more than 30 seconds, which puts the whole cluster into a wrong state where nodes are marked as dead although they are alive.
> This leads to unnecessary additional load (such as hinted handoff), wrong cluster state, and increased latency.
> One possible solution is to use separate pools for periodic and non-periodic tasks.
> I've implemented this change and it resolves the problem.
> I can provide a patch.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
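
The starvation described in the report, and the proposed separate-pool remedy, can be illustrated with a small standalone sketch. This is not the attached patch and does not use Cassandra's classes: the class name GossipPoolSketch, the method heartbeatsWhileDeleting, the pool size of one thread, and the 10 ms heartbeat delay are all assumptions chosen for illustration. A latch stands in for a slow sstable delete, and a counter stands in for the Gossiper's periodic task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Cassandra.
final class GossipPoolSketch {

    // Returns how many heartbeats ran while a blocking "delete" was in progress.
    static int heartbeatsWhileDeleting(boolean separatePools) {
        // Assumption: a single-threaded scheduled pool, so one blocking
        // task is enough to occupy the whole pool.
        ScheduledExecutorService periodic = Executors.newScheduledThreadPool(1);
        ScheduledExecutorService oneShot =
                separatePools ? Executors.newScheduledThreadPool(1) : periodic;

        CountDownLatch deleteStarted = new CountDownLatch(1);
        CountDownLatch deleteMayFinish = new CountDownLatch(1);
        CountDownLatch threeBeats = new CountDownLatch(3);
        AtomicInteger beats = new AtomicInteger();

        try {
            // Stand-in for a slow sstable delete: blocks until released.
            oneShot.execute(() -> {
                deleteStarted.countDown();
                try {
                    deleteMayFinish.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            deleteStarted.await();

            // Stand-in for the periodic gossip task.
            periodic.scheduleWithFixedDelay(() -> {
                beats.incrementAndGet();
                threeBeats.countDown();
            }, 0, 10, TimeUnit.MILLISECONDS);

            // Wait for three heartbeats, or give up after 500 ms.
            threeBeats.await(500, TimeUnit.MILLISECONDS);
            // Read the counter while the "delete" is still blocked.
            return beats.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        } finally {
            deleteMayFinish.countDown();
            periodic.shutdownNow();
            oneShot.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool:    " + heartbeatsWhileDeleting(false));
        System.out.println("separate pools: " + heartbeatsWhileDeleting(true));
    }
}
```

With the shared pool the heartbeat cannot run until the blocking task is released, so the first count is zero; with separate pools the heartbeat keeps firing while the delete is still blocked. A real fix would also have to decide which existing tasks move to which pool, which this sketch does not address.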