Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cassandra.apache.org
Date: Tue, 21 Oct 2014 11:30:34 +0000 (UTC)
From: "Christian Spriegel (JIRA)" <jira@apache.org>
To: commits@cassandra.apache.org
Message-ID: <JIRA.12739443.1409910140000.304871.1413891034471@Atlassian.JIRA>
In-Reply-To: <JIRA.12739443.1409910140000@Atlassian.JIRA>
References: <JIRA.12739443.1409910140000@Atlassian.JIRA>
 <JIRA.12739443.1409910140677@arcas>
Subject: [jira] [Comment Edited] (CASSANDRA-7886)
 TombstoneOverwhelmingException should not wait for timeout
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178282#comment-14178282 ] 

Christian Spriegel edited comment on CASSANDRA-7886 at 10/21/14 11:30 AM:
--------------------------------------------------------------------------

[~slebresne]:  Would it make sense for me to prepare a patch on trunk that includes the error handling? I would also do some (manual) testing on trunk.

edit: My initial patch implemented fast-fail only for reads. Should I perhaps try to also implement it for other operations?


was (Author: christianmovi):
[~slebresne]:  Does it make sense that I prepare a patch on trunk that includes the error-handling? Also I would do some (manual) testing on trunk.


> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every data node, and no response is sent back to the coordinator. Instead, the coordinator waits for the specified read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application waits for the full timeout interval on every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.
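
To illustrate the difference between the current behaviour and the proposed one, here is a minimal, self-contained sketch. It is not Cassandra code and not the attached patch: FailFastSketch, Outcome, TombstoneOverwhelmed, silentReplica, reportingReplica and coordinate are all hypothetical names, and a CompletableFuture stands in for the replica's response to the coordinator. A replica that drops the query silently forces the coordinator to wait out the timeout, while a replica that reports its failure lets the coordinator return right away.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastSketch {
    /** Hypothetical outcomes a coordinator could report to the client. */
    enum Outcome { SUCCESS, REPLICA_FAILURE, TIMEOUT }

    /** Stand-in for the error hit on the data node; not Cassandra's real class. */
    static class TombstoneOverwhelmed extends RuntimeException {}

    /** Current behaviour: the replica drops the query and never answers. */
    static CompletableFuture<String> silentReplica() {
        return new CompletableFuture<>(); // never completed
    }

    /** Proposed behaviour: the replica sends an error back to the coordinator. */
    static CompletableFuture<String> reportingReplica() {
        CompletableFuture<String> response = new CompletableFuture<>();
        response.completeExceptionally(new TombstoneOverwhelmed());
        return response;
    }

    /** Coordinator side: wait for the response, but at most timeoutMs. */
    static Outcome coordinate(CompletableFuture<String> response, long timeoutMs) {
        try {
            response.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.SUCCESS;
        } catch (ExecutionException e) {
            // The replica reported a failure, so there is no need to wait further.
            return Outcome.REPLICA_FAILURE;
        } catch (TimeoutException e) {
            // No response at all: the full timeout interval has elapsed.
            return Outcome.TIMEOUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        long timeoutMs = 200; // plays the role of read_request_timeout_in_ms

        long start = System.nanoTime();
        Outcome silent = coordinate(silentReplica(), timeoutMs);
        long silentMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        start = System.nanoTime();
        Outcome reported = coordinate(reportingReplica(), timeoutMs);
        long reportedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);

        System.out.println("silent replica:    " + silent + " after ~" + silentMs + " ms");
        System.out.println("reporting replica: " + reported + " after ~" + reportedMs + " ms");
    }
}
```

In the silent case every in-flight request holds its resources for the whole timeout interval, which is the memory pressure described above; in the reporting case the request can be released as soon as the error arrives.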

