cassandra-commits mailing list archives

From "Christian Spriegel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7886) TombstoneOverwhelmingException should not wait for timeout
Date Mon, 22 Sep 2014 14:27:35 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143247#comment-14143247 ]

Christian Spriegel commented on CASSANDRA-7886:
-----------------------------------------------

[~kohlisankalp]: Thanks for your feedback.

[~slebresne], [~kohlisankalp]: I attached a patch for Cassandra 2.1 in which I implemented remote
failure handling for reads and range reads.

Using a three-node ccm cluster, I tested remote and local read failures. Both CLI and CQLSH now
return instantly instead of waiting for the timeout.

Any feedback? Could this be merged into 2.1? Please let me know if the patch needs improvement.

I guess the next steps would be to implement callbacks for writes, truncates, etc.

> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-7886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7886
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: 7886_v1.txt
>
>
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every
> data node, but no response is sent back to the coordinator. Instead, the coordinator waits
> for the specified read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application waits for the
> timeout interval on every request. Therefore, if our application runs into
> TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into
> a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.
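
To illustrate the idea in the proposed solution, here is a minimal, self-contained sketch of a
coordinator-side callback that completes as soon as a replica reports a failure, rather than
blocking until the read timeout expires. It is not the code from the attached patch and does not
use Cassandra's internal classes: FailFastReadSketch, ReadCallback, ReplicaReadFailure,
onResponse, onFailure and await are all hypothetical names chosen for this example, and the
5000 ms timeout is an arbitrary stand-in for read_request_timeout_in_ms.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastReadSketch {

    /** Hypothetical stand-in for a replica-side read error; not a Cassandra class. */
    static class ReplicaReadFailure extends RuntimeException {
        ReplicaReadFailure(String message) { super(message); }
    }

    /** Hypothetical coordinator-side callback for a single read request. */
    static class ReadCallback {
        private final CompletableFuture<String> result = new CompletableFuture<>();

        /** Called when a replica answers with data. */
        void onResponse(String row) { result.complete(row); }

        /** Called when a replica explicitly reports that the read failed. */
        void onFailure(String reason) {
            result.completeExceptionally(new ReplicaReadFailure(reason));
        }

        /** Blocks until data, a reported failure, or the timeout, whichever comes first. */
        String await(long timeoutMs) throws TimeoutException, InterruptedException {
            try {
                return result.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                if (cause instanceof RuntimeException) {
                    throw (RuntimeException) cause;
                }
                throw new RuntimeException(cause);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ReadCallback callback = new ReadCallback();

        // Simulated replica: reports the failure shortly after receiving the read.
        Thread replica = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            }
            callback.onFailure("tombstone threshold exceeded");
        });

        long start = System.nanoTime();
        replica.start();
        try {
            callback.await(5000); // assumed timeout, standing in for read_request_timeout_in_ms
        } catch (ReplicaReadFailure e) {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("Read failed after " + elapsedMs + " ms: " + e.getMessage());
        }
        replica.join();
    }
}
```

Without the onFailure path, the same await call would only return once the full timeout had
elapsed, which is the behaviour described in the issue.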



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
