cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7886) TombstoneOverwhelmingException should not wait for timeout
Date Tue, 21 Oct 2014 10:55:35 GMT


Sylvain Lebresne commented on CASSANDRA-7886:

bq. I assume you worry about clients not being able to handle the new code


bq. In my opinion any client-code that does not have a default-case should be punished. So I would not hesitate to add it

Allow me to disagree. Even if drivers have a default case, they will still not know what that
new exception code means, so they will likely throw some generic "ShouldNotHappen" exception,
which the client almost surely hasn't taken into account (or at least not in the same way it has
taken a timeout exception into account, which is what is thrown currently). There's a reason
we version the protocol: it's so that clients can be assured that we don't change anything
from under them. If we fail at that, we are the ones who should be punished.
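A minimal sketch of the point above, assuming a hypothetical driver written against protocol v3. Only the codes 0x1200 (Read_timeout) and 0x1300 (Read_failure, new in v4) come from the native protocol spec; the class and exception names are made up for illustration. The default case avoids a crash, but the application's timeout handling never sees the new error.

```java
// Hypothetical driver-side decoding of native-protocol error codes.
// 0x1200 (Read_timeout) and 0x1300 (Read_failure) follow the protocol spec;
// the class and exception names are made up for illustration.
public class ErrorCodeMapping {

    static class ReadTimeoutException extends RuntimeException {
        ReadTimeoutException(String msg) { super(msg); }
    }

    // Generic "should not happen" error for codes the driver does not recognize.
    static class DriverInternalError extends RuntimeException {
        DriverInternalError(String msg) { super(msg); }
    }

    static final int READ_TIMEOUT = 0x1200;

    // A driver written against protocol v3 has no case for 0x1300.
    static RuntimeException decode(int code, String message) {
        switch (code) {
            case READ_TIMEOUT:
                return new ReadTimeoutException(message);
            default:
                // The default case prevents a crash, but the application
                // still gets an exception type it never planned for.
                return new DriverInternalError(
                        "unknown error code 0x" + Integer.toHexString(code) + ": " + message);
        }
    }

    public static void main(String[] args) {
        try {
            throw decode(0x1300, "too many tombstones");
        } catch (ReadTimeoutException e) {
            // Retry/backoff logic written for timeouts lives here...
            System.out.println("handled as timeout");
        } catch (RuntimeException e) {
            // ...but the unknown code lands here instead.
            System.out.println("unhandled: " + e.getMessage());
        }
    }
}
```

Running it prints "unhandled: unknown error code 0x1300: too many tombstones": the request failed fast, yet the client's timeout-specific logic was bypassed.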

bq. I assume with CQL 4 (CASSANDRA-8043) a clean code handling and additional fields will be implemented for read_failures?

Yes, and I'm saying that such handling should be part of the patch (but please don't call
it "CQL 4" or you'll confuse everyone: it's just version 4 of the binary protocol, not of
the language).
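One way the versioning concern could be respected is to gate the new error on the protocol version the client negotiated. The sketch below is hypothetical (the method name and the fallback policy are assumptions, not the committed patch); it uses the v4 Read_failure code 0x1300 and the existing Read_timeout code 0x1200.

```java
// Hypothetical server-side choice of error code, gated on the protocol
// version the client negotiated. Read_failure (0x1300) exists only in v4+.
public class VersionedErrors {
    static final int READ_TIMEOUT = 0x1200;
    static final int READ_FAILURE = 0x1300;

    static int errorCodeFor(boolean replicaReportedFailure, int protocolVersion) {
        if (replicaReportedFailure && protocolVersion >= 4) {
            return READ_FAILURE;
        }
        // Pre-v4 clients keep receiving the timeout code they already handle.
        return READ_TIMEOUT;
    }

    public static void main(String[] args) {
        System.out.printf("v3: 0x%x%n", errorCodeFor(true, 3)); // v3: 0x1200
        System.out.printf("v4: 0x%x%n", errorCodeFor(true, 4)); // v4: 0x1300
    }
}
```

With this split, only clients that opted into v4 ever see the new code, which is exactly the guarantee protocol versioning is meant to provide.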

> TombstoneOverwhelmingException should not wait for timeout
> ----------------------------------------------------------
>                 Key: CASSANDRA-7886
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>             Fix For: 3.0
>         Attachments: 7886_v1.txt
> *Issue*
> When TombstoneOverwhelmingExceptions occur in queries, the query is simply dropped on every
> data node, but no response is sent back to the coordinator. Instead, the coordinator waits for
> the full read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application waits for the
> timeout interval on every request. Therefore, if our application runs into
> TombstoneOverwhelmingExceptions, then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into
> a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout interval.
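A minimal sketch of the proposed solution, assuming a hypothetical coordinator callback (this is not Cassandra's actual ReadCallback). It waits for `blockFor` successful replica responses but returns as soon as enough replicas have reported an error that `blockFor` can no longer be reached. The 3-replica/QUORUM setup and the 5000 ms timeout are assumed values.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical coordinator-side read callback. It waits for `blockFor`
// successful replica responses, but wakes up early as soon as enough
// replicas have reported an error that `blockFor` can no longer be reached.
public class FailFastReadCallback {
    enum Outcome { SUCCESS, FAILURE, TIMEOUT }

    private final int blockFor;
    private final int totalReplicas;
    private final AtomicInteger successes = new AtomicInteger();
    private final AtomicInteger failures = new AtomicInteger();
    private final CountDownLatch done = new CountDownLatch(1);

    FailFastReadCallback(int blockFor, int totalReplicas) {
        this.blockFor = blockFor;
        this.totalReplicas = totalReplicas;
    }

    // A replica returned data.
    void onResponse() {
        if (successes.incrementAndGet() >= blockFor) {
            done.countDown();
        }
    }

    // A replica sent an error message (e.g. after hitting a
    // TombstoneOverwhelmingException) instead of silently dropping the read.
    void onFailure() {
        // Once more than (totalReplicas - blockFor) replicas have failed,
        // the remaining ones cannot supply blockFor successes.
        if (failures.incrementAndGet() > totalReplicas - blockFor) {
            done.countDown();
        }
    }

    // Returns as soon as the outcome is decided, or TIMEOUT after timeoutMillis.
    Outcome await(long timeoutMillis) throws InterruptedException {
        if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return Outcome.TIMEOUT;
        }
        return successes.get() >= blockFor ? Outcome.SUCCESS : Outcome.FAILURE;
    }

    public static void main(String[] args) throws InterruptedException {
        // QUORUM over 3 replicas: blockFor = 2.
        FailFastReadCallback cb = new FailFastReadCallback(2, 3);
        cb.onFailure();
        cb.onFailure();
        long start = System.nanoTime();
        Outcome outcome = cb.await(5000); // assumed 5000 ms read timeout
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(outcome + " after " + waitedMs + " ms");
    }
}
```

With two of three replicas reporting failure, `await` returns FAILURE almost immediately instead of blocking for the full timeout, which is what frees the application from holding every request for the whole interval.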

This message was sent by Atlassian JIRA
