cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-7886) Coordinator should not wait for read timeouts when replicas hit Exceptions
Date Tue, 16 Dec 2014 22:14:13 GMT


Tyler Hobbs commented on CASSANDRA-7886:

bq. Hi Tyler Hobbs, sorry I kept you waiting for so long.

No worries, I know you're busy :)

bq. The commented code was meant as a preparation for WriteFailureExceptions. Does it perhaps
make sense to fully add WriteFailureException? As a follow-up ticket, we could implement it
then for the different writes. Or do you want me to get rid of it?

I do think it's a good idea to implement something similar for writes, and splitting that
into a second ticket would be good.  So go ahead and delete the comments for this patch.

bq. Just to make sure that we don't touch anything new here: TOEs are logged inside SliceQueryFilter.collectReducedColumns
already. I simply took this catch block from the ReadVerbHandler/RangeSliceVerbHandler and
put it into StorageProxy/MessageDeliveryTask.
I don't like that either, but I did not want to touch it. Do you still want me to change it?

Yes, go ahead and remove those other try/catch blocks as well.  I can't see a reason why they
should be suppressed once the logging statement is removed.

bq. I merged ReadTimeoutException|ReadFailureException into a single catch block.

Cool.  The way you did it there looks perfect.  Further up in StorageProxy there's an almost
identical chunk of code.  Can you condense that one as well?
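
For readers following along, here is a minimal sketch of what condensing two handlers into one Java multi-catch block looks like. The exception classes and the {{read}} / {{readAndReport}} helpers are hypothetical stand-ins for illustration only, not the actual StorageProxy code or Cassandra's exception classes.

```java
public class MultiCatchSketch {
    // Hypothetical stand-ins, NOT Cassandra's real exception classes.
    static class ReadTimeoutException extends Exception {
        ReadTimeoutException(String message) { super(message); }
    }

    static class ReadFailureException extends Exception {
        ReadFailureException(String message) { super(message); }
    }

    // Hypothetical read: mode 1 simulates a timeout, mode 2 a replica failure.
    static String read(int mode) throws ReadTimeoutException, ReadFailureException {
        if (mode == 1) throw new ReadTimeoutException("timed out");
        if (mode == 2) throw new ReadFailureException("replica failed");
        return "row";
    }

    // Both exception types share one handler instead of two identical catch blocks.
    static String readAndReport(int mode) {
        try {
            return read(mode);
        } catch (ReadTimeoutException | ReadFailureException e) {
            return "unavailable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (int mode = 0; mode <= 2; mode++) {
            System.out.println(readAndReport(mode));
        }
    }
}
```

The multi-catch form only applies when the handler bodies are truly identical, which is the case described above.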

bq. I also added the last cell-name to the TOE, so that an administrator can get an estimate
where to look for the tombstones. This doesn't really match the ticket's new name, but is related
to my original issue.

The many implementations of CellName don't implement {{toString()}}, so I think you want {{container.getComparator().getString(}}

> Coordinator should not wait for read timeouts when replicas hit Exceptions
> --------------------------------------------------------------------------
>                 Key: CASSANDRA-7886
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Tested with Cassandra 2.0.8
>            Reporter: Christian Spriegel
>            Assignee: Christian Spriegel
>            Priority: Minor
>              Labels: protocolv4
>             Fix For: 3.0
>         Attachments: 7886_v1.txt, 7886_v2_trunk.txt, 7886_v3_trunk.txt, 7886_v4_trunk.txt
> *Issue*
> When you have TombstoneOverwhelmingExceptions occurring in queries, this will cause the
query to be simply dropped on every data-node, but no response is sent back to the coordinator.
Instead the coordinator waits for the specified read_request_timeout_in_ms.
> On the application side this can cause memory issues, since the application is waiting
for the timeout interval for every request. Therefore, if our application runs into TombstoneOverwhelmingExceptions,
then (sooner or later) our entire application cluster goes down :-(
> *Proposed solution*
> I think the data nodes should send an error message to the coordinator when they run into
a TombstoneOverwhelmingException. Then the coordinator does not have to wait for the timeout-interval.
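
The proposed behaviour can be sketched roughly as follows. This is only an illustration of the fail-fast idea using a plain {{CompletableFuture}}; the class and method names ({{FailFastSketch}}, {{awaitReplica}}) are hypothetical, and none of this is Cassandra's actual messaging code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class FailFastSketch {
    // Hypothetical coordinator-side wait: returns as soon as the replica
    // answers, whether with data or with an error, and only falls back to
    // the timeout when the replica stays silent.
    static String awaitReplica(CompletableFuture<String> response, long timeoutMs) {
        try {
            return response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // The replica reported an error: fail immediately.
            return "failure: " + e.getCause().getMessage();
        } catch (TimeoutException e) {
            // The replica never answered: the full timeout has elapsed.
            return "timeout";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        // Replica that reports its error back to the coordinator.
        CompletableFuture<String> failing = new CompletableFuture<>();
        failing.completeExceptionally(new RuntimeException("too many tombstones"));
        System.out.println(awaitReplica(failing, 10_000));

        // Replica that silently drops the query (the behaviour described in the issue).
        CompletableFuture<String> silent = new CompletableFuture<>();
        System.out.println(awaitReplica(silent, 50));
    }
}
```

In the first case the coordinator returns without waiting out the 10-second limit; in the second it has to sit through the whole timeout, which is the situation the issue describes.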
