cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10125) ReadFailure is thrown instead of ReadTimeout for range queries
Date Wed, 19 Aug 2015 09:57:45 GMT


Sylvain Lebresne commented on CASSANDRA-10125:

I think the simplest solution is probably to just re-introduce the use of {{Verb.RANGE_SLICE}}
for range queries so we get the proper timeout. Pushed a patch [here|]
to do so. It adds a small amount of cruft but that will mostly go away once we drop backward
compatibility with pre-3.0 (and it's not a big deal in the first place). I'll wait on CI to
finish to make sure that patch doesn't break anything before calling this ready for review.
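
To illustrate why a separate verb restores the expected behaviour, here is a minimal, self-contained sketch. It is not Cassandra's actual code: the class `VerbTimeoutSketch`, its methods, and the rule "the callback expiring before the read's own wait produces a failure" are simplifying assumptions for illustration. Only the 5s and 10s values and the two verb names come from the ticket.

```java
import java.util.EnumMap;
import java.util.Map;

public class VerbTimeoutSketch {
    // Hypothetical stand-in for the two messaging verbs named in the ticket.
    enum Verb { READ, RANGE_SLICE }

    // Per-verb callback timeouts in milliseconds (5s and 10s, as quoted in the ticket).
    static final Map<Verb, Long> TIMEOUT_MS = new EnumMap<>(Verb.class);
    static {
        TIMEOUT_MS.put(Verb.READ, 5_000L);
        TIMEOUT_MS.put(Verb.RANGE_SLICE, 10_000L);
    }

    // Picks the verb a query is sent with. With separateRangeVerb == false,
    // every read uses READ, which models the behaviour described in the ticket.
    static Verb verbFor(boolean isRangeQuery, boolean separateRangeVerb) {
        return (isRangeQuery && separateRangeVerb) ? Verb.RANGE_SLICE : Verb.READ;
    }

    // The callback expiry is derived from the verb, not from the query type.
    static long callbackExpiryMs(boolean isRangeQuery, boolean separateRangeVerb) {
        return TIMEOUT_MS.get(verbFor(isRangeQuery, separateRangeVerb));
    }

    // Simplified assumption: if the callback expires before the read itself
    // stops waiting, the caller sees a failure rather than a timeout.
    static String outcome(boolean isRangeQuery, boolean separateRangeVerb) {
        long readWaitMs = isRangeQuery ? 10_000L : 5_000L;
        long expiryMs = callbackExpiryMs(isRangeQuery, separateRangeVerb);
        return expiryMs < readWaitMs ? "ReadFailure" : "ReadTimeout";
    }

    public static void main(String[] args) {
        System.out.println(outcome(true, false)); // range query sent as READ
        System.out.println(outcome(true, true));  // range query sent as RANGE_SLICE
    }
}
```

In this model, a range query sent with the shared verb has its callback expire at 5s, before its 10s wait elapses, so it reports a failure. With its own verb, the two durations match and it reports a timeout.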

> ReadFailure is thrown instead of ReadTimeout for range queries
> --------------------------------------------------------------
>                 Key: CASSANDRA-10125
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0 beta 2
> CASSANDRA-8099 merged the way single partition and range read messages were handled
and has switched to using the same verb ({{Verb.READ}}) for both, effectively deprecating {{Verb.RANGE_SLICE}}.
Unfortunately, we are relying on having 2 different verbs for timeouts. More precisely, when
adding a callback in the expiring map of {{MessagingService}}, we use the timeout from the
{{Verb}}. As a consequence, it's currently set with the single partition read timeout (5s)
even for range queries (which have a 10s timeout).  And when a callback expires, it is notified
as a failure to the callback (which is debatable imo but a separate issue), which means range
queries will generally send a ReadFailure (after 5s) instead of a ReadTimeout (since they
do wait 10s before sending those).
> That is the reason for at least the failure of the {{nosetests replace_address_test:TestReplaceAddress.replace_first_boot_test}}
dtest (the test has 3 nodes, kills one and expects a timeout at CL.THREE but gets a failure

This message was sent by Atlassian JIRA
