cassandra-commits mailing list archives

From "Aaron Whiteside (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
Date Fri, 06 Nov 2015 03:29:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992789#comment-14992789 ]

Aaron Whiteside edited comment on CASSANDRA-9328 at 11/6/15 3:28 AM:
---------------------------------------------------------------------

Using a version id (on which to execute the conditional update) and a transaction id (identifying
the current thread/transaction/operation, to determine whether an update that raised a WTE
actually succeeded) still does not work.

Thread A: reads version 1
Thread A: updates version 1 to 2, sets transaction id to ABC, and sets account balance to $0+$100=$100;
the update is applied successfully, but thread A still receives a WTE.
Thread B: reads version 2
Thread B: updates version 2 to 3, sets transaction id to XYZ, and sets account balance to $100+$500=$600;
it wins the race, with no WTEs anywhere in sight.
Thread B: is happy!
Thread A: tries again, reads version 3 this time, and sees that version 3 is greater than its
previous version 2. It then checks the transaction id and finds that it is also different.

How can thread A know whether its update failed or succeeded, given that someone else updated
the record between thread A's update and its re-read?
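
The interleaving above can be replayed with a small in-memory sketch. It is not Cassandra or driver code: AccountRow, casUpdate, and whatThreadASees are hypothetical names, and the third writer with transaction id DEF is an assumption added only to build the "thread A's write was lost" case. Under those assumptions, the row thread A reads back looks the same whether or not its own write was applied.

```java
// Hypothetical in-memory stand-in for a single Cassandra row; no driver calls are made.
final class AccountRow {
    private long version = 1;
    private String txId = "none";
    private long balance = 0;

    // Mimics a conditional update: applies only if the stored version matches.
    synchronized boolean casUpdate(long expectedVersion, String newTxId, long delta) {
        if (version != expectedVersion) {
            return false;
        }
        version = expectedVersion + 1;
        txId = newTxId;
        balance += delta;
        return true;
    }

    synchronized String snapshot() {
        return "version=" + version + " txId=" + txId + " balance=" + balance;
    }
}

final class LwtAmbiguityDemo {
    // Returns what thread A reads back after its WTE.
    // aWriteApplied == true : A's update (tx ABC, +$100) took the row from version 1 to 2.
    // aWriteApplied == false: A's update was lost and an assumed third writer (tx DEF)
    //                         made the same +$100 change instead.
    static String whatThreadASees(boolean aWriteApplied) {
        AccountRow row = new AccountRow();
        String firstWriter = aWriteApplied ? "ABC" : "DEF";
        row.casUpdate(1, firstWriter, 100); // version 1 -> 2
        row.casUpdate(2, "XYZ", 500);       // thread B: version 2 -> 3
        return row.snapshot();
    }

    public static void main(String[] args) {
        System.out.println("A applied: " + whatThreadASees(true));
        System.out.println("A lost:    " + whatThreadASees(false));
    }
}
```

Both lines print the same row state, so the version id and the latest transaction id alone do not tell thread A which case it is in.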

At this point thread A might assume it failed, try again, and add another $100 to the balance,
causing more money to appear in the account than expected. Or it might choose to abandon the
transaction, but if the WTE was actually due to a timeout and not contention, the balance will
be $100 less than expected.

And no one is happy.
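
The arithmetic behind the two bad outcomes can be laid out as a small worked example. This is only a sketch: WteRecoveryOutcomes and finalBalance are hypothetical names, the dollar amounts come from the scenario above, and "first write applied" versus "not applied" is a simplification of whatever actually caused the WTE.

```java
// Worked arithmetic only; nothing here talks to Cassandra.
final class WteRecoveryOutcomes {
    static final long INTENDED_TOTAL = 600; // thread A's $100 plus thread B's $500

    // firstWriteApplied: whether thread A's original +$100 was applied despite the WTE.
    // retry: whether thread A re-applies +$100 after the WTE (true) or abandons (false).
    static long finalBalance(boolean firstWriteApplied, boolean retry) {
        long balance = 0;
        if (firstWriteApplied) {
            balance += 100; // thread A's first attempt
        }
        balance += 500;     // thread B's update
        if (retry) {
            balance += 100; // thread A's second attempt
        }
        return balance;
    }

    public static void main(String[] args) {
        boolean[] options = {true, false};
        for (boolean applied : options) {
            for (boolean retry : options) {
                long balance = finalBalance(applied, retry);
                System.out.println("applied=" + applied + " retry=" + retry
                        + " balance=$" + balance
                        + " error=$" + (balance - INTENDED_TOTAL));
            }
        }
    }
}
```

Retrying is only correct when the first write was lost, and abandoning is only correct when it was applied. Thread A cannot tell which case holds, so either policy is wrong in one of them.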



> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9328
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Aaron Whiteside
>             Fix For: 2.1.x
>
>         Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration
> taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If the WTE
> is due to not being able to communicate with other nodes, why does concurrency > 1 cause
> inter-node communication to fail?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
