cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-5925) Race condition in update lightweight transaction
Date Mon, 26 Aug 2013 09:36:53 GMT


Sylvain Lebresne updated CASSANDRA-5925:

    Attachment: 0001-Write-ballot-with-proposal.txt

Thanks for the full stack.

I'm not completely sure what triggers that NPE, however. I do see one scenario where the paxos
state on disk could be non-empty while there is no "in_progress_ballot" column (hence triggering
that NPE): since savePaxosPromise only writes the "proposal" and not the "in_progress_ballot",
and since we set a TTL on inserts, it seems possible to end up with a paxos row that
only has the "proposal" column while everything else has expired. I am attaching a simple
patch that always writes "in_progress_ballot" to avoid that, but given that the paxos TTL is at least
3 hours, I doubt that's the scenario you are running into in your test. So I'm not sure what's
going on ([~jbellis], if you have a brilliant idea...).
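
To make the expiry scenario above concrete, here is a minimal toy model of a row whose columns carry independent TTLs. It is only a sketch: the class PaxosRowTtlSketch, its methods, and the timestamps are hypothetical and are not Cassandra's storage code. Only the column names "in_progress_ballot" and "proposal" and the 3 hour TTL come from the discussion.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model of a row whose columns expire independently; NOT Cassandra code. */
class PaxosRowTtlSketch {
    /** A column value plus the (hypothetical) clock time at which it expires. */
    static final class Cell {
        final Object value;
        final long expiresAt;
        Cell(Object value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    /** Writes a single column with its own TTL, as a TTL'd insert would. */
    void write(String column, Object value, long now, long ttl) {
        cells.put(column, new Cell(value, now + ttl));
    }

    /** Returns the live value of a column, or null if it was never written or has expired. */
    Object read(String column, long now) {
        Cell c = cells.get(column);
        return (c == null || now >= c.expiresAt) ? null : c.value;
    }

    /** True if at least one column is still live, i.e. the row is not empty. */
    boolean hasLiveCells(long now) {
        for (Cell c : cells.values()) {
            if (now < c.expiresAt) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long ttl = 3 * 60 * 60; // seconds; assumed equal to the 3 hour lower bound mentioned above
        PaxosRowTtlSketch row = new PaxosRowTtlSketch();
        row.write("in_progress_ballot", UUID.randomUUID(), 0, ttl); // ballot written at t=0
        row.write("proposal", "some update", 100, ttl);             // later write touches only "proposal"

        long now = ttl + 50; // the ballot has expired, the proposal has not
        System.out.println("row non-empty: " + row.hasLiveCells(now));
        System.out.println("in_progress_ballot: " + row.read("in_progress_ballot", now));
    }
}
```

In this model the row is still non-empty at the chosen time while "in_progress_ballot" reads back as null, which is the shape of state that would trip a reader expecting the ballot to be present. Writing both columns in the same call, as the attached patch is described as doing, would give them the same expiry in this model.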

Phil, if you could check whether the attached patch fixes it by any chance, that would be helpful.
If it doesn't (likely), would you be able to provide a simple test script that reproduces this?

> Race condition in update lightweight transaction
> ------------------------------------------------
>                 Key: CASSANDRA-5925
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: 3 node Cassandra 2.0.0-rc2 cluster. Java driver 1.0.2.
>            Reporter: Phil Persad
>         Attachments: 0001-Write-ballot-with-proposal.txt
> I'm building some tests for a Cassandra PoC.  One scenario I need to test is consumption
> of one-time tokens.  These tokens must be consumed exactly once.  The cluster involved is a
> 3 node cluster.  All queries are run with ConsistencyLevel.QUORUM. I'm using the following
> CREATE KEYSPACE IF NOT EXISTS test WITH replication = { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };
> CREATE TABLE IF NOT EXISTS tkns (tkn blob, consumed boolean, PRIMARY KEY (tkn));
> INSERT INTO tkns (tkn, consumed) VALUES (?,FALSE) USING TTL 30;
> UPDATE tkns USING TTL 1 SET consumed = TRUE WHERE tkn = ? IF consumed = FALSE;
> I use the '[applied]' column in the result set of the update statement to determine whether
> the token has been successfully consumed or if the token is being replayed.
> My test involves concurrently executing many sets of 1 insert and 2 update statements
> (using Session#execute on BoundStatements) then checking to make sure that only one of the
> updates was applied.
> When I run this test with relatively few iterations (~100) my results are what I expect
> (exactly 1 update succeeds).  At ~1000 iterations, I start seeing both updates reporting success
> in 1-2% of cases.  While my test is running, I see corresponding error entries in the Cassandra
> ERROR 15:34:53,583 Exception in thread Thread[MutationStage:522,5,main]
> java.lang.NullPointerException
> ERROR 15:34:53,584 Exception in thread Thread[MutationStage:474,5,main]
> java.lang.NullPointerException
> ERROR 15:34:53,584 Exception in thread Thread[MutationStage:536,5,main]
> java.lang.NullPointerException
> ERROR 15:34:53,729 Exception in thread Thread[MutationStage:480,5,main]
> java.lang.NullPointerException
> ERROR 15:34:53,729 Exception in thread Thread[MutationStage:534,5,main]
> java.lang.NullPointerException
> Thanks.
> Update:
> I'm not sure what's going on with the logging in the dev release.  I grabbed the rc2
> source and built that.  The resultant log is a bit more informative:
> ERROR 11:53:38,967 Exception in thread Thread[MutationStage:114,5,main]
> java.lang.NullPointerException
> 	at org.apache.cassandra.serializers.UUIDSerializer.deserialize(
> 	at org.apache.cassandra.serializers.UUIDSerializer.deserialize(
> 	at org.apache.cassandra.db.marshal.AbstractType.compose(
> 	at org.apache.cassandra.cql3.UntypedResultSet$Row.getUUID(
> 	at org.apache.cassandra.db.SystemKeyspace.loadPaxosState(
> 	at org.apache.cassandra.service.paxos.PaxosState.commit(
> 	at org.apache.cassandra.service.paxos.CommitVerbHandler.doVerb(
> 	at
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(
> 	at java.util.concurrent.ThreadPoolExecutor$
> 	at
