cassandra-commits mailing list archives

From "Wayne Schroeder (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6029) Lightweight transactions race render primary key useless
Date Fri, 13 Sep 2013 22:08:52 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wayne Schroeder updated CASSANDRA-6029:
---------------------------------------

    Description: 
When multiple clients/threads issue an UPDATE whose IF clause fails, then retry with a correct
IF clause, updates to that primary key eventually stop working.  This behaves like a race
condition and does not reproduce for me unless some IF clauses fail on an incorrect
previous value, as happens in an update race.  In my specific case, I hard-coded the first
UPDATE IF attempt in my LWT retry logic to always use an incorrect previous value so that it
would iterate and retry (see attached code).  The second update (the retry) would use the
"returned current value" and would generally win.  When this pattern ran under load (jmeter
driving a test servlet with many parallel requests), I eventually could not update the row
with the PK I was using.  The same was true from cqlsh.
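
For readers without the attachment, the retry pattern described above can be sketched roughly
as follows.  This is not the attached code and does not talk to Cassandra: IdPoolTable is a
hypothetical in-memory stand-in whose update_if method mimics the outcome of
"UPDATE id_pools SET last_value=? WHERE name=? IF last_value=?" (an applied flag plus the value
seen at check time), and next_id, max_attempts, and the -1 sentinel are illustrative assumptions.

```python
import threading


class IdPoolTable:
    """Hypothetical in-memory stand-in for the id_pools table (not a Cassandra client)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, name, last_value):
        with self._lock:
            self._rows[name] = last_value

    def update_if(self, name, new_value, expected):
        """Mimic a conditional UPDATE ... IF last_value = expected.

        Returns (applied, current), where current is the value seen when the
        condition was checked, or None if the row does not exist.
        """
        with self._lock:
            current = self._rows.get(name)
            if current is None or current != expected:
                return False, current
            self._rows[name] = new_value
            return True, current


def next_id(table, name, max_attempts=1000):
    """Allocate the next ascending id using compare-and-set with retry."""
    # Assumption: real ids are never negative, so -1 forces the first
    # attempt to fail, as in the hard-coded reproduction described above.
    expected = -1
    for _ in range(max_attempts):
        applied, current = table.update_if(name, expected + 1, expected)
        if applied:
            return expected + 1
        if current is None:
            raise KeyError(name)
        # Retry with the "returned current value" as the new expectation.
        expected = current
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

With a row inserted as ("jae", 261), a call to next_id(table, "jae") fails its first conditional
update, learns the current value 261, and succeeds on the retry with 262.  Against this stand-in
the loop always terminates; the behavior reported here is that the equivalent conditional UPDATE
against Cassandra eventually times out instead.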

The Java driver reports the following, which I assume is a red herring:
javax.ejb.EJBException: com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: /127.0.0.1 ([/127.0.0.1] Unexpected exception triggered
(org.apache.cassandra.transport.messages.ErrorMessage$WrappedException: org.apache.cassandra.transport.ProtocolException:
Unknown code 8 for a consistency level)))

Nothing is printed to the Cassandra console except this:
 INFO 16:45:50,645 GC for ParNew: 224 ms for 1 collections, 66767104 used; max is 1046937600

And cqlsh ends up behaving like this in my single-node environment (one keyspace, replication_factor 1):

cqlsh:formula11> select last_value from id_pools where name='jae';

 last_value
------------
        261

(1 rows)

cqlsh:formula11> update id_pools set last_value=262 where name='jae' if last_value=261;
Request did not complete within rpc_timeout.
cqlsh:formula11> 

It is worth noting that other PKs in this id_pools table continue to work.  Please note
that we use these "id pools" only for low-volume cases that require ascending IDs, and use
UUIDs for other unique IDs.



  was:
When multiple clients/threads issue an UPDATE whose IF clause fails, then retry with a correct
IF clause, updates to that primary key eventually stop working.  This behaves like a race
condition and does not reproduce for me unless some IF clauses fail on an incorrect
previous value, as happens in an update race.  In my specific case, I hard-coded the first
UPDATE IF attempt in my LWT retry logic to always use an incorrect previous value so that it
would iterate and retry (see attached code).  The second update (the retry) would use the
"returned current value" and would generally win.  When this pattern ran under load, I
eventually could not update the row with the PK I was using.  The same was true from cqlsh.

The Java driver reports the following, which I assume is a red herring:
javax.ejb.EJBException: com.datastax.driver.core.exceptions.NoHostAvailableException: All
host(s) tried for query failed (tried: /127.0.0.1 ([/127.0.0.1] Unexpected exception triggered
(org.apache.cassandra.transport.messages.ErrorMessage$WrappedException: org.apache.cassandra.transport.ProtocolException:
Unknown code 8 for a consistency level)))

Nothing is printed to the Cassandra console except this:
 INFO 16:45:50,645 GC for ParNew: 224 ms for 1 collections, 66767104 used; max is 1046937600

And cqlsh ends up behaving like this in my single-node environment (one keyspace, replication_factor 1):

cqlsh:formula11> select last_value from id_pools where name='jae';

 last_value
------------
        261

(1 rows)

cqlsh:formula11> update id_pools set last_value=262 where name='jae' if last_value=261;
Request did not complete within rpc_timeout.
cqlsh:formula11> 

It is worth noting that other PKs in this id_pools table continue to work.  Please note
that we use these "id pools" only for low-volume cases that require ascending IDs, and use
UUIDs for other unique IDs.



    
> Lightweight transactions race render primary key useless
> --------------------------------------------------------
>
>                 Key: CASSANDRA-6029
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6029
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: MacOS 10.8.4, Java 1.7.0_25, JBoss community 7.1.1, Datastax Java Driver 1.0.3
>            Reporter: Wayne Schroeder

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
