cassandra-commits mailing list archives

From "Soumava Ghosh (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-5830) Paxos loops endlessly due to faulty condition check
Date Tue, 30 Jul 2013 21:25:50 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Soumava Ghosh updated CASSANDRA-5830:
-------------------------------------

    Description: 
The following code segment (StorageProxy.java:328) causes the issue:

start is the time at which the Paxos operation began, so it is always less than the current
value of System.nanoTime(). The difference start - System.nanoTime() is therefore always
negative and always less than the timeout, so the loop condition never becomes false.

{code:title=StorageProxy.java|borderStyle=solid}
private static UUID beginAndRepairPaxos(long start, ByteBuffer key, CFMetaData metadata, List<InetAddress> liveEndpoints, int requiredParticipants, ConsistencyLevel consistencyForPaxos)
    throws WriteTimeoutException
    {
        long timeout = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getCasContentionTimeout());

        PrepareCallback summary = null;
        while (start - System.nanoTime() < timeout)
        {
            long ballotMillis = summary == null
                              ? System.currentTimeMillis()
                              : Math.max(System.currentTimeMillis(), 1 + UUIDGen.unixTimestamp(summary.inProgressCommit.ballot));
            UUID ballot = UUIDGen.getTimeUUID(ballotMillis);
{code}

Here, Paxos gets stuck when PREPARE returns 'true' but with an inProgressCommit. The code
in StorageProxy.java:beginAndRepairPaxos() then tries to issue a PREPARE and COMMIT for the
inProgressCommit. If it repeatedly receives 'false' as the PREPARE_RESPONSE, it loops
endlessly until a PREPARE_RESPONSE is 'true', because the timeout check never ends the loop.
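
The effect of the operand order can be checked in isolation. The sketch below is illustrative only and is not taken from the Cassandra source: the class name PaxosTimeoutCheck, both method names, and the 1000 ms timeout are hypothetical. It assumes the intent of the loop condition is to bound the elapsed time, in which case the operands would be reversed.

```java
import java.util.concurrent.TimeUnit;

public class PaxosTimeoutCheck {
    // Condition as quoted in the report: start - now is negative as soon as
    // any time has elapsed, so it is always below a positive timeout.
    public static boolean reportedCheck(long startNanos, long nowNanos, long timeoutNanos) {
        return startNanos - nowNanos < timeoutNanos;
    }

    // Assumed intent (hypothetical correction): compare the elapsed time,
    // now - start, against the timeout.
    public static boolean elapsedCheck(long startNanos, long nowNanos, long timeoutNanos) {
        return nowNanos - startNanos < timeoutNanos;
    }

    public static void main(String[] args) {
        // Hypothetical contention timeout of 1000 ms, converted to nanoseconds.
        long timeout = TimeUnit.MILLISECONDS.toNanos(1000);
        long start = System.nanoTime();
        // Simulate a point in time five timeouts after the start.
        long later = start + 5 * timeout;

        // Still true long after the timeout: the loop would keep running.
        System.out.println("reported check: " + reportedCheck(start, later, timeout));
        // False once the timeout has elapsed: the loop would exit.
        System.out.println("elapsed check:  " + elapsedCheck(start, later, timeout));
    }
}
```

With the simulated clock five timeouts past the start, the quoted condition still evaluates to true, while the elapsed-time form evaluates to false and would let the loop terminate.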

  was:
Following is the code segment (StorageProxy.java:328) which causes the issue: 

Start is the start time of the paxos, is always less than the current system time, and therefore
the negative difference is always less than the timeout. 

private static UUID beginAndRepairPaxos(long start, ByteBuffer key, CFMetaData metadata, List<InetAddress>
liveEndpoints, int requiredParticipants, ConsistencyLevel consistencyForPaxos)
    throws WriteTimeoutException
    {
        long timeout = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getCasContentionTimeout());

        PrepareCallback summary = null;
        while (start - System.nanoTime() < timeout)
        {
            long ballotMillis = summary == null
                              ? System.currentTimeMillis()
                              : Math.max(System.currentTimeMillis(), 1 + UUIDGen.unixTimestamp(summary.inProgressCommit.ballot));
            UUID ballot = UUIDGen.getTimeUUID(ballotMillis);


Here, the paxos gets stuck when PREPARE returns 'true' but with inProgressCommit. The code
in StorageProxy.java:beginAndRepairPaxos() then tries to issue a PREPARE and COMMIT for the
inProgressCommit, and if it repeatedly receives 'false' as a PREPARE_RESPONSE it gets stuck
in an endless loop until PREPARE_RESPONSE is true. 

    
> Paxos loops endlessly due to faulty condition check
> ---------------------------------------------------
>
>                 Key: CASSANDRA-5830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5830
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 2.0 beta 2
>            Reporter: Soumava Ghosh
>
> The following code segment (StorageProxy.java:328) causes the issue:
> start is the time at which the Paxos operation began, so it is always less than the current
> value of System.nanoTime(). The difference start - System.nanoTime() is therefore always
> negative and always less than the timeout, so the loop condition never becomes false.
> {code:title=StorageProxy.java|borderStyle=solid}
> private static UUID beginAndRepairPaxos(long start, ByteBuffer key, CFMetaData metadata, List<InetAddress> liveEndpoints, int requiredParticipants, ConsistencyLevel consistencyForPaxos)
>     throws WriteTimeoutException
>     {
>         long timeout = TimeUnit.MILLISECONDS.toNanos(DatabaseDescriptor.getCasContentionTimeout());
>         PrepareCallback summary = null;
>         while (start - System.nanoTime() < timeout)
>         {
>             long ballotMillis = summary == null
>                               ? System.currentTimeMillis()
>                               : Math.max(System.currentTimeMillis(), 1 + UUIDGen.unixTimestamp(summary.inProgressCommit.ballot));
>             UUID ballot = UUIDGen.getTimeUUID(ballotMillis);
> {code}
> Here, Paxos gets stuck when PREPARE returns 'true' but with an inProgressCommit. The code
> in StorageProxy.java:beginAndRepairPaxos() then tries to issue a PREPARE and COMMIT for the
> inProgressCommit. If it repeatedly receives 'false' as the PREPARE_RESPONSE, it loops
> endlessly until a PREPARE_RESPONSE is 'true', because the timeout check never ends the loop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
