incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Soumava Ghosh <soum...@cs.utexas.edu>
Subject Re: Cassandra 2.0: Paxos Prepare response always false
Date Tue, 30 Jul 2013 20:54:59 GMT
JIRA filed at https://issues.apache.org/jira/browse/CASSANDRA-5830

Thanks,
Soumava


On Mon, Jul 29, 2013 at 5:56 PM, Soumava Ghosh <soumava@cs.utexas.edu>wrote:

> Hi aaron,
>
> I think I know the reason for this bug. Following is the code segment
> (StorageProxy.java:328) which causes the issue:
>
> Start is the start time of the paxos, is always less than the current
> system time, and therefore the negative difference is always less than the
> timeout.
>
> [image: Inline image 1]
>
>
> Here, the paxos gets stuck when PREPARE returns 'true' but with
> inProgressCommit. The code in StorageProxy.java:beginAndRepairPaxos() then
> tries to issue a PREPARE and COMMIT for the inProgressCommit, and if it
> repeatedly receives 'false' as a PREPARE_RESPONSE it gets stuck in an
> endless loop until PREPARE_RESPONSE is true.
>
> Thanks,
> Soumava
>
>
> On Mon, Jul 29, 2013 at 1:00 AM, aaron morton <aaron@thelastpickle.com>wrote:
>
>> Thanks for looking into this.
>>
>> If you have a way to reproduce this the best thing to do is create a
>> ticket at https://issues.apache.org/jira/browse/CASSANDRA as 2.0 is
>> still under development.
>>
>> Cheers
>>
>>
>>    -----------------
>> Aaron Morton
>> Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/07/2013, at 11:29 AM, Soumava Ghosh <soumava@cs.utexas.edu> wrote:
>>
>> Hi,
>>
>> I have test setup where clients randomly make a controlled number of
>> cas() requests (among other requests) at a cluster of cassandra 2.0
>> servers. After one point, I'm seeing that all requests are pending and my
>> client's throughput has reduced to 0.0 for all kinds of requests. For this
>> specific case I had 10 clients each making around 30 cas() requests per
>> second at a cluster of 72 instances of cassandra.
>>
>> Clients are set up to register a request as a success after the cas()
>> call returns with CASResult.success = true, else an exception is thrown.
>> Since I see that no client requests were actually registered and no
>> exceptions were thrown, which indicates that the cas() call itself is hung.
>>
>> On the server side, I see Paxos logs as follows - they go on for 50 log
>> files for each of the servers involved, and they span at least an hour. I
>> have marked a particular instance where the prepare response is true but
>> the propose response is false from all the involved servers:
>>
>> *At the Paxos Initiator: * None of the files among the 50 system logs
>> have the phrase 'Propose response true', these logs just go on and on.
>> *
>> *
>> DEBUG [RequestResponseStage:110] 2013-07-25 15:09:05,332
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
>> 17.163.7.195
>> *
>> *
>> DEBUG [RequestResponseStage:92] 2013-07-25 15:09:05,346
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
>> 17.163.7.184
>>
>> DEBUG [RequestResponseStage:98] 2013-07-25 15:09:05,347
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
>> 17.163.7.20
>>
>> DEBUG [RequestResponseStage:93] 2013-07-25 15:09:05,350
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.20
>> DEBUG [RequestResponseStage:100] 2013-07-25 15:09:05,350
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.184
>> DEBUG [RequestResponseStage:111] 2013-07-25 15:09:05,350
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.195
>>
>> DEBUG [RequestResponseStage:102] 2013-07-25 15:09:05,351
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
>> 17.163.7.195
>>
>> DEBUG [RequestResponseStage:107] 2013-07-25 15:09:05,352
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
>> 17.163.7.20
>>
>> DEBUG [RequestResponseStage:108] 2013-07-25 15:09:05,352
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
>> Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
>> 17.163.7.184
>>
>> DEBUG [RequestResponseStage:104] 2013-07-25 15:09:05,352
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.20
>> DEBUG [RequestResponseStage:99] 2013-07-25 15:09:05,353
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.195
>> DEBUG [RequestResponseStage:105] 2013-07-25 15:09:05,353
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.184
>>
>> *At 17.163.7.20:*
>> *
>> *
>> DEBUG [MutationStage:58] 2013-07-25 15:09:05,347 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20b05e0-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> DEBUG [MutationStage:40] 2013-07-25 15:09:05,349 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> DEBUG [MutationStage:42] 2013-07-25 15:09:05,351 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> *DEBUG [MutationStage:43] 2013-07-25 15:09:05,352 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*
>>
>> At 17.163.7.195:
>>
>> DEBUG [MutationStage:33] 2013-07-25 15:09:05,352 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> DEBUG [RequestResponseStage:38] 2013-07-25 15:09:05,352
>> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
>> Commit(658db3eababc5629b8bcb77dc84db81a,
>> 0193c330-f561-11e2-ad78-7bbb6a42087c, ColumnFamily(P
>> [62caa268bd7e90050000000000000001:false:21@1374780776163000,])),
>> Commit(658db3eababc5629b8bcb77dc84db81a,
>> d20c3e60-f576-11e2-b3a3-edb9f71dce8f, ColumnFamily(P
>> [62caa268bd7e90050000000000000001:false:21@1374780776163000,]))) from /
>> 17.163.7.162
>>
>> DEBUG [RequestResponseStage:48] 2013-07-25 15:09:05,353
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.194
>> DEBUG [RequestResponseStage:54] 2013-07-25 15:09:05,353
>> ProposeCallback.java (line 44) Propose response false from /17.163.7.162
>>
>> *DEBUG [MutationStage:35] 2013-07-25 15:09:05,353 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*
>>
>> *At 17.163.7.184:*
>>
>> DEBUG [MutationStage:53] 2013-07-25 15:09:05,347 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20b05e0-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> DEBUG [MutationStage:38] 2013-07-25 15:09:05,348 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> DEBUG [MutationStage:64] 2013-07-25 15:09:05,351 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>>
>> *DEBUG [MutationStage:33] 2013-07-25 15:09:05,351 PaxosState.java (line
>> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
>> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
>> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
>> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*
>>
>>
>> Thanks,
>> Soumava
>>
>>
>>
>

Mime
View raw message