incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Soumava Ghosh <soum...@cs.utexas.edu>
Subject Re: Cassandra 2.0: Paxos Prepare response always false
Date Tue, 30 Jul 2013 00:56:35 GMT
Hi aaron,

I think I know the reason for this bug. Following is the code segment
(StorageProxy.java:328) which causes the issue:

Start is the start time of the paxos, is always less than the current
system time, and therefore the negative difference is always less than the
timeout.

[image: Inline image 1]


Here, the paxos gets stuck when PREPARE returns 'true' but with
inProgressCommit. The code in StorageProxy.java:beginAndRepairPaxos() then
tries to issue a PREPARE and COMMIT for the inProgressCommit, and if it
repeatedly receives 'false' as a PREPARE_RESPONSE it gets stuck in an
endless loop until PREPARE_RESPONSE is true.

Thanks,
Soumava


On Mon, Jul 29, 2013 at 1:00 AM, aaron morton <aaron@thelastpickle.com>wrote:

> Thanks for looking into this.
>
> If you have a way to reproduce this the best thing to do is create a
> ticket at https://issues.apache.org/jira/browse/CASSANDRA as 2.0 is still
> under development.
>
> Cheers
>
>
>    -----------------
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 26/07/2013, at 11:29 AM, Soumava Ghosh <soumava@cs.utexas.edu> wrote:
>
> Hi,
>
> I have test setup where clients randomly make a controlled number of cas()
> requests (among other requests) at a cluster of cassandra 2.0 servers.
> After one point, I'm seeing that all requests are pending and my client's
> throughput has reduced to 0.0 for all kinds of requests. For this specific
> case I had 10 clients each making around 30 cas() requests per second at a
> cluster of 72 instances of cassandra.
>
> Clients are set up to register a request as a success after the cas() call
> returns with CASResult.success = true, else an exception is thrown. Since I
> see that no client requests were actually registered and no exceptions were
> thrown, which indicates that the cas() call itself is hung.
>
> On the server side, I see Paxos logs as follows - they go on for 50 log
> files for each of the servers involved, and they span at least an hour. I
> have marked a particular instance where the prepare response is true but
> the propose response is false from all the involved servers:
>
> *At the Paxos Initiator: * None of the files among the 50 system logs
> have the phrase 'Propose response true', these logs just go on and on.
> *
> *
> DEBUG [RequestResponseStage:110] 2013-07-25 15:09:05,332
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
> 17.163.7.195
> *
> *
> DEBUG [RequestResponseStage:92] 2013-07-25 15:09:05,346
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
> 17.163.7.184
>
> DEBUG [RequestResponseStage:98] 2013-07-25 15:09:05,347
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> d2093120-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
> 17.163.7.20
>
> DEBUG [RequestResponseStage:93] 2013-07-25 15:09:05,350
> ProposeCallback.java (line 44) Propose response false from /17.163.7.20
> DEBUG [RequestResponseStage:100] 2013-07-25 15:09:05,350
> ProposeCallback.java (line 44) Propose response false from /17.163.7.184
> DEBUG [RequestResponseStage:111] 2013-07-25 15:09:05,350
> ProposeCallback.java (line 44) Propose response false from /17.163.7.195
>
> DEBUG [RequestResponseStage:102] 2013-07-25 15:09:05,351
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
> 17.163.7.195
>
> DEBUG [RequestResponseStage:107] 2013-07-25 15:09:05,352
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
> 17.163.7.20
>
> DEBUG [RequestResponseStage:108] 2013-07-25 15:09:05,352
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> 1a0c4220-f561-11e2-a409-019f62d610d7, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])),
> Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))) from /
> 17.163.7.184
>
> DEBUG [RequestResponseStage:104] 2013-07-25 15:09:05,352
> ProposeCallback.java (line 44) Propose response false from /17.163.7.20
> DEBUG [RequestResponseStage:99] 2013-07-25 15:09:05,353
> ProposeCallback.java (line 44) Propose response false from /17.163.7.195
> DEBUG [RequestResponseStage:105] 2013-07-25 15:09:05,353
> ProposeCallback.java (line 44) Propose response false from /17.163.7.184
>
> *At 17.163.7.20:*
> *
> *
> DEBUG [MutationStage:58] 2013-07-25 15:09:05,347 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20b05e0-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> DEBUG [MutationStage:40] 2013-07-25 15:09:05,349 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> DEBUG [MutationStage:42] 2013-07-25 15:09:05,351 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> *DEBUG [MutationStage:43] 2013-07-25 15:09:05,352 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*
>
> At 17.163.7.195:
>
> DEBUG [MutationStage:33] 2013-07-25 15:09:05,352 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> DEBUG [RequestResponseStage:38] 2013-07-25 15:09:05,352
> PrepareCallback.java (line 58) Prepare response PrepareResponse(true,
> Commit(658db3eababc5629b8bcb77dc84db81a,
> 0193c330-f561-11e2-ad78-7bbb6a42087c, ColumnFamily(P
> [62caa268bd7e90050000000000000001:false:21@1374780776163000,])),
> Commit(658db3eababc5629b8bcb77dc84db81a,
> d20c3e60-f576-11e2-b3a3-edb9f71dce8f, ColumnFamily(P
> [62caa268bd7e90050000000000000001:false:21@1374780776163000,]))) from /
> 17.163.7.162
>
> DEBUG [RequestResponseStage:48] 2013-07-25 15:09:05,353
> ProposeCallback.java (line 44) Propose response false from /17.163.7.194
> DEBUG [RequestResponseStage:54] 2013-07-25 15:09:05,353
> ProposeCallback.java (line 44) Propose response false from /17.163.7.162
>
> *DEBUG [MutationStage:35] 2013-07-25 15:09:05,353 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*
>
> *At 17.163.7.184:*
>
> DEBUG [MutationStage:53] 2013-07-25 15:09:05,347 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20b05e0-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> DEBUG [MutationStage:38] 2013-07-25 15:09:05,348 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20b7b10-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> DEBUG [MutationStage:64] 2013-07-25 15:09:05,351 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20bc930-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))
>
> *DEBUG [MutationStage:33] 2013-07-25 15:09:05,351 PaxosState.java (line
> 100) accept requested for Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c3e60-f576-11e2-9bbe-bf2ad4fe6707, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,])) but
> inProgress is now Commit(d145fe46f5d02a54b5ea95852f94c402,
> d20c6570-f576-11e2-a57e-a154d605509d, ColumnFamily(P
> [81d271b2125c59cb0000000000000003:false:27@1374780817218000,]))*
>
>
> Thanks,
> Soumava
>
>
>

Mime
View raw message