cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8819) LOCAL_QUORUM writes returns wrong message
Date Wed, 18 Feb 2015 10:42:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325696#comment-14325696
] 

Sylvain Lebresne commented on CASSANDRA-8819:
---------------------------------------------

There is both a joining and a leaving node (in DC2), so what probably happens is that for
some writes there is 2 "pending endpoints". During a write, if there is a joining node and
that node is a future replica for the write (no matter its datacenter), we add it to the number
of nodes waited for (thus making us temporarily wait for more node than strictly required
by the consistency) to avoid potentially failing CL guarantees post-bootstrap, see CASSANDRA-833.
I'll note that this is the explanation for the error message, and the error message is not
really the problem here: cassandra *does* sometimes wait for more node that what the CL initially
suggests and that is *not* a bug.

What is a bug though is that this code is broken for LOCAL_QUORUM reads because while we can
end up waiting for more nodes than there is replica in the local datacenter, {{DataCenterWriteResponseHandler}}
ignores acks from other DC nodes and so it's a guaranteed timeout in that situation (even
though, in the example, there should be at least 1 replica up in DC2 and so we most likely
did got the proper number of answers, we just ignored one).

We could fix {{DataCenterWriteResponseHandler}} to count _some_ responses from remote DCs
when there is pending endpoints (as long as we still fulfill a local quorum of course), but
the bigger question is maybe whether we should include pending endpoints that are not local
in {{AbstractWriteResponseHandler.totalBlockFor}} in the first place. I'll note that _in theory_,
counting them as we do is kind of the correct thing to do: CL guarantees are based on the
combination of the write and read CL, and we don't know what CL the user will use on read.
But in practice, using {{LOCAL_QUORUM}} means you only care about local DC guarantees and
so I think it would be fine and a lot more practical to exclude non-local endpoints.

Thoughts ([~jbellis] in particular since you were the author of CASSANDRA-833)?  

> LOCAL_QUORUM writes returns wrong message
> -----------------------------------------
>
>                 Key: CASSANDRA-8819
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8819
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: CentOS 6.6
>            Reporter: Wei Zhu
>            Assignee: Tyler Hobbs
>             Fix For: 2.0.13
>
>
> We have two DC3, each with 7 nodes.
> Here is the keyspace setup:
>  create keyspace test
>  with placement_strategy = 'NetworkTopologyStrategy'
>  and strategy_options = {DC2 : 3, DC1 : 3}
>  and durable_writes = true;
> We brought down two nodes in DC2 for maintenance. We only write to DC1 using local_quroum
(using datastax JavaClient)
> But we see this errors in the log:
> Cassandra timeout during write query at consistency LOCAL_QUORUM (4 replica were required
but only 3 acknowledged the write
> why does it say 4 replica were required? and Why would it give error back to client since
local_quorum should succeed.
> Here are the output from nodetool status
> Note: Ownership information does not include topology; for complete information, specify
a keyspace
> Datacenter: DC2
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns   Host ID                               Rack
> UN  10.2.0.1  10.92 GB   256     7.9%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC206
> UN  10.2.0.2   6.17 GB    256     8.0%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC106
> UN  10.2.0.3  6.63 GB    256     7.3%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC107
> DL  10.2.0.4  1.54 GB    256     7.7%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC107
> UN  10.2.0.5  6.02 GB    256     6.6%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC106
> UJ  10.2.0.6   3.68 GB    256     ?      XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC205
> UN  10.2.0.7  7.22 GB    256     7.7%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC205
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns   Host ID                               Rack
> UN  10.1.0.1   6.04 GB    256     8.6%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC10
> UN  10.1.0.2   7.55 GB    256     7.4%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC8
> UN  10.1.0.3   5.83 GB    256     7.0%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC9
> UN  10.1.0.4    7.34 GB    256     7.9%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC6
> UN  10.1.0.5   7.57 GB    256     8.0%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC7
> UN  10.1.0.6   5.31 GB    256     7.3%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX  RAC10
> UN  10.1.0.7   5.47 GB    256     8.6%   XXXXXXXXXXXXXXXXXXXXXXXXXXXX RAC9
> I did a cql trace on the query and here is the trace, and it does say 
>    Write timeout; received 3 of 4 required replies | 17:27:52,831 |  10.1.0.1 |     
  2002873
> at the end. I guess that is where the client gets the error from. But the rows was inserted
to Cassandra correctly. And I traced read with local_quorum and it behaves correctly and the
reads don't go to DC2. The problem is only with writes on local_quorum.
> {code}
> Tracing session: 5a789fb0-b70d-11e4-8fca-99bff9c19890
>  activity                                                                           
                                                        | timestamp    | source      | source_elapsed
> ---------------------------------------------------------------------------------------------------------------------------------------------+--------------+-------------+----------------
>                                                                                     
                                     execute_cql3_query | 17:27:50,828 |  10.1.0.1 |     
        0
>  Parsing insert into test (user_id, created, event_data, event_id)values ( 123456789
, 9eab8950-b70c-11e4-8fca-99bff9c19891, 'test', '16'); | 17:27:50,828 |  10.1.0.1 |      
      39
>                                                                                     
                                    Preparing statement | 17:27:50,828 |  10.1.0.1 |     
      135
>                                                                                     
                      Message received from /10.1.0.1 | 17:27:50,829 |  10.1.0.5 |       
     25
>                                                                                     
                         Sending message to /10.1.0.5 | 17:27:50,829 |  10.1.0.1 |       
    421
>                                                                                     
              Executing single-partition query on users | 17:27:50,829 |  10.1.0.5 |     
      177
>                                                                                     
                           Acquiring sstable references | 17:27:50,829 |  10.1.0.5 |     
      191
>                                                                                     
                            Merging memtable tombstones | 17:27:50,830 |  10.1.0.5 |     
      208
>                                                                                     
                      Message received from /10.1.0.5 | 17:27:50,830 |  10.1.0.1 |       
   1461
>                                                                                     
                      Message received from /10.1.0.1 | 17:27:50,830 |  10.1.0.2 |       
     14
>                                                                                     
                            Key cache hit for sstable 1 | 17:27:50,830 |  10.1.0.5 |     
      254
>                                                                                     
                   Processing response from /10.1.0.5 | 17:27:50,830 |  10.1.0.1 |       
   1514
>                                                                                     
              Executing single-partition query on users | 17:27:50,830 |  10.1.0.2 |     
       78
>                                                                                     
            Seeking to partition beginning in data file | 17:27:50,830 |  10.1.0.5 |     
      264
>                                                                                     
                         Sending message to /10.1.0.2 | 17:27:50,830 |  10.1.0.1 |       
   1517
>                                                                                     
                           Acquiring sstable references | 17:27:50,830 |  10.1.0.2 |     
       85
>                                                                    Skipped 0/1 non-slice-intersecting
sstables, included 0 due to tombstones | 17:27:50,830 |  10.1.0.5 |            453
>                                                                                     
                      Determining replicas for mutation | 17:27:50,830 |  10.1.0.1 |     
     1746
>                                                                                     
                            Merging memtable tombstones | 17:27:50,830 |  10.1.0.2 |     
       97
>                                                                                     
             Merging data from memtables and 1 sstables | 17:27:50,830 |  10.1.0.5 |     
      476
>                                                                                     
                      Message received from /10.1.0.2 | 17:27:50,830 |  10.1.0.1 |       
   2369
>                                                                                     
                            Key cache hit for sstable 2 | 17:27:50,830 |  10.1.0.2 |     
      120
>                                                                                     
                     Read 1 live and 0 tombstoned cells | 17:27:50,830 |  10.1.0.5 |     
      506
>                                                                                     
            Seeking to partition beginning in data file | 17:27:50,830 |  10.1.0.2 |     
      123
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,830 |  10.1.0.5 |       
    570
>                                                                    Skipped 0/1 non-slice-intersecting
sstables, included 0 due to tombstones | 17:27:50,830 |  10.1.0.2 |            288
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,830 |  10.1.0.5 |       
    617
>                                                                                     
             Merging data from memtables and 1 sstables | 17:27:50,830 |  10.1.0.2 |     
      297
>                                                                                     
                     Read 1 live and 0 tombstoned cells | 17:27:50,830 |  10.1.0.2 |     
      319
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,830 |  10.1.0.2 |       
    362
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,830 |  10.1.0.2 |       
    420
>                                                                                     
                      Message received from /10.1.0.1 | 17:27:50,831 |   10.1.0.4 |      
      17
>                                                                                     
                         Sending message to /10.1.0.2 | 17:27:50,831 |  10.1.0.1 |       
   2435
>                                                                                     
                      Message received from /10.1.0.1 | 17:27:50,831 |  10.1.0.2 |       
      8
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,831 |   10.1.0.4 |    
        61
>                                                                                     
                   Processing response from /10.1.0.2 | 17:27:50,831 |  10.1.0.1 |       
   2488
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,831 |  10.1.0.2 |     
       44
>                                                                                     
                                 Appending to commitlog | 17:27:50,831 |   10.1.0.4 |    
        78
>                                                                                     Not
hinting /10.2.0.4 which has been down 364809650ms | 17:27:50,831 |  10.1.0.1 |           2503
>                                                                                     
                                 Appending to commitlog | 17:27:50,831 |  10.1.0.2 |     
       62
>                                                                                     
                               Adding to event memtable | 17:27:50,831 |   10.1.0.4 |    
        96
>                                                                                     
                          Sending message to /10.1.0.4 | 17:27:50,831 |  10.1.0.1 |      
    2557
>                                                                                     
                               Adding to event memtable | 17:27:50,831 |  10.1.0.2 |     
       80
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,831 |   10.1.0.4 |      
     177
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,831 |  10.1.0.1 |     
     2669
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,831 |  10.1.0.2 |       
    160
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,831 |   10.1.0.4 |      
     333
>                                                                                     
                                 Appending to commitlog | 17:27:50,831 |  10.1.0.1 |     
     2682
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,831 |  10.1.0.2 |       
    266
>                                                                                     
                               Adding to event memtable | 17:27:50,831 |  10.1.0.1 |     
     2720
>                                                                                     
                        Sending message to /10.2.0.5 | 17:27:50,831 |  10.1.0.1 |        
  2758
>                                                                                     
                      Message received from /10.1.0.2 | 17:27:50,831 |  10.1.0.1 |       
   2989
>                                                                                     
                   Processing response from /10.1.0.2 | 17:27:50,831 |  10.1.0.1 |       
   3024
>                                                                                     
                       Message received from /10.1.0.4 | 17:27:50,832 |  10.1.0.1 |      
    3764
>                                                                                     
                    Processing response from /10.1.0.4 | 17:27:50,832 |  10.1.0.1 |      
    3805
>                                                                                     
                      Message received from /10.1.0.1 | 17:27:50,841 | 10.2.0.5 |        
    24
>                                                                                     
              Enqueuing forwarded write to /10.2.0.7 | 17:27:50,841 | 10.2.0.5 |         
  255
>                                                                                     
              Enqueuing forwarded write to /10.2.0.3 | 17:27:50,841 | 10.2.0.5 |         
  283
>                                                                                     
               Enqueuing forwarded write to /10.2.0.6 | 17:27:50,841 | 10.2.0.5 |        
   307
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,841 | 10.2.0.5 |      
     362
>                                                                                     
                                 Appending to commitlog | 17:27:50,841 | 10.2.0.5 |      
     380
>                                                                                     
                        Sending message to /10.2.0.7 | 17:27:50,841 | 10.2.0.5 |         
  382
>                                                                                     
                               Adding to event memtable | 17:27:50,841 | 10.2.0.5 |      
     411
>                                                                                     
                         Sending message to /10.2.0.6 | 17:27:50,841 | 10.2.0.5 |        
   429
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,841 | 10.2.0.5 |        
   484
>                                                                                     
                        Sending message to /10.2.0.3 | 17:27:50,841 | 10.2.0.5 |         
  561
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,841 | 10.2.0.5 |        
   625
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,842 |  10.2.0.6 |     
     1031
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,842 | 10.2.0.7 |      
     178
>                                                                                     
                                 Appending to commitlog | 17:27:50,842 |  10.2.0.6 |     
     1066
>                                                                                     
                                 Appending to commitlog | 17:27:50,842 | 10.2.0.7 |      
     196
>                                                                                     
                               Adding to event memtable | 17:27:50,842 |  10.2.0.6 |     
     1118
>                                                                                     
                               Adding to event memtable | 17:27:50,842 | 10.2.0.7 |      
     231
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,842 |  10.2.0.6 |       
   1181
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,842 | 10.2.0.7 |        
   286
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,842 | 10.2.0.7 |        
   421
>                                                                                     
                         Acquiring switchLock read lock | 17:27:50,843 | 10.2.0.3 |      
    1216
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,843 |  10.2.0.6 |       
   1382
>                                                                                     
                                 Appending to commitlog | 17:27:50,843 | 10.2.0.3 |      
    1239
>                                                                                     
                               Adding to event memtable | 17:27:50,843 | 10.2.0.3 |      
    1313
>                                                                                     
                      Enqueuing response to /10.1.0.1 | 17:27:50,843 | 10.2.0.3 |        
  1407
>                                                                                     
                         Sending message to /10.1.0.1 | 17:27:50,843 | 10.2.0.3 |        
  1631
>                                                                                     
                     Message received from /10.2.0.5 | 17:27:50,851 |  10.1.0.1 |        
 23333
>                                                                                     
                  Processing response from /10.2.0.5 | 17:27:50,851 |  10.1.0.1 |        
 23380
>                                                                                     
                     Message received from /10.2.0.7 | 17:27:50,852 |  10.1.0.1 |        
 23908
>                                                                                     
                  Processing response from /10.2.0.7 | 17:27:50,852 |  10.1.0.1 |        
 23953
>                                                                                     
                      Message received from /10.2.0.6 | 17:27:50,853 |  10.1.0.1 |       
  25143
>                                                                                     
                   Processing response from /10.2.0.6 | 17:27:50,853 |  10.1.0.1 |       
  25178
>                                                                                     
                     Message received from /10.2.0.3 | 17:27:50,854 |  10.1.0.1 |        
 25478
>                                                                                     
                  Processing response from /10.2.0.3 | 17:27:50,854 |  10.1.0.1 |        
 25516
>                                                                                     
        Write timeout; received 3 of 4 required replies | 17:27:52,831 |  10.1.0.1 |     
  2002873
>                                                                                     
                                       Request complete | 17:27:52,833 |  10.1.0.1 |     
  2005989
> {code}
> cqlsh:XXXX> CONSISTENCY
> Current consistency level is LOCAL_QUORUM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message