cassandra-user mailing list archives

From Bryan Cheng <br...@blockcypher.com>
Subject Re: Trace evidence for LOCAL_QUORUM ending up in remote DC
Date Tue, 08 Sep 2015 19:09:30 GMT
Tom, I don't believe so; it seems the symptom would be an indefinite (or
very long) hang.

To clarify, is this issue restricted to LOCAL_QUORUM? Can you issue a
LOCAL_ONE SELECT and retrieve the expected data back?

On Tue, Sep 8, 2015 at 12:02 PM, Tom van den Berge <
tom.vandenberge@gmail.com> wrote:

> Just to be sure: can this bug result in a 0-row result when the query
> should return more than 0 rows?
>
> On 8 Sep 2015 at 6:29 PM, "Tyler Hobbs" <tyler@datastax.com> wrote:
>
>> See https://issues.apache.org/jira/browse/CASSANDRA-9753
>>
>> On Tue, Sep 8, 2015 at 10:22 AM, Tom van den Berge <
>> tom.vandenberge@gmail.com> wrote:
>>
>>> I've bugged you about this a few times, but now I've got trace data for
>>> a query with LOCAL_QUORUM that is being sent to a remote data center.
>>>
>>> The setup is as follows:
>>> NetworkTopologyStrategy: {"DC1":"1","DC2":"2"}
>>> Both DC1 and DC2 have 2 nodes.
>>> In DC2, one node is currently being rebuilt, and therefore does not
>>> contain all data (yet).
>>>
>>> The client app connects to a node in DC1, and sends a SELECT query with
>>> CL LOCAL_QUORUM, which in this case means (1/2)+1 = 1 (integer division).
>>> If all is ok, the query always produces a result, because the requested
>>> rows are guaranteed to be available in DC1.
>>>
>>> However, the query sometimes produces no result. I've been able to
>>> record the traces of these queries, and it turns out that the coordinator
>>> node in DC1 sometimes sends the query to DC2, to the node that is being
>>> rebuilt, and does not have the requested rows. I've included an example
>>> trace below.
>>>
>>> The coordinator node is 10.55.156.67, which is in DC1. The 10.88.4.194 node
>>> is in DC2.
>>> I've verified that the CL is LOCAL_QUORUM by printing it when the query
>>> is sent (I'm using the DataStax Java driver).
>>>
>>>  activity                                                                  | source       | source_elapsed | thread
>>> ---------------------------------------------------------------------------+--------------+----------------+-----------------------------------------
>>>  Message received from /10.55.156.67                                       |  10.88.4.194 |             48 | MessagingService-Incoming-/10.55.156.67
>>>  Executing single-partition query on aggregate                             |  10.88.4.194 |            286 | SharedPool-Worker-2
>>>  Acquiring sstable references                                              |  10.88.4.194 |            306 | SharedPool-Worker-2
>>>  Merging memtable tombstones                                               |  10.88.4.194 |            321 | SharedPool-Worker-2
>>>  Partition index lookup allows skipping sstable 107                        |  10.88.4.194 |            458 | SharedPool-Worker-2
>>>  Bloom filter allows skipping sstable 1                                    |  10.88.4.194 |            489 | SharedPool-Worker-2
>>>  Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones |  10.88.4.194 |            496 | SharedPool-Worker-2
>>>  Merging data from memtables and 0 sstables                                |  10.88.4.194 |            500 | SharedPool-Worker-2
>>>  Read 0 live and 0 tombstone cells                                         |  10.88.4.194 |            513 | SharedPool-Worker-2
>>>  Enqueuing response to /10.55.156.67                                       |  10.88.4.194 |            613 | SharedPool-Worker-2
>>>  Sending message to /10.55.156.67                                          |  10.88.4.194 |            672 | MessagingService-Outgoing-/10.55.156.67
>>>  Parsing SELECT * FROM Aggregate WHERE type=? AND typeId=?;                | 10.55.156.67 |             10 | SharedPool-Worker-4
>>>  Sending message to /10.88.4.194                                           | 10.55.156.67 |           4335 | MessagingService-Outgoing-/10.88.4.194
>>>  Message received from /10.88.4.194                                        | 10.55.156.67 |           6328 | MessagingService-Incoming-/10.88.4.194
>>>  Seeking to partition beginning in data file                               | 10.55.156.67 |          10417 | SharedPool-Worker-3
>>>  Key cache hit for sstable 389                                             | 10.55.156.67 |          10586 | SharedPool-Worker-3
>>>
>>> My question is: how is it possible that the query is sent to a node in
>>> DC2?
>>> Since DC1 has 2 nodes and RF 1, the query should always be sent to the
>>> other node in DC1 if the coordinator does not have a replica, right?
>>>
>>> Thanks,
>>> Tom
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
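
As an aside on the quorum arithmetic in the quoted message: a quorum is
usually computed as floor(RF / 2) + 1, and LOCAL_QUORUM applies that to the
replication factor of the coordinator's own data center only. Below is a
minimal Python sketch of that calculation for the replication settings quoted
in the thread. The function and variable names are my own, chosen for
illustration; this is not code from Cassandra or from any driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Replication settings as quoted in the thread: {"DC1":"1","DC2":"2"}
replication = {"DC1": 1, "DC2": 2}

# LOCAL_QUORUM only counts replicas in the coordinator's data center.
local_quorum = {dc: quorum(rf) for dc, rf in replication.items()}

# Plain QUORUM counts replicas across all data centers.
global_quorum = quorum(sum(replication.values()))

print(local_quorum)   # {'DC1': 1, 'DC2': 2}
print(global_quorum)  # 2
```

With RF 1 in DC1, a single DC1 replica is enough to satisfy LOCAL_QUORUM,
which is why the read in the quoted trace would be expected to stay in DC1.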
