cassandra-commits mailing list archives

From "Hayato Shimizu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-5632) Cross-DC bandwidth-saving broken
Date Tue, 18 Jun 2013 15:19:22 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686809#comment-13686809 ]

Hayato Shimizu edited comment on CASSANDRA-5632 at 6/18/13 3:18 PM:
--------------------------------------------------------------------

The patch fixes the bandwidth-saving issue.

However, two regressions appear to have been introduced:

1. Selection of the DC2 coordinator by the DC1 coordinator is not balanced across the available nodes in DC2; some DC2 nodes are never used as coordinators (see the sketch below).
2. When inserting a row from cqlsh at EACH_QUORUM or ALL consistency with tracing enabled, an RPC timeout occurs from a node that cannot be identified in the trace output.

Trace output is attached for a 6-node cluster with a DC1:3, DC2:3 replication factor configuration. The network topology configuration is also attached for clarity.
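
A purely hypothetical illustration of issue 1 (none of these names appear in the patch): if the DC1 coordinator derives the forwarding target deterministically from the replica list, the same DC2 node ends up coordinating every forwarded mutation, whereas a uniform random pick would spread the role across all DC2 replicas.

{code}
// Hypothetical sketch only -- not the patch code.
List<InetAddress> dc2Replicas = ...; // replicas for this mutation in DC2

// Skewed: a fixed choice makes one DC2 node the coordinator every time
InetAddress skewed = dc2Replicas.get(0);

// Balanced: a uniform random choice exercises all DC2 replicas equally
InetAddress balanced = dc2Replicas.get(
        ThreadLocalRandom.current().nextInt(dc2Replicas.size()));
{code}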
                
> Cross-DC bandwidth-saving broken
> --------------------------------
>
>                 Key: CASSANDRA-5632
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5632
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.2.6
>
>         Attachments: 5632.txt, cassandra-topology.properties, fix_patch_bug.log
>
>
> We group messages by destination as follows to avoid sending multiple messages to a remote datacenter:
> {code}
>         // Multimap that holds onto all the messages and addresses meant for a specific datacenter
>         Map<String, Multimap<Message, InetAddress>> dcMessages
> {code}
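>
> A minimal sketch of the intended behaviour (plain Java with Guava; addReplica, flush and sendOnce are hypothetical helpers, only dcMessages comes from the snippet above). As long as every replica in a remote datacenter is registered under the same message key, the multimap collapses them and each mutation crosses the WAN once per DC:
> {code}
> Map<String, Multimap<Message, InetAddress>> dcMessages = new HashMap<String, Multimap<Message, InetAddress>>();
>
> // register one shared key per mutation, with many replicas under it
> void addReplica(String dc, Message sharedMessage, InetAddress replica)
> {
>     Multimap<Message, InetAddress> messages = dcMessages.get(dc);
>     if (messages == null)
>     {
>         messages = HashMultimap.create();
>         dcMessages.put(dc, messages);
>     }
>     messages.put(sharedMessage, replica); // same key -> replicas grouped together
> }
>
> // one cross-DC send per distinct key; the receiving node forwards locally
> void flush()
> {
>     for (Multimap<Message, InetAddress> messages : dcMessages.values())
>         for (Message m : messages.keySet())
>             sendOnce(m, messages.get(m)); // hypothetical transport call
> }
> {code}
>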
> When we cleaned out the MessageProducer stuff for 2.0, this code
> {code}
>                     Multimap<Message, InetAddress> messages = dcMessages.get(dc);
> ...
>                     messages.put(producer.getMessage(Gossiper.instance.getVersion(destination)), destination);
> {code}
> turned into
> {code}
>                     Multimap<MessageOut, InetAddress> messages = dcMessages.get(dc);
> ...
>                     messages.put(rm.createMessage(), destination);
> {code}
> Thus, we weren't actually grouping anything anymore -- each destination replica was stored under a separate Message key, unlike under the old CachingMessageProducer.
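>
> To make the failure mode concrete, here is a minimal sketch (the loop and remoteReplicas are illustrative, not the actual code). Assuming MessageOut uses default identity equality, as the description implies, each createMessage() call yields a distinct multimap key, so the grouping silently degrades to one cross-DC message per replica; building the message once per mutation restores it:
> {code}
> // Regression: a fresh key per replica, so nothing ever groups
> for (InetAddress destination : remoteReplicas)
>     messages.put(rm.createMessage(), destination); // N replicas -> N cross-DC sends
>
> // Likely shape of a fix: create the message once and reuse the key
> MessageOut message = rm.createMessage();
> for (InetAddress destination : remoteReplicas)
>     messages.put(message, destination);            // N replicas -> 1 cross-DC send
> {code}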

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
