cassandra-commits mailing list archives

From "Joel Knighton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11740) Nodes have wrong membership view of the cluster
Date Tue, 05 Jul 2016 14:42:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362569#comment-15362569 ]

Joel Knighton commented on CASSANDRA-11740:
-------------------------------------------

I don't have any great ideas here other than Jeremiah's suggestion above. When using GPFS,
there's a hierarchy of lookups.

First, we look for the node's datacenter/rack information in gossip.

Then, if a fallback PropertyFileSnitch is configured, we use that.
If we don't have one, we look in the system keyspace and finally fall back to defaults. The
default for GPFS is UNKNOWN_RACK/UNKNOWN_DC.
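The lookup order above can be sketched roughly as follows. This is a simplified illustration, not the actual Cassandra source; the method and parameter names are hypothetical, and only the ordering (gossip, then topology file, then system keyspace, then defaults) reflects the behavior described:

```java
import java.util.Optional;

// Simplified sketch of the DC resolution order GossipingPropertyFileSnitch
// follows for a remote node, per the hierarchy described above.
public class SnitchLookupSketch {
    static final String DEFAULT_DC = "UNKNOWN_DC";

    static String resolveDatacenter(Optional<String> fromGossip,
                                    Optional<String> fromTopologyFile,
                                    Optional<String> fromSystemKeyspace) {
        // 1. Gossip application state wins when present.
        if (fromGossip.isPresent())
            return fromGossip.get();
        // 2. A fallback PropertyFileSnitch (cassandra-topology.properties) is next.
        if (fromTopologyFile.isPresent())
            return fromTopologyFile.get();
        // 3. Then the locally persisted view in the system keyspace.
        if (fromSystemKeyspace.isPresent())
            return fromSystemKeyspace.get();
        // 4. Finally the hard-coded GPFS default.
        return DEFAULT_DC;
    }

    public static void main(String[] args) {
        // A node with no gossip state but a stale topology file on disk
        // resolves to whatever that file says (e.g. the sample's DC1).
        System.out.println(resolveDatacenter(
                Optional.empty(), Optional.of("DC1"), Optional.empty()));
    }
}
```

This illustrates why a leftover cassandra-topology.properties can shadow the intended configuration whenever the gossip lookup comes up empty.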

I have no idea how these values could get into gossip or the system keyspace of the node without
this being configured in a file.

Since DC1/r1 are the default options given in the sample cassandra-topology.properties distributed
with Cassandra, it seems likely that this config file has not been removed from all nodes.
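For reference, the sample cassandra-topology.properties shipped with Cassandra contains a default entry along these lines (surrounding example entries vary by version):

```properties
# default for unknown nodes
default=DC1:r1
```

Any node that still has this file on disk will report unmatched peers as DC1/r1, which matches the symptom below.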

That said, if the information isn't present in gossip, there is likely some other problem.
This could be better debugged with debug/trace-level logs from a node A showing bad nodetool
status output for node B, as well as debug/trace-level logs from node B.

> Nodes have wrong membership view of the cluster
> -----------------------------------------------
>
>                 Key: CASSANDRA-11740
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11740
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dikang Gu
>            Assignee: Joel Knighton
>             Fix For: 2.2.x, 3.x
>
>
> We have a few hundred nodes across 3 data centers, and we are doing a few million writes per second into the cluster.
> The problem we found is that some nodes (>10) have a very wrong view of the cluster.
> For example, we have 3 data centers A, B, and C. On the problem nodes, the output of 'nodetool status' shows that ~100 nodes are not in data center A, B, or C. Instead, it shows those nodes in DC1, rack r1, which is very wrong. As a result, the node will return wrong results to client requests.
> {code}
> Datacenter: DC1
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                          Load       Tokens  Owns  Host ID                               Rack
> UN  2401:db00:11:6134:face:0:1:0     509.52 GB  256     ?     e24656ac-c3b2-4117-b933-a5b06852c993  r1
> UN  2401:db00:11:b218:face:0:5:0     510.01 GB  256     ?     53da2104-b1b5-4fa5-a3dd-52c7557149f9  r1
> UN  2401:db00:2130:5133:face:0:4d:0  459.75 GB  256     ?     ef8311f0-f6b8-491c-904d-baa925cdd7c2  r1
> {code}
> We are using GossipingPropertyFileSnitch.
> Thanks



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
