cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12791) MessageIn logic to determine if the message is cross-node is wrong
Date Mon, 17 Oct 2016 03:47:58 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581110#comment-15581110
] 

Stefania commented on CASSANDRA-12791:
--------------------------------------

The patch is fine, I'm not opposed to removing {{ConstructionTime}} given that the meaning
of {{isCrossNode}} now becomes a property of the message rather than the timestamp. Also,
I think 3.X is reasonable for this patch because CASSANDRA-10580 was added in 3.2, see below.

{quote}
And that's where this is kind of a bug: not only the timestamp != crossNodeTimestamp, but
if DatabaseDescriptor.hasCrossNodeTimeout(), we always have this isCrossNode false, which
means we'll never increment the "cross-node dropped messages" metric, which is imo unexpected.
{quote}

The confusion is due to the fact that {{ConstructionTime.isCrossNode}} was intended to be
a property of the construction timestamp, as it indicates whether the timestamp originated
at the sender or at the receiver, the intent was not to indicate that the message itself is
cross node, although comparing the timestamps to determine this was incorrect. CASSANDRA-10580
added the metrics in messaging service later on, and the comments in the code indicate that
the metrics refer to the actual messages being local vs. cross node, but using {{isCrossNodeTimeout}}
in messaging service was not correct. 

{quote}
Anyway, to sum it up I suggest that the following change should be done:
# the timestamp != crossNodeTimestamp test is definitively not what we want. We should at
a minimum just replace it to true as that's basically what it ends up being except for very
rare and arguably random cases.
# given how the ConstructionTime.isCrossNode is used, I suggest that we really want it to
mean if the message has shipped cross-node, not just be a synonymous of DatabaseDescriptor.hasCrossNodeTimeout().
It should be whether the message shipped cross-node, i.e. whether from == BroadcastAdress()
or not.
{quote}

{{timestamp != crossNodeTimestamp}} orginates from this [comment|https://issues.apache.org/jira/browse/CASSANDRA-9793?focusedCommentId=14635441&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14635441]
from CASSANDRA-9793. It is not only used for the metrics in MessagingService but also for
distinguishing messages dropped due to a cross node timeout vs. a local timeout in the logs.
The intent is to help operators work out if messages are dropped because of clock skew. So,
at this line [here|https://github.com/pcmanus/cassandra/commit/0f3d7f6318be11b095bbe21d0c848da6409d1a93#diff-af09288f448c37a525e831ee90ea49f9R1204],
we need to also check {{DD.hasCrossNodeTimeout()}}, a message originating cross node is not
sufficient. 

{{isCrossNodeTimeout}} is probably a misleading name now, and should become simply {{isCrossNode}}.

I would like to give a heads up to [~brandon.williams] to make sure he agrees that it's OK,
from an operator point of view, to classify a dropped message in the logs as dropped due to
cross node timeout if:

* the message originates from a different node
* {{DD.hasCrossNodeTimeout()}} is true

given that it is extremely rare for a message to be received in the same millisecond and have
the machines perfectly synchronized and, if the machines are not synchronized, then we can
still have an identical timestamp due to chance, so the best thing we can do is look at the
yaml property.

As for the rest of the patch:

* There a typo at the end of this [line|https://github.com/pcmanus/cassandra/commit/0f3d7f6318be11b095bbe21d0c848da6409d1a93#diff-2578da7d6bbdd276157604856543cbecR43],
the {{:}} should be {{;}}, this is the reason for the failures in {{MonitoringTaskTest}} and
all of the dtests.

* Another typo [here|https://github.com/pcmanus/cassandra/commit/0f3d7f6318be11b095bbe21d0c848da6409d1a93#diff-70a9824fd63a5f3970840e376918da3eR137]
in the comments: {{lower}} -> {{higher}}.

> MessageIn logic to determine if the message is cross-node is wrong
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-12791
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12791
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>
> {{MessageIn}} has the following code to read the 'creation time' of the message on the
receiving side:
> {noformat}
> public static ConstructionTime readTimestamp(InetAddress from, DataInputPlus input, long
timestamp) throws IOException
> {
>     // make sure to readInt, even if cross_node_to is not enabled
>     int partial = input.readInt();
>     long crossNodeTimestamp = (timestamp & 0xFFFFFFFF00000000L) | (((partial &
0xFFFFFFFFL) << 2) >> 2);
>     if (timestamp > crossNodeTimestamp)
>     {
>         MessagingService.instance().metrics.addTimeTaken(from, timestamp - crossNodeTimestamp);
>     }
>     if(DatabaseDescriptor.hasCrossNodeTimeout())
>     {
>         return new ConstructionTime(crossNodeTimestamp, timestamp != crossNodeTimestamp);
>     }
>     else
>     {
>         return new ConstructionTime();
>     }
> }
> {noformat}
> where {{timestamp}} is really the local time on the receiving node when calling that
method.
> The incorrect part, I believe, is the {{timestamp != crossNodeTimestamp}} used to set
the {{isCrossNode}} field of {{ConstructionTime}}. A first problem is that this will basically
always be {{true}}: for it to be {{false}}, we'd need the low-bytes of the timestamp taken
on the sending node to coincide exactly with the ones taken on the receiving side, which is
_very_ unlikely. It is also a relatively meaningless test: having that test be {{false}} basically
means the lack of clock sync between the 2 nodes is exactly the time the 2 calls to {{System.currentTimeMillis()}}
(on sender and receiver), which is definitively not what we care about.
> What the result of this test is used for is to determine if the message was crossNode
or local. It's used to increment different metrics (we separate metric local versus crossNode
dropped messages) in {{MessagingService}} for instance. And that's where this is kind of a
bug: not only the {{timestamp != crossNodeTimestamp}}, but if {{DatabaseDescriptor.hasCrossNodeTimeout()}},
we *always* have this {{isCrossNode}} false, which means we'll never increment the "cross-node
dropped messages" metric, which is imo unexpected.
> That is, it is true that if {{DatabaseDescriptor.hasCrossNodeTimeout() == false}}, then
we end using the receiver side timestamp to timeout messages, and so you end up only dropping
messages that timeout locally. And _in that sense_, always incrementing the "locally" dropped
messages metric is not completely illogical. But I doubt most users are aware of those pretty
specific nuance when looking at the related metrics, and I'm relatively sure users expect
a metrics named {{droppedCrossNodeTimeout}} to actually count cross-node messages by default
(keep in mind that {{DatabaseDescriptor.hasCrossNodeTimeout()}} is actually false by default).
> Anyway, to sum it up I suggest that the following change should be done:
> # the {{timestamp != crossNodeTimestamp}} test is definitively not what we want. We should
at a minimum just replace it to {{true}} as that's basically what it ends up being except
for very rare and arguably random cases.
> # given how the {{ConstructionTime.isCrossNode}} is used, I suggest that we really want
it to mean if the message has shipped cross-node, not just be a synonymous of {{DatabaseDescriptor.hasCrossNodeTimeout()}}.
It should be whether the message shipped cross-node, i.e. whether {{from == BroadcastAdress()}}
or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message