Date: Wed, 22 Feb 2017 14:37:44 +0000 (UTC)
From: "Jason Brown (JIRA)"
To: commits@cassandra.apache.org
Reply-To: dev@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-12653) In-flight shadow round requests

[ https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878328#comment-15878328 ]

Jason Brown commented on CASSANDRA-12653:
-----------------------------------------

On the whole, I'm pretty good with this patch - nice work, Stefan! I have two nits:

- the 3.11 and trunk branches have the tests included, but 2.2 and 3.0 do not. Is that just an oversight? Can we add the tests to those branches, as well?
- in {{GossipDigestAckVerbHandler#doVerb}}, you get the [following timestamp|https://github.com/spodkowinski/cassandra/commit/9179b7ee06c51a79881f5be18cd01261ebe62143#diff-787d4963a51f20221468e976df1b121aR64]:
{code}
long ts = epStateMap.values().iterator().next().getUpdateTimestamp();
{code}
We [do not serialize|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/gms/EndpointState.java#L166] {{EndpointState.updateTimestamp}}, so the value at the receiver ends up being the receiver's {{System.nanoTime()}}, as can be seen from the [{{EndpointState}} constructor|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/gms/EndpointState.java#L58]. What is it you are looking to confirm here? I'm kinda leaning toward eliminating that check altogether, as
{code}
Gossiper.instance.firstSynSendAt == 0
{code}
might be a sufficient check anyway.

> In-flight shadow round requests
> -------------------------------
>
>                 Key: CASSANDRA-12653
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>
> Bootstrapping or replacing a node in the cluster requires gathering and checking some host IDs or tokens by doing a gossip "shadow round" once before joining the cluster. This is done by sending a gossip SYN to all seeds until we receive a response with the cluster state, from where we can move on in the bootstrap process. Receiving a response marks the shadow round as done and calls {{Gossiper.resetEndpointStateMap}} to clean up the received state again.
> The issue here is that at this point there might be other in-flight requests and it's very likely that shadow round responses from other seeds will be received afterwards, while the current state of the bootstrap process doesn't expect this to happen (e.g.
gossiper may or may not be enabled).
> One side effect will be that MigrationTasks are spawned for each shadow round reply except the first. Tasks might or might not execute, depending on whether {{Gossiper.resetEndpointStateMap}} has already been called at execution time, which affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at the start of the task. You'll see error log messages such as the following when this happens:
> {noformat}
> INFO [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1] 2016-09-08 08:36:39,255 FailureDetector.java:223 - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, but it would be good to get a second opinion (feel free to close as "won't fix").
> /cc [~Stefania] [~thobbs]

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
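
For illustration, the in-flight problem described in the ticket (several seeds answering a single shadow round, with only the first reply being expected) could be guarded against roughly as follows. This is a minimal sketch and not the code from the patch: the class {{ShadowRoundGuard}} and its methods {{start}}, {{onAck}} and {{ignoredAcks}} are hypothetical names made up for this example, and no real Cassandra classes are used.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical illustration only; this is NOT Cassandra's Gossiper.
 * It shows one way to accept exactly one shadow round reply and
 * drop every reply that is still in flight afterwards.
 */
public class ShadowRoundGuard {
    // True only between starting the shadow round and accepting the first ACK.
    private final AtomicBoolean inShadowRound = new AtomicBoolean(false);
    // Counts replies that arrived outside an active shadow round.
    private final AtomicInteger ignoredAcks = new AtomicInteger();

    /** Called (in this sketch) right before the SYNs are sent to the seeds. */
    public void start() {
        inShadowRound.set(true);
    }

    /**
     * Returns true if this ACK completed the shadow round,
     * false if it arrived too early or too late and was ignored.
     */
    public boolean onAck(String seed) {
        // compareAndSet lets exactly one caller flip true -> false,
        // even if several replies are handled concurrently.
        if (inShadowRound.compareAndSet(true, false)) {
            return true;
        }
        ignoredAcks.incrementAndGet();
        return false;
    }

    public int ignoredAcks() {
        return ignoredAcks.get();
    }

    public static void main(String[] args) {
        ShadowRoundGuard guard = new ShadowRoundGuard();
        guard.start();
        for (String seed : new String[] {"seed1", "seed2", "seed3"}) {
            System.out.println(seed + " accepted=" + guard.onAck(seed));
        }
        System.out.println("ignored=" + guard.ignoredAcks());
    }
}
```

In this sketch only the first reply is acted upon, so late replies would not trigger follow-up work such as the MigrationTasks mentioned above. Whether a flag like this or the {{firstSynSendAt}} check from the comment is the better guard is a design question for the actual patch.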