From: Graham Bull
Date: Tue, 24 May 2016 10:20:52 +0100
Subject: Re: Nodes running on different operating systems
To: user@ignite.apache.org

Thanks for the suggestion, but unfortunately it makes no difference.

All three nodes are now using the same configuration, except that I've put each machine's local IP address at the top of the list:

<?xml version="1.0" encoding="UTF-8"?>
<beans ...
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       ...>
  <bean id="ignite.cfg" class="org.apache.ignite.configuration.IgniteConfiguration">
    <property name="discoverySpi">
      <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
        <property name="ipFinder">
          <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
            <property name="addresses">
              <list>
                <value>192.168.56.1</value> <!--windows/host-->
                <value>192.168.56.101</value> <!--linux1-->
                <value>192.168.56.102</value> <!--linux2-->
              </list>
            </property>
          </bean>
        </property>
      </bean>
    </property>
  </bean>
</beans>

I've noticed something interesting. If I start the Windows node first followed by just one of the Linux nodes, then the Linux node doesn't seem to be able to maintain a stable connection, and repeatedly connects then disconnects:

[10:00:32] Topology snapshot [ver=1, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:01:00] Topology snapshot [ver=3, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:01:41] Topology snapshot [ver=7, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:01:41] Topology snapshot [ver=7, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:02:21] Topology snapshot [ver=11, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:02:21] Topology snapshot [ver=11, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:02:42] Topology snapshot [ver=13, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:02:42] Topology snapshot [ver=13, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:06:25] Topology snapshot [ver=35, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:06:25] Topology snapshot [ver=35, servers=1, clients=0, CPUs=8, heap=1.0GB]
[10:07:46] Topology snapshot [ver=43, servers=2, clients=0, CPUs=10, heap=2.0GB]
[10:07:46] Topology snapshot [ver=43, servers=1, clients=0, CPUs=8, heap=1.0GB]

This is from the log (happens every 20 seconds):

[10:07:46,035][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Added new node to topology: TcpDiscoveryNode [id=a5982ff4-a30e-479d-b4c4-d2f18880d100, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.2.15, 127.0.0.1, 192.168.56.102], sockAddrs=[/192.168.56.102:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.2.15:47500, /10.0.2.15:47500, /127.0.0.1:47500, /192.168.56.102:47500], discPort=47500, order=42, intOrder=22, lastExchangeTime=1464080845973, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]

[10:07:46,035][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Topology snapshot [ver=43, servers=2, clients=0, CPUs=10, heap=2.0GB]

[10:07:46,036][WARNING][disco-event-worker-#46%null%][GridDiscoveryManager] Node FAILED: TcpDiscoveryNode [id=a5982ff4-a30e-479d-b4c4-d2f18880d100, addrs=[0:0:0:0:0:0:0:1%lo, 10.0.2.15, 127.0.0.1, 192.168.56.102], sockAddrs=[/192.168.56.102:47500, /0:0:0:0:0:0:0:1%lo:47500, /10.0.2.15:47500, /10.0.2.15:47500, /127.0.0.1:47500, /192.168.56.102:47500], discPort=47500, order=42, intOrder=22, lastExchangeTime=1464080845973, loc=false, ver=1.6.0#20160518-sha1:0b22c45b, isClient=false]

[10:07:46,036][INFO][disco-event-worker-#46%null%][GridDiscoveryManager] Topology snapshot [ver=43, servers=1, clients=0, CPUs=8, heap=1.0GB]

[10:07:46,043][INFO][exchange-worker-#49%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=42, minorTopVer=0], evt=NODE_JOINED, node=a5982ff4-a30e-479d-b4c4-d2f18880d100]

[10:07:46,049][INFO][exchange-worker-#49%null%][GridCachePartitionExchangeManager] Skipping rebalancing (nothing scheduled) [top=AffinityTopologyVersion [topVer=43, minorTopVer=0], evt=NODE_FAILED, node=a5982ff4-a30e-479d-b4c4-d2f18880d100]

[10:07:56,298][WARNING][tcp-disco-msg-worker-#2%null%][TcpDiscoverySpi] Timed out waiting for message delivery receipt (most probably, the reason is in long GC pauses on remote node; consider tuning GC and increasing 'ackTimeout' configuration property). Will retry to send message with increased timeout. Current timeout: 9760.

Thanks,
Graham

On 23 May 2016 at 16:00, vkulichenko <valentin.kulichenko@gmail.com> wrote:

> Graham,
>
> Default config means that multicast is used for discovery. Can you try
> static IP configuration [1] and see if the issue is reproduced?
>
> [1] https://apacheignite.readme.io/docs/cluster-config#static-ip-based-discovery
>
> -Val
>
> --
> View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Nodes-running-on-different-operating-systems-tp5098p5126.html
> Sent from the Apache Ignite Users mailing list archive at Nabble.com.
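
For reference, the 'ackTimeout' property named in the TcpDiscoverySpi warning above is set on the TcpDiscoverySpi bean itself. A minimal sketch of where it would sit in the static IP configuration above, assuming the value is in milliseconds; the figure of 20000 is purely illustrative and not a recommendation, and whether raising it stops the connect/disconnect cycle is untested here:

```xml
<bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
  <!-- Assumption: illustrative value in milliseconds, chosen only to be
       larger than the "Current timeout: 9760" reported in the log. -->
  <property name="ackTimeout" value="20000"/>
  <property name="ipFinder">
    <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
      <property name="addresses">
        <list>
          <value>192.168.56.1</value>
          <value>192.168.56.101</value>
          <value>192.168.56.102</value>
        </list>
      </property>
    </bean>
  </property>
</bean>
```

This fragment replaces only the discoverySpi bean; the enclosing IgniteConfiguration bean and the beans element stay as they are.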