Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Reply-To: user@zookeeper.apache.org
MIME-Version: 1.0
References: <70043199-F2C8-4245-9E74-C894B1657E93@apache.org>
Date: Thu, 14 Apr 2016 19:04:40 +0200
Subject: Re: Zookeeper mesos-master on different network
From: Stefano Bianchi
To: user@zookeeper.apache.org
Content-Type: multipart/alternative; boundary=001a1133b56ce1042d053074e32b

--001a1133b56ce1042d053074e32b
Content-Type: text/plain; charset=UTF-8

However, quorum = 1 does not change anything.
I guess that I need to set up a DNS.

On 14 Apr 2016 17:42, "Stefano Bianchi" wrote:

> I don't know why, but after setting quorum to 1 on each master I no longer
> see the election fluctuating continuously. I don't know if this is the
> right solution.
> I tried turning off one of the 2 masters on Network A: it goes down, and a
> re-election starts between the other master on Network A and the master on
> Network B.
> Now the only problem I have is that, if one of the 2 masters on
> Network A is leading, only the slaves on that network are attached to it.
> Conversely, if the master on Network B is leading, only the slave on
> that network is attached. How can I resolve this?
> I would like, for instance, that when the master on Network B is leading, all
> 3 slaves (the one on the same network and the 2 on the other network)
> are "attached" to that master.
> Do you have any suggestions?
>
> 2016-04-14 16:49 GMT+02:00 Stefano Bianchi :
>
>> this is the log:
>>
>> Log file created at: 2016/04/14 14:48:26
>> Running on machine: master3.novalocal
>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>> I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
>> I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
>> I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
>> I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
>> I0414 14:48:26.416146 19956 main.cpp:239] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
>> I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
>> I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
>> I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
>> I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
>> I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db in 20828ns
>> I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys in the db in 596ns
>> I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
>> I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
>> I0414 14:48:26.479887 19956 master.cpp:374] Master 51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on 192.168.10.11:5050
>> I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to ZooKeeper group
>> I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
>> I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="131.154.96.156" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos" --zk_session_timeout="10secs"
>> I0414 14:48:26.483753 19956 master.cpp:423] Master allowing unauthenticated frameworks to register
>> I0414 14:48:26.483772 19956 master.cpp:428] Master allowing unauthenticated slaves to register
>> I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5' authenticator
>> W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>> I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
>> I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
>> I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached file '/var/log/mesos/mesos-master.INFO'
>> I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
>> I0414 14:48:26.527865 19972 group.cpp:349] Group process (group(1)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>> I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>> I0414 14:48:26.528306 19976 group.cpp:349] Group process (group(4)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>> I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>> I0414 14:48:26.528740 19971 group.cpp:349] Group process (group(2)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>> I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>> I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
>> I0414 14:48:26.534343 19972 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>> I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
>> I0414 14:48:26.534843 19976 group.cpp:700] Trying to get '/mesos/json.info_0000000057' in ZooKeeper
>> I0414 14:48:26.536515 19973 group.cpp:349] Group process (group(3)@192.168.10.11:5050) connected to ZooKeeper
>> I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>> I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>> I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.100.54:5050 }
>> I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@192.168.10.11:5050
>> I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover response from a replica in EMPTY status
>> I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master (UPID=master@192.168.100.54:5050) is detected
>> I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader is master@192.168.100.54:5050 with id b6031dea-c621-4ba1-9254-87b7449e0d08
>> I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
>> I0414 14:48:26.555173 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>> I0414 14:48:26.556934 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000055' in ZooKeeper
>> I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050 }
>> I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58') has entered the contest for leadership
>> I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>> I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (10)@192.168.10.11:5050
>> I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>> I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for /master/state.json from 131.154.5.22:59267 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36 OPR/36.0.2130.46'
>> I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>> I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (15)@192.168.10.11:5050
>> I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>> I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>> I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17)@192.168.10.11:5050
>> I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>
>>
>> 2016-04-14 16:27 GMT+02:00 Stefano Bianchi :
>>
>>> However, I now see a problem with the masters.
>>> If I turn off one master on Network A, the master on Network B is
>>> elected, but after a minute it disconnects and leadership goes back to
>>> the original one.
>>>
>>> 2016-04-14 16:26 GMT+02:00 Stefano Bianchi :
>>>
>>>> In the OpenStack security group the SSH port is open.
>>>>
>>>>
>>>> 2016-04-14 16:24 GMT+02:00 Flavio Junqueira :
>>>>
>>>>> Is it an indication that the SSH port is open and the others aren't?
>>>>>
>>>>> -Flavio
>>>>>
>>>>> > On 14 Apr 2016, at 15:10, Stefano Bianchi wrote:
>>>>> >
>>>>> > I tried with telnet and I get "connection timed out", but I am able to
>>>>> > connect through SSH.
>>>>> >
>>>>> > 2016-04-14 16:05 GMT+02:00 Stefano Bianchi :
>>>>> >
>>>>> >> Thanks for your reply Flavio.
>>>>> >> Actually, I don't have a DNS, so I am forced to fill in the hosts file,
>>>>> >> in which I have set all the IP addresses.
>>>>> >> Of course, for the node in Network B I have set the floating IPs of the
>>>>> >> other 2 slaves in Network A, associated with their hostnames. Actually I
>>>>> >> don't know if it is correct, but at least if I ping from the slave in
>>>>> >> Network B to a slave in A I get replies, and vice versa.
>>>>> >>
>>>>> >> 2016-04-14 15:55 GMT+02:00 Flavio Junqueira :
>>>>> >>
>>>>> >>> Have you made sure that a slave in net B is able to telnet or ssh to the
>>>>> >>> leader machine in net A? Is it possible that the client port is blocked
>>>>> >>> from B to A?
>>>>> >>>
>>>>> >>> -Flavio
>>>>> >>>
>>>>> >>>
>>>>> >>>> On 14 Apr 2016, at 14:09, Stefano Bianchi wrote:
>>>>> >>>>
>>>>> >>>> Hi all
>>>>> >>>> I'm working on OpenStack and I have built some virtual machines and 2
>>>>> >>>> different networks with it.
>>>>> >>>> I have set up two Mesos clusters:
>>>>> >>>>
>>>>> >>>> NetworkA:
>>>>> >>>> 2 mesos master
>>>>> >>>> 2 mesos slaves
>>>>> >>>>
>>>>> >>>> NetworkB:
>>>>> >>>> 1 mesos master
>>>>> >>>> 1 mesos slave
>>>>> >>>>
>>>>> >>>> I am trying to set up an interconnection between these two clusters.
>>>>> >>>>
>>>>> >>>> I have set the ZooKeeper configuration so that all 3 masters are
>>>>> >>>> competing for the leadership. I show you the main configurations:
>>>>> >>>>
>>>>> >>>> NetworkA on both 2 masters:
>>>>> >>>>
>>>>> >>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>>>> >>>>
>>>>> >>>> server.1=192.168.100.54:2888:3888 (master1 on network A)
>>>>> >>>>
>>>>> >>>> server.2=192.168.100.55:2888:3888 (master2 on network A)
>>>>> >>>>
>>>>> >>>> server.3=131.154.xxx.xxx:2888:3888 (Master3 on network B, I have set floating IP)
>>>>> >>>>
>>>>> >>>> */etc/mesos/zk*
>>>>> >>>>
>>>>> >>>> zk://192.168.100.54:2181,192.168.100.55:2181,131.154.xxx.xxx:2181/mesos
>>>>> >>>>
>>>>> >>>> NetworkB:
>>>>> >>>>
>>>>> >>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file:
>>>>> >>>>
>>>>> >>>> server.1=131.154.96.27:2888:3888 (master1 on network A, I have set floating IP)
>>>>> >>>>
>>>>> >>>> server.2=131.154.96.32:2888:3888 (master2 on network A, I have set floating IP)
>>>>> >>>>
>>>>> >>>> server.3=192.168.10.11:2888:3888 (Master3 on network B)
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> */etc/mesos/zk*:
>>>>> >>>>
>>>>> >>>> zk://131.154.zzz.zzz:2181,131.154.yyy.yyy:2181,192.168.10.11:2181/mesos
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> The 3 masters seem to work fine: if I stop the mesos-master service on
>>>>> >>>> one of them, there is a re-election, so they are behaving as one single
>>>>> >>>> cluster with 3 masters.
>>>>> >>>> I have no problems with the masters, only with the slaves.
>>>>> >>>> I have currently set up the slaves with /etc/mesos/zk exactly as
>>>>> >>>> shown above, in a consistent way.
>>>>> >>>>
>>>>> >>>> Now the leader is a master on Network A, and only the slaves
>>>>> >>>> on Network A can connect to it, but I also need to connect the slave on
>>>>> >>>> the other network.
>>>>> >>>> Do you have suggestions?
>>>>> >>>
>>>>> >>>
>>>>> >>
>>>>>
>>>>
>>>
>>
>

--001a1133b56ce1042d053074e32b--
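
The check Flavio suggests in the thread (can a node in Network B reach the leader machine in Network A by telnet?) can also be scripted. What follows is only a minimal sketch and not something posted in the thread: the helper names `port_open` and `check_hosts` are hypothetical, and the default port list is an assumption taken from the quoted configuration (ZooKeeper client port 2181, the 2888/3888 ports on the `server.N` lines, and the Mesos master port 5050 seen in the log).

```python
import socket

# Assumed defaults, taken from the configuration quoted in the thread:
# 2181 = ZooKeeper client port, 2888/3888 = ports on the server.N lines,
# 5050 = Mesos master port.
DEFAULT_PORTS = (2181, 2888, 3888, 5050)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_hosts(hosts, ports=DEFAULT_PORTS, timeout=3.0):
    """Map every (host, port) pair to True (reachable) or False."""
    return {(h, p): port_open(h, p, timeout) for h in hosts for p in ports}
```

Running `check_hosts(["131.154.96.27", "131.154.96.32"])` from the Network B master, and the equivalent call with the Network B floating IP from the Network A masters, would show which of these ports time out in the same way telnet did while SSH still works.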