zookeeper-user mailing list archives

From will martin <wmartin...@outlook.com>
Subject Re: Zookeeper mesos-master on different network
Date Thu, 28 Apr 2016 01:50:06 GMT
nice

> On Apr 26, 2016, at 4:59 AM, Stefano Bianchi <jazzista88@gmail.com> wrote:
> 
> I finally found a solution.
> On OpenStack I designed this topology:
> 
> -----------------------internet-----------------------
>                                |
>                           Router1
>                                |
> --------------------------------------------------------
> |                                                                 |
> Net1                                                        Net2
> Master1 Master2                                     Master3
> Slave1 slave2                                          Slave3
> 
> It is a simplified view, but this way all the masters and agents are
> reachable through a unique hostname, so I am able to configure ZooKeeper uniformly.
> This topology works fine, meaning that if the leading master is on Net1 it
> is able to dispatch a task on Master 3 on Net2.
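
A minimal sketch of what "setting ZooKeeper uniformly" could look like once every master resolves through one hostname. The hostnames below are placeholders, not the ones used in this thread; the ports (2888, 3888, 2181) and the /mesos path are taken from the configuration quoted further down.

```python
# Build identical ZooKeeper/Mesos settings for every node from one host list.
# MASTERS holds hypothetical hostnames; replace them with names that resolve
# to the same machine from both networks.
MASTERS = ["master1.example", "master2.example", "master3.example"]


def zoo_cfg_servers(hosts, peer_port=2888, election_port=3888):
    """Return the server.N lines to append to zoo.cfg."""
    return [
        f"server.{i}={host}:{peer_port}:{election_port}"
        for i, host in enumerate(hosts, start=1)
    ]


def mesos_zk_url(hosts, client_port=2181, path="/mesos"):
    """Return the zk:// URL to write to /etc/mesos/zk."""
    return "zk://" + ",".join(f"{host}:{client_port}" for host in hosts) + path


if __name__ == "__main__":
    print("\n".join(zoo_cfg_servers(MASTERS)))
    print(mesos_zk_url(MASTERS))
```

Because the same list feeds both functions, every master and agent ends up with the same server list and the same zk:// URL, regardless of the network it sits on.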
> 
> 
> 2016-04-14 19:04 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
> 
>> However, quorum = 1 does not change anything. I guess that I need to
>> set up a DNS.
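
A note on the quorum value discussed here: the Mesos documentation advises setting --quorum to a strict majority of the masters. A minimal sketch of that rule (the function name is made up for illustration):

```python
def majority_quorum(num_masters: int) -> int:
    """Smallest integer strictly greater than half the number of masters."""
    if num_masters < 1:
        raise ValueError("need at least one master")
    return num_masters // 2 + 1


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, "masters -> quorum", majority_quorum(n))
```

For the three masters in this thread that gives 2, which matches --quorum="2" in the startup flags quoted further down.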
>> On 14 Apr 2016 17:42, "Stefano Bianchi" <jazzista88@gmail.com> wrote:
>> 
>>> I don't know why, but with quorum set to 1 on each master the election no
>>> longer fluctuates continuously. I don't know if this is the right
>>> solution.
>>> I tried to turn off one of the 2 masters on Network A: it goes down and a
>>> re-election starts between the other master on Network A and the master on
>>> Network B.
>>> Now the only problem I have is that, if one of the 2 masters on
>>> Network A is leading, only the slaves on that network are attached to it.
>>> Conversely, if the master on Network B is leading, only the slave on
>>> that network is attached. How can I resolve this?
>>> I would like, for instance, that when the master on Network B is leading, all
>>> 3 slaves, the one on the same network and the 2 on the other network,
>>> are "attached" to that master.
>>> Do you have any suggestions?
>>> 
>>> 2016-04-14 16:49 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
>>> 
>>>> this is the log:
>>>> 
>>>> Log file created at: 2016/04/14 14:48:26
>>>> Running on machine: master3.novalocal
>>>> Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>>>> I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
>>>> I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
>>>> I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
>>>> I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
>>>> I0414 14:48:26.416146 19956 main.cpp:239] Git SHA: 3c9ec4a0f34420b7803848af597de00fedefe0e2
>>>> I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
>>>> I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
>>>> I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
>>>> I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
>>>> I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db in 20828ns
>>>> I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys in the db in 596ns
>>>> I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with log positions 0 -> 0 with 1 holes and 0 unlearned
>>>> I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
>>>> I0414 14:48:26.479887 19956 master.cpp:374] Master 51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on 192.168.10.11:5050
>>>> I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to ZooKeeper group
>>>> I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
>>>> I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup: --allocation_interval="1secs" --allocator="HierarchicalDRF" --authenticate="false" --authenticate_http="false" --authenticate_slaves="false" --authenticators="crammd5" --authorizers="local" --framework_sorter="drf" --help="false" --hostname="131.154.96.156" --hostname_lookup="true" --http_authenticators="basic" --initialize_driver_logging="true" --log_auto_initialize="true" --log_dir="/var/log/mesos" --logbufsecs="0" --logging_level="INFO" --max_completed_frameworks="50" --max_completed_tasks_per_framework="1000" --max_slave_ping_timeouts="5" --port="5050" --quiet="false" --quorum="2" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins" --registry_store_timeout="5secs" --registry_strict="false" --root_submissions="true" --slave_ping_timeout="15secs" --slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/usr/share/mesos/webui" --work_dir="/var/lib/mesos" --zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos" --zk_session_timeout="10secs"
>>>> I0414 14:48:26.483753 19956 master.cpp:423] Master allowing unauthenticated frameworks to register
>>>> I0414 14:48:26.483772 19956 master.cpp:428] Master allowing unauthenticated slaves to register
>>>> I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5' authenticator
>>>> W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials provided, authentication requests will be refused
>>>> I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
>>>> I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
>>>> I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached file '/var/log/mesos/mesos-master.INFO'
>>>> I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
>>>> I0414 14:48:26.527865 19972 group.cpp:349] Group process (group(1)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>>>> I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>>>> I0414 14:48:26.528306 19976 group.cpp:349] Group process (group(4)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
>>>> I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>>>> I0414 14:48:26.528740 19971 group.cpp:349] Group process (group(2)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>>>> I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path '/mesos/log_replicas' in ZooKeeper
>>>> I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
>>>> I0414 14:48:26.534343 19972 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>>>> I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
>>>> I0414 14:48:26.534843 19976 group.cpp:700] Trying to get '/mesos/json.info_0000000057' in ZooKeeper
>>>> I0414 14:48:26.536515 19973 group.cpp:349] Group process (group(3)@192.168.10.11:5050) connected to ZooKeeper
>>>> I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (1, 0, 0)
>>>> I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
>>>> I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.100.54:5050 }
>>>> I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (5)@192.168.10.11:5050
>>>> I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>> I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master (UPID=master@192.168.100.54:5050) is detected
>>>> I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader is master@192.168.100.54:5050 with id b6031dea-c621-4ba1-9254-87b7449e0d08
>>>> I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
>>>> I0414 14:48:26.555173 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000054' in ZooKeeper
>>>> I0414 14:48:26.556934 19976 group.cpp:700] Trying to get '/mesos/log_replicas/0000000055' in ZooKeeper
>>>> I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050 }
>>>> I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58') has entered the contest for leadership
>>>> I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>>>> I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (10)@192.168.10.11:5050
>>>> I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>> I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for /master/state.json from 131.154.5.22:59267 with User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36 OPR/36.0.2130.46'
>>>> I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>>>> I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (15)@192.168.10.11:5050
>>>> I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
>>>> I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the recover protocol in 10secs, retrying
>>>> I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status received a broadcasted recover request from (17)@192.168.10.11:5050
>>>> I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover response from a replica in EMPTY status
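
One detail worth noting in the log above: master3 runs on 192.168.10.11, while the leader it detects is advertised as master@192.168.100.54:5050, an address on the other private network. The thread does not say whether that address is routable from Network B, so the following is only a sketch of how one might pull the advertised leader address out of a log line and flag it when it is a private address. The helper names are illustrative, not part of Mesos.

```python
import ipaddress
import re

# Matches the "UPID=master@IP:PORT" fragment printed by detector.cpp.
LEADER_RE = re.compile(r"UPID=master@(\d+\.\d+\.\d+\.\d+):(\d+)")


def advertised_leader(log_line):
    """Return (ip, port) of the advertised leader, or None if absent."""
    match = LEADER_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_private(ip):
    """True for private-range addresses such as 192.168.x.x."""
    return ipaddress.ip_address(ip).is_private


if __name__ == "__main__":
    line = ("I0414 14:48:26.542330 19976 detector.cpp:479] A new leading "
            "master (UPID=master@192.168.100.54:5050) is detected")
    ip, port = advertised_leader(line)
    print(ip, port, "private" if is_private(ip) else "public")
```

A private leader address is only a hint: it matters when the node reading it sits on a different network with no route to that range.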
>>>> 
>>>> 
>>>> 2016-04-14 16:27 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
>>>> 
>>>>> However, now I see a problem with the masters.
>>>>> If I turn off one master on Network A, the master on Network B is
>>>>> elected, but after a minute it disconnects and leadership goes back to the
>>>>> original one.
>>>>> 
>>>>> 2016-04-14 16:26 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
>>>>> 
>>>>>> In the OpenStack security group the SSH port is open.
>>>>>> 
>>>>>> 
>>>>>> 2016-04-14 16:24 GMT+02:00 Flavio Junqueira <fpj@apache.org>:
>>>>>> 
>>>>>>> Is it an indication that the SSH port is open and the others aren't?
>>>>>>> 
>>>>>>> -Flavio
>>>>>>> 
>>>>>>>> On 14 Apr 2016, at 15:10, Stefano Bianchi <jazzista88@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> I tried with telnet and I get a connection timeout, but I am able to
>>>>>>>> connect through SSH.
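
The telnet test described here can be repeated for every relevant port at once. A minimal sketch, assuming the ports that appear elsewhere in this thread (2181, 2888, 3888 for ZooKeeper, 5050 for the Mesos master) plus the standard SSH port 22; the function names are illustrative.

```python
import socket
import sys

# Port numbers taken from the configuration in this thread; 22 is standard SSH.
PORTS = {
    "ssh": 22,
    "zk client": 2181,
    "zk peer": 2888,
    "zk election": 3888,
    "mesos master": 5050,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def report(host):
    """Check every port in PORTS and return {name: reachable}."""
    return {name: port_open(host, port) for name, port in PORTS.items()}


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, ok in report(sys.argv[1]).items():
        print(f"{name:13s} {'open' if ok else 'blocked or closed'}")
```

Run from a slave on one network against a master on the other, an "open" SSH line next to "blocked or closed" ZooKeeper and Mesos lines would point to filtering by port rather than a routing problem.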
>>>>>>>> 
>>>>>>>> 2016-04-14 16:05 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
>>>>>>>> 
>>>>>>>>> Thanks for your reply, Flavio.
>>>>>>>>> Actually, I don't have a DNS, so I am forced to fill in the hosts file by
>>>>>>>>> hand, and I have put all the IP addresses in it.
>>>>>>>>> Of course, for the node in Network B I have set the floating IPs of the
>>>>>>>>> other 2 slaves in Network A, associated with their hostnames. I don't
>>>>>>>>> know if that is correct, but at least if I ping from the slave in
>>>>>>>>> Network B to a slave in A I get replies, and vice versa.
>>>>>>>>> 
>>>>>>>>> 2016-04-14 15:55 GMT+02:00 Flavio Junqueira <fpj@apache.org>:
>>>>>>>>> 
>>>>>>>>>> Have you made sure that a slave in net B is able to telnet or ssh to the
>>>>>>>>>> leader machine in net A? Is it possible that the client port is blocked
>>>>>>>>>> from B to A?
>>>>>>>>>> 
>>>>>>>>>> -Flavio
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On 14 Apr 2016, at 14:09, Stefano Bianchi <jazzista88@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi all,
>>>>>>>>>>> I'm working on OpenStack and I have built some virtual machines
>>>>>>>>>>> and 2 different networks with it.
>>>>>>>>>>> I have set up two Mesos clusters:
>>>>>>>>>>> 
>>>>>>>>>>> NetworkA:
>>>>>>>>>>> 2 mesos master
>>>>>>>>>>> 2 mesos slaves
>>>>>>>>>>> 
>>>>>>>>>>> NetworkB:
>>>>>>>>>>> 1 mesos master
>>>>>>>>>>> 1 mesos slave
>>>>>>>>>>> 
>>>>>>>>>>> I am trying to interconnect these two clusters.
>>>>>>>>>>> 
>>>>>>>>>>> I have set the ZooKeeper configuration such that all 3 masters are
>>>>>>>>>>> competing for the leadership. Here are the main configurations:
>>>>>>>>>>> 
>>>>>>>>>>> NetworkA, on both masters:
>>>>>>>>>>> 
>>>>>>>>>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>>>>>>>>>> 
>>>>>>>>>>> server.1=192.168.100.54:2888:3888 (master1 on network A)
>>>>>>>>>>> 
>>>>>>>>>>> server.2=192.168.100.55:2888:3888 (master2 on network A)
>>>>>>>>>>> 
>>>>>>>>>>> server.3=131.154.xxx.xxx:2888:3888 (Master3 on network B, I have set the floating IP)
>>>>>>>>>>> 
>>>>>>>>>>> */etc/mesos/zk*
>>>>>>>>>>> 
>>>>>>>>>>> zk://192.168.100.54:2181,192.168.100.55:2181,131.154.xxx.xxx:2181/mesos
>>>>>>>>>>> NetworkB:
>>>>>>>>>>> 
>>>>>>>>>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>>>>>>>>>> 
>>>>>>>>>>> server.1=131.154.96.27:2888:3888 (master1 on network A, I have set the floating IP)
>>>>>>>>>>> 
>>>>>>>>>>> server.2=131.154.96.32:2888:3888 (master2 on network A, I have set the floating IP)
>>>>>>>>>>> 
>>>>>>>>>>> server.3=192.168.10.11:2888:3888 (Master3 on network B)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> */etc/mesos/zk*
>>>>>>>>>>> 
>>>>>>>>>>> zk://131.154.zzz.zzz:2181,131.154.yyy.yyy:2181,192.168.10.11:2181/mesos
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> The 3 masters seem to work fine: if I stop the mesos-master service on one
>>>>>>>>>>> of them, there is a re-election, so they are behaving as one single cluster
>>>>>>>>>>> with 3 masters.
>>>>>>>>>>> I have no problems with masters, but with slaves.
>>>>>>>>>>> I have currently set up the slaves with /etc/mesos/zk exactly as shown
>>>>>>>>>>> above, consistently on each network.
>>>>>>>>>>> 
>>>>>>>>>>> Now the leader is one master on Network A, and only the slaves
>>>>>>>>>>> on Network A can connect to it, but I also need to connect the slave on
>>>>>>>>>>> the other network.
>>>>>>>>>>> Do you have suggestions?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 

