zookeeper-user mailing list archives

From Stefano Bianchi <jazzist...@gmail.com>
Subject Re: Zookeeper mesos-master on different network
Date Thu, 14 Apr 2016 14:49:20 GMT
This is the log:

Log file created at: 2016/04/14 14:48:26
Running on machine: master3.novalocal
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
I0414 14:48:26.415572 19956 logging.cpp:188] INFO level logging started!
I0414 14:48:26.416097 19956 main.cpp:230] Build: 2016-03-10 20:32:58 by root
I0414 14:48:26.416121 19956 main.cpp:232] Version: 0.27.2
I0414 14:48:26.416133 19956 main.cpp:235] Git tag: 0.27.2
I0414 14:48:26.416146 19956 main.cpp:239] Git SHA:
3c9ec4a0f34420b7803848af597de00fedefe0e2
I0414 14:48:26.416205 19956 main.cpp:253] Using 'HierarchicalDRF' allocator
I0414 14:48:26.448494 19956 leveldb.cpp:174] Opened db in 32.174282ms
I0414 14:48:26.477005 19956 leveldb.cpp:181] Compacted db in 28.458808ms
I0414 14:48:26.477056 19956 leveldb.cpp:196] Created db iterator in 9749ns
I0414 14:48:26.477097 19956 leveldb.cpp:202] Seeked to beginning of db
in 20828ns
I0414 14:48:26.477110 19956 leveldb.cpp:271] Iterated through 0 keys
in the db in 596ns
I0414 14:48:26.477164 19956 replica.cpp:779] Replica recovered with
log positions 0 -> 0 with 1 holes and 0 unlearned
I0414 14:48:26.478237 19956 main.cpp:464] Starting Mesos master
I0414 14:48:26.479887 19956 master.cpp:374] Master
51d6efb6-7611-4b4e-9118-ff7493889545 (131.154.96.156) started on
192.168.10.11:5050
I0414 14:48:26.480388 19973 log.cpp:236] Attempting to join replica to
ZooKeeper group
I0414 14:48:26.482223 19977 recover.cpp:447] Starting replica recovery
I0414 14:48:26.479909 19956 master.cpp:376] Flags at startup:
--allocation_interval="1secs" --allocator="HierarchicalDRF"
--authenticate="false" --authenticate_http="false"
--authenticate_slaves="false" --authenticators="crammd5"
--authorizers="local" --framework_sorter="drf" --help="false"
--hostname="131.154.96.156" --hostname_lookup="true"
--http_authenticators="basic" --initialize_driver_logging="true"
--log_auto_initialize="true" --log_dir="/var/log/mesos"
--logbufsecs="0" --logging_level="INFO"
--max_completed_frameworks="50"
--max_completed_tasks_per_framework="1000"
--max_slave_ping_timeouts="5" --port="5050" --quiet="false"
--quorum="2" --recovery_slave_removal_limit="100%"
--registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_store_timeout="5secs" --registry_strict="false"
--root_submissions="true" --slave_ping_timeout="15secs"
--slave_reregister_timeout="10mins" --user_sorter="drf"
--version="false" --webui_dir="/usr/share/mesos/webui"
--work_dir="/var/lib/mesos"
--zk="zk://131.154.96.27:2181,131.154.96.32:2181,192.168.10.11:2181/mesos"
--zk_session_timeout="10secs"
I0414 14:48:26.483753 19956 master.cpp:423] Master allowing
unauthenticated frameworks to register
I0414 14:48:26.483772 19956 master.cpp:428] Master allowing
unauthenticated slaves to register
I0414 14:48:26.483789 19956 master.cpp:466] Using default 'crammd5'
authenticator
W0414 14:48:26.483810 19956 authenticator.cpp:511] No credentials
provided, authentication requests will be refused
I0414 14:48:26.484066 19956 authenticator.cpp:518] Initializing server SASL
I0414 14:48:26.495026 19978 recover.cpp:473] Replica is in EMPTY status
I0414 14:48:26.498484 19976 master.cpp:1649] Successfully attached
file '/var/log/mesos/mesos-master.INFO'
I0414 14:48:26.498517 19976 contender.cpp:147] Joining the ZK group
I0414 14:48:26.527865 19972 group.cpp:349] Group process
(group(1)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.527930 19972 group.cpp:831] Syncing group operations:
queue size (joins, cancels, datas) = (0, 0, 0)
I0414 14:48:26.527954 19972 group.cpp:427] Trying to create path
'/mesos/log_replicas' in ZooKeeper
I0414 14:48:26.528306 19976 group.cpp:349] Group process
(group(4)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.528364 19976 group.cpp:831] Syncing group operations:
queue size (joins, cancels, datas) = (0, 0, 0)
I0414 14:48:26.528424 19976 group.cpp:427] Trying to create path
'/mesos' in ZooKeeper
I0414 14:48:26.528740 19971 group.cpp:349] Group process
(group(2)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.528771 19971 group.cpp:831] Syncing group operations:
queue size (joins, cancels, datas) = (1, 0, 0)
I0414 14:48:26.528805 19971 group.cpp:427] Trying to create path
'/mesos/log_replicas' in ZooKeeper
I0414 14:48:26.534221 19972 network.hpp:413] ZooKeeper group memberships changed
I0414 14:48:26.534343 19972 group.cpp:700] Trying to get
'/mesos/log_replicas/0000000054' in ZooKeeper
I0414 14:48:26.534713 19976 detector.cpp:154] Detected a new leader: (id='57')
I0414 14:48:26.534843 19976 group.cpp:700] Trying to get
'/mesos/json.info_0000000057' in ZooKeeper
I0414 14:48:26.536515 19973 group.cpp:349] Group process
(group(3)@192.168.10.11:5050) connected to ZooKeeper
I0414 14:48:26.536546 19973 group.cpp:831] Syncing group operations:
queue size (joins, cancels, datas) = (1, 0, 0)
I0414 14:48:26.536559 19973 group.cpp:427] Trying to create path
'/mesos' in ZooKeeper
I0414 14:48:26.541244 19972 network.hpp:461] ZooKeeper group PIDs: {
log-replica(1)@192.168.100.54:5050 }
I0414 14:48:26.541806 19972 replica.cpp:673] Replica in EMPTY status
received a broadcasted recover request from (5)@192.168.10.11:5050
I0414 14:48:26.541893 19972 recover.cpp:193] Received a recover
response from a replica in EMPTY status
I0414 14:48:26.542330 19976 detector.cpp:479] A new leading master
(UPID=master@192.168.100.54:5050) is detected
I0414 14:48:26.542408 19976 master.cpp:1710] The newly elected leader
is master@192.168.100.54:5050 with id
b6031dea-c621-4ba1-9254-87b7449e0d08
I0414 14:48:26.555027 19976 network.hpp:413] ZooKeeper group memberships changed
I0414 14:48:26.555173 19976 group.cpp:700] Trying to get
'/mesos/log_replicas/0000000054' in ZooKeeper
I0414 14:48:26.556934 19976 group.cpp:700] Trying to get
'/mesos/log_replicas/0000000055' in ZooKeeper
I0414 14:48:26.558343 19976 network.hpp:461] ZooKeeper group PIDs: {
log-replica(1)@192.168.10.11:5050, log-replica(1)@192.168.100.54:5050
}
I0414 14:48:26.562963 19971 contender.cpp:263] New candidate (id='58')
has entered the contest for leadership
I0414 14:48:36.496371 19979 recover.cpp:109] Unable to finish the
recover protocol in 10secs, retrying
I0414 14:48:36.496866 19971 replica.cpp:673] Replica in EMPTY status
received a broadcasted recover request from (10)@192.168.10.11:5050
I0414 14:48:36.496919 19971 recover.cpp:193] Received a recover
response from a replica in EMPTY status
I0414 14:48:36.963434 19971 http.cpp:501] HTTP GET for
/master/state.json from 131.154.5.22:59267 with
User-Agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4)
AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87
Safari/537.36 OPR/36.0.2130.46'
I0414 14:48:46.497448 19979 recover.cpp:109] Unable to finish the
recover protocol in 10secs, retrying
I0414 14:48:46.498134 19971 replica.cpp:673] Replica in EMPTY status
received a broadcasted recover request from (15)@192.168.10.11:5050
I0414 14:48:46.498247 19971 recover.cpp:193] Received a recover
response from a replica in EMPTY status
I0414 14:48:56.498900 19979 recover.cpp:109] Unable to finish the
recover protocol in 10secs, retrying
I0414 14:48:56.499447 19971 replica.cpp:673] Replica in EMPTY status
received a broadcasted recover request from (17)@192.168.10.11:5050
I0414 14:48:56.499526 19971 recover.cpp:193] Received a recover
response from a replica in EMPTY status


2016-04-14 16:27 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:

> However, now I see a problem with the masters.
> If I turn off one master on Network A, the master on Network B is
> elected, but after a minute it disconnects and leadership returns to the
> original one.
>
> 2016-04-14 16:26 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
>
>> In the OpenStack security group, the SSH port is open.
>>
>>
>> 2016-04-14 16:24 GMT+02:00 Flavio Junqueira <fpj@apache.org>:
>>
>>> Is it an indication that the SSH port is open and the others aren't?
>>>
>>> -Flavio
>>>
>>> > On 14 Apr 2016, at 15:10, Stefano Bianchi <jazzista88@gmail.com> wrote:
>>> >
>>> > I tried with telnet and the connection timed out, but I am able to
>>> > connect through SSH.
>>> >
>>> > 2016-04-14 16:05 GMT+02:00 Stefano Bianchi <jazzista88@gmail.com>:
>>> >
>>> >> Thanks for your reply, Flavio.
>>> >> Actually, I don't have a DNS, so I am forced to use the hosts file, in which I
>>> >> have set all the IP addresses.
>>> >> Of course, for the node in Network B I have set the floating IPs of the
>>> >> other 2 slaves in Network A, associated with their hostnames. I don't
>>> >> know if this is correct, but at least if I ping from the slave in
>>> >> Network B to a slave in A I get replies, and vice versa.
>>> >>
>>> >> 2016-04-14 15:55 GMT+02:00 Flavio Junqueira <fpj@apache.org>:
>>> >>
>>> >>> Have you made sure that a slave in net B is able to telnet or ssh to the
>>> >>> leader machine in net A? Is it possible that the client port is blocked
>>> >>> from B to A?
>>> >>>
>>> >>> -Flavio
>>> >>>
>>> >>>
>>> >>>> On 14 Apr 2016, at 14:09, Stefano Bianchi <jazzista88@gmail.com> wrote:
>>> >>>>
>>> >>>> Hi all,
>>> >>>> I'm working on OpenStack and I have built some virtual machines and 2
>>> >>>> different networks with it.
>>> >>>> I have set up two Mesos clusters:
>>> >>>>
>>> >>>> NetworkA:
>>> >>>> 2 mesos master
>>> >>>> 2 mesos slaves
>>> >>>>
>>> >>>> NetworkB:
>>> >>>> 1 mesos master
>>> >>>> 1 mesos slave
>>> >>>>
>>> >>>> I should try to make an interconnection between these two clusters.
>>> >>>>
>>> >>>> I have set the ZooKeeper configurations such that all 3 masters are competing
>>> >>>> for the leadership. I show you the main configurations:
>>> >>>>
>>> >>>> NetworkA, on both masters:
>>> >>>>
>>> >>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>> >>>>
>>> >>>> server.1=192.168.100.54:2888:3888 (master1 on network A)
>>> >>>>
>>> >>>> server.2=192.168.100.55:2888:3888 (master2 on network A)
>>> >>>>
>>> >>>> server.3=131.154.xxx.xxx:2888:3888 (master3 on network B, I have set the
>>> >>>> floating IP)
>>> >>>>
>>> >>>> */etc/mesos/zk*
>>> >>>>
>>> >>>> zk://192.168.100.54:2181,192.168.100.55:2181,131.154.xxx.xxx:2181/mesos
>>> >>>>
>>> >>>> NetworkB:
>>> >>>>
>>> >>>> */etc/zookeeper/conf/zoo.cfg*: at the end of the file
>>> >>>>
>>> >>>> server.1=131.154.96.27:2888:3888 (master1 on network A, I have set the
>>> >>>> floating IP)
>>> >>>>
>>> >>>> server.2=131.154.96.32:2888:3888 (master2 on network A, I have set the
>>> >>>> floating IP)
>>> >>>>
>>> >>>> server.3=192.168.10.11:2888:3888 (master3 on network B)
>>> >>>>
>>> >>>>
>>> >>>> */etc/mesos/zk*
>>> >>>>
>>> >>>> zk://131.154.zzz.zzz:2181,131.154.yyy.yyy:2181,192.168.10.11:2181/mesos
>>> >>>>
>>> >>>>
>>> >>>> The 3 masters seem to work fine: if I stop the mesos-master service on one of
>>> >>>> them, there is a re-election, so they are behaving as one single cluster
>>> >>>> with 3 masters.
>>> >>>> I have no problems with the masters, but with the slaves.
>>> >>>> I have currently set up the slaves with /etc/mesos/zk set exactly as shown
>>> >>>> above, in a consistent way.
>>> >>>>
>>> >>>> Now the leader is a master on Network A, and only the slaves
>>> >>>> on Network A can connect to it, but I also need to connect the slave on the
>>> >>>> other network.
>>> >>>> Do you have suggestions?
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>
>
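On the telnet test discussed in the thread above: the same reachability check can be scripted so that every port is tried from every machine. The following is only a sketch in Python using the standard `socket` module. The port numbers are the ones that appear in this thread (2181 in the zk:// URL, 2888 and 3888 in zoo.cfg, 5050 in the master log); the helper names `port_open` and `check_hosts` are hypothetical.

```python
import socket

# Ports as they appear in the configuration and log quoted in this thread.
PORTS = {
    "zk-client": 2181,
    "zk-peer": 2888,
    "zk-election": 3888,
    "mesos-master": 5050,
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False


def check_hosts(hosts, ports=PORTS, timeout=3.0):
    """Map (host, port name) to True/False for every host and port."""
    return {
        (host, name): port_open(host, port, timeout)
        for host in hosts
        for name, port in ports.items()
    }
```

For example, running `check_hosts(["131.154.96.27", "131.154.96.32"])` on the Network B machines, and the equivalent with the Network B floating IP on the Network A machines, would show which of the four ports time out in each direction. A `False` only says the TCP connection failed; it does not say whether a security group, a firewall or a stopped service is responsible.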
