mesos-dev mailing list archives

From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (MESOS-712) invalid zhandle state
Date Tue, 01 Oct 2013 00:41:23 GMT

     [ https://issues.apache.org/jira/browse/MESOS-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone resolved MESOS-712.
------------------------------

    Resolution: Duplicate

> invalid zhandle state
> ---------------------
>
>                 Key: MESOS-712
>                 URL: https://issues.apache.org/jira/browse/MESOS-712
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.14.0
>            Reporter: David Robinson
>
> {noformat:title=log snippet}
> 2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded
deadline by 16533ms
> 2013-09-29 08:58:30,445:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1528:
Socket [192.168.0.1:2181] zk retcode=-7, errno=110(Connection timed out): connection timed
out (exceeded timeout by 13199ms)
> I0929 08:58:17.544836 45283 cgroups.cpp:1193] Trying to freeze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> 2013-09-29 08:58:30,474:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1141: Calling a
watcher for a ZOO_SESSION_EVENT and the state=CONNECTING_STATE
> 2013-09-29 08:58:30,475:45279(0x7f9024e3f940):ZOO_WARN@zookeeper_interest@1461: Exceeded
deadline by 16564ms
> 2013-09-29 08:58:30,475:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling
a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> I0929 08:58:30.445508 45282 detector.cpp:251] Trying to create path '/home/mesos/prod/master'
in ZooKeeper
> 2013-09-29 08:58:30,483:45279(0x7f9024e3f940):ZOO_INFO@check_events@1585: initiated connection
to server [192.168.0.2:2181]
> 2013-09-29 08:58:30,488:45279(0x7f9031267940):ZOO_DEBUG@zoo_awexists@2587: Sending request
xid=0x5244d598 for path [/home/mesos/prod/master] to 192.168.0.2:2181
> 2013-09-29 08:58:30,488:45279(0x7f9024e3f940):ZOO_ERROR@handle_socket_error_msg@1621:
Socket [192.168.0.2:2181] zk retcode=-112, errno=116(Stale NFS file handle): sessionId=0x340523200364932
has expired.
> 2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@handle_error@1138: Calling a
watcher for a ZOO_SESSION_EVENT and the state=ZOO_EXPIRED_SESSION_STATE
> 2013-09-29 08:58:30,489:45279(0x7f9024e3f940):ZOO_DEBUG@do_io@317: IO thread terminated
> 2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1765: Calling
a watcher for node [], type = -1 event=ZOO_SESSION_EVENT
> 2013-09-29 08:58:30,489:45279(0x7f901ffff940):ZOO_DEBUG@process_completions@1784: Calling
COMPLETION_STAT for xid=0x5244d598 rc=-112
> I0929 08:58:30.475751 45283 cgroups.cpp:1232] Successfully froze cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
after 1 attempts
> F0929 08:58:30.492090 45282 detector.cpp:266] Failed to create '/home/mesos/prod/master'
in ZooKeeper: invalid zhandle state
> *** Check failure stack trace: ***
> I0929 08:58:30.492761 45292 cgroups.cpp:1208] Trying to thaw cgroup /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> I0929 08:58:31.144810 45291 cgroups_isolator.cpp:937] Executor thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f
of framework 201205082337-0000000003-0000 terminated with status 9
> I0929 08:58:32.791193 45292 cgroups.cpp:1318] Successfully thawed /cgroup/mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> I0929 08:58:33.675348 45298 cgroups_isolator.cpp:1275] Successfully destroyed cgroup
mesos/framework_201205082337-0000000003-0000_executor_thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f_tag_8edc5ce9-20bc-4b09-bc92-d9bab7769738
> I0929 08:58:33.676269 45300 slave.cpp:2158] Executor 'thermos-1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f'
of framework 201205082337-0000000003-0000 has terminated with signal Killed
> I0929 08:58:33.678154 45300 slave.cpp:1778] Handling status update TASK_FAILED (UUID:
4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f
of framework 201205082337-0000000003-0000 from @0.0.0.0:0
> I0929 08:58:33.679175 45288 cgroups_isolator.cpp:700] Asked to update resources for an
unknown/killed executor
> I0929 08:58:33.679201 45300 status_update_manager.cpp:300] Received status update TASK_FAILED
(UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f
of framework 201205082337-0000000003-0000 
> I0929 08:58:33.680452 45300 status_update_manager.hpp:337] Checkpointing UPDATE for status
update TASK_FAILED (UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f
of framework 201205082337-0000000003-0000 
>     @     0x7f9035fb562d  google::LogMessage::Fail()
>     @     0x7f9035fb9617  google::LogMessage::SendToLog()
>     @     0x7f9035fb7f14  google::LogMessage::Flush()
> I0929 08:58:35.929435 45300 status_update_manager.cpp:351] Forwarding status update TASK_FAILED
(UUID: 4d90de5a-cdad-4bb8-ab93-7c4f185a0d24) for task 1380442146400-test_master-0-f947deee-f813-47fa-8bd3-d0f06ece941f
of framework 201205082337-0000000003-0000 to master@10.42.69.138:5050
>     @     0x7f9035fb8146  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9035d1a83f  mesos::internal::ZooKeeperMasterDetectorProcess::connected()
>     @     0x7f9035d1f118  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9035d21b84  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9035ea6f84  process::ProcessManager::resume()
>     @     0x7f9035ea79df  process::schedule()
>     @     0x7f903561083d  start_thread
>     @     0x7f9033ff2f8d  clone
> {noformat}
> The slave exited with SIGABRT. Is this a ZooKeeper connection issue? Should Mesos handle this gracefully?
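
For reference, the return codes in the log map onto the ZooKeeper C client's error enum: retcode=-7 is ZOPERATIONTIMEOUT, rc=-112 is ZSESSIONEXPIRED, and "invalid zhandle state" is the error string for ZINVALIDSTATE (-9). The following is a minimal sketch, not the actual Mesos fix, of how a caller might map those codes to a recovery action instead of aborting. The enum names, the Action type, and classify() are hypothetical; only the numeric values are taken from zookeeper.h, and they are redefined locally so the sketch compiles without the library.

```cpp
// Hypothetical local copies of a few ZOO_ERRORS values from zookeeper.h.
enum ZkCode {
  kOk = 0,
  kConnectionLoss = -4,     // ZCONNECTIONLOSS
  kOperationTimeout = -7,   // ZOPERATIONTIMEOUT (retcode=-7 in the log)
  kInvalidState = -9,       // ZINVALIDSTATE ("invalid zhandle state")
  kSessionExpired = -112    // ZSESSIONEXPIRED (rc=-112 in the log)
};

// Hypothetical recovery actions a detector could take.
enum class Action { Proceed, Retry, NewSession, Abort };

// Map a ZooKeeper return code to a recovery action.
// Assumption: an invalid handle state is treated like an expired session,
// i.e. the old handle is unusable and a fresh session is needed. If the
// handle is invalid because authentication failed, a new session with the
// same credentials would not help.
inline Action classify(int rc) {
  switch (rc) {
    case kOk:
      return Action::Proceed;
    case kConnectionLoss:
    case kOperationTimeout:
      // Transient: the client library keeps trying to reconnect,
      // so the operation can be retried on the same handle.
      return Action::Retry;
    case kInvalidState:
    case kSessionExpired:
      // The session is gone; retrying on this handle cannot succeed.
      return Action::NewSession;
    default:
      // Anything else is left as a hard failure in this sketch.
      return Action::Abort;
  }
}
```

Under these assumptions, a caller that receives Action::NewSession would close the old handle with zookeeper_close() and open a new one with zookeeper_init() before retrying the path creation, rather than treating the failure as fatal.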



--
This message was sent by Atlassian JIRA
(v6.1#6144)
