aurora-issues mailing list archives

From "Stephan Erb (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-1786) -zk_session_timeout option does not work
Date Wed, 01 Feb 2017 07:54:54 GMT

     [ https://issues.apache.org/jira/browse/AURORA-1786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Erb updated AURORA-1786:
--------------------------------
    Fix Version/s: 0.17.0

> -zk_session_timeout option does not work
> ----------------------------------------
>
>                 Key: AURORA-1786
>                 URL: https://issues.apache.org/jira/browse/AURORA-1786
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: David Robinson
>             Fix For: 0.17.0
>
>
> Looks like the -zk_session_timeout option has no effect. I've set -zk_session_timeout="60mins"
> to attempt to work around ZK session timeouts (due to GC pauses caused by TaskHistoryPruner
> pruning a huge number of inactive tasks), but the default 30 seconds seems to always be used.
> {noformat}
> I0929 22:36:10.804 [main, ArgScanner:411] zk_chroot_path: null 
> I0929 22:36:10.804 [main, ArgScanner:411] zk_digest_credentials: xxxx:xxxx 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_endpoints: [zk.example.com:2181] 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_in_proc: false 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_session_timeout: (30, mins) 
> I0929 22:36:10.805 [main, ArgScanner:411] zk_use_curator: true 
> {noformat}
> {noformat}
> I0929 22:48:37.678 [AsyncProcessor-3, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.738 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> 2016-09-29 22:48:37,794:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded
deadline by 12ms
> I0929 22:48:37.805 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.814 [AsyncProcessor-6, MemTaskStore:148] Query took 588 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:37.867 [AsyncProcessor-1, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:37.873 [AsyncProcessor-2, MemTaskStore:148] Query took 304 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:37.875 [AsyncProcessor-7, MemTaskStore:148] Query took 289 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:37.886 [AsyncProcessor-4, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:38.045 [AsyncProcessor-3, MemTaskStore:148] Query took 359 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:38.152 [AsyncProcessor-5, MemTaskStore:148] Query took 405 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:38.407 [AsyncProcessor-0, MemTaskStore:148] Query took 594 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:38.442 [AsyncProcessor-1, MemTaskStore:148] Query took 566 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:38.445 [AsyncProcessor-4, MemTaskStore:148] Query took 550 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:48:38.460 [AsyncProcessor-7, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:48:38.468 [AsyncProcessor-2, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> 2016-09-29 22:48:51,141:47040(0x7f07f4c3c940):ZOO_WARN@zookeeper_interest@1570: Exceeded
deadline by 13ms
> I0929 22:49:01.002467 47173 process.cpp:3323] Handling HTTP event for process 'metrics'
with path: '/metrics/snapshot'
> I0929 22:48:38.483 [AsyncProcessor-6, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> W0929 22:49:07.165 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), ClientCnxn$SendThread:1108]
Client session timed out, have not heard from server in 36019ms for sessionid 0x576f9386901ce3

> W0929 22:49:07.168 [qtp382517336-72, LeaderRedirect:194] No serviceGroupMonitor in host
set, will not redirect despite not being leader. 
> I0929 22:49:07.170 [qtp382517336-72, Slf4jRequestLog:60] 127.0.0.1 - - [29/Sep/2016:22:49:07
+0000] "GET //localhost:8081/quotas HTTP/1.1" 503 1561  
> I0929 22:49:07.171 [AsyncProcessor-7, MemTaskStore:148] Query took 28701 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:49:07.171 [AsyncProcessor-2, MemTaskStore:148] Query took 28693 ms: ITaskQuery{role=null,
environment=null, jobName=null, taskIds=[], statuses=[FINISHED, FAILED, KILLED, LOST], instanceIds=[],
slaveHosts=[], jobKeys=[IJobKey{role=mesos, environment=test, name=healthy-daemon-19}], offset=0,
limit=0} 
> I0929 22:49:07.171 [qtp382517336-52, Slf4jRequestLog:60] 127.0.0.1 - - [29/Sep/2016:22:49:07
+0000] "GET //localhost:8081/vars.json?filtered=1 HTTP/1.1" 200 34679  
> I0929 22:49:07.172 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), ClientCnxn$SendThread:1156]
Client session timed out, have not heard from server in 36019ms for sessionid 0x576f9386901ce3,
closing socket connection and attempting reconnect 
> I0929 22:49:07.179 [AsyncProcessor-0, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:49:07.179 [AsyncProcessor-5, TaskHistoryPruner:137] Pruning inactive tasks [mesos-test-healthy-daemon-19-3588-e2d79602-e354-4dc0-bfaa-b16d32e2b09d,
mesos-test-healthy-daemon-19-1551-b4b7e52f-f468-44ba-a1a9-ad3c95b602a3, mesos-test-healthy-daemon-19-4105-ff87bef1-af09-4201-9cc2-863c8ece3621,
mesos-test-healthy-daemon-19-7416-66de9261-5fe5-47c4-be37-3dd5
> I0929 22:49:07.273 [main-EventThread, ConnectionStateManager:228] State change: SUSPENDED

> E0929 22:49:07.345 [Curator-ConnectionStateManager-0, SchedulerLifecycle$SchedulerCandidateImpl:395]
Lost leadership, committing suicide. 
> I0929 22:49:07.359 [Curator-ConnectionStateManager-0, StateMachine$Builder:389] SchedulerLifecycle
state machine transition LEADER_AWAITING_REGISTRATION -> DEAD
> {noformat}
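
One way to check the claim in the report against the log is to pull the idle time out of the "Client session timed out" warning and compare it with the timeout that was configured. The following is only a minimal sketch, not part of Aurora or ZooKeeper: the helper names `idle_times_ms` and `consistent_with` are hypothetical, and the 2/3 factor is an assumption about how the ZooKeeper Java client derives its read timeout from the session timeout.

```python
import re

# Matches the ZooKeeper client warning quoted in the log above.
TIMEOUT_RE = re.compile(r"have not heard from server in (\d+)ms")


def idle_times_ms(log_text):
    """Return every idle duration (in ms) reported by a session-timeout warning."""
    return [int(ms) for ms in TIMEOUT_RE.findall(log_text)]


def consistent_with(configured_timeout_ms, idle_ms):
    """Return True if a timeout after idle_ms is plausible for the configured value.

    Assumption: the client gives up once it has heard nothing for roughly
    2/3 of the session timeout.
    """
    read_timeout_ms = configured_timeout_ms * 2 // 3
    return idle_ms >= read_timeout_ms


# Warning line copied from the log in this issue.
log = (
    "W0929 22:49:07.165 [main-SendThread(smf1-alj-03-sr1.prod.twitter.com:2181), "
    "ClientCnxn$SendThread:1108] Client session timed out, have not heard from "
    "server in 36019ms for sessionid 0x576f9386901ce3"
)

idle = idle_times_ms(log)
for ms in idle:
    print("idle:", ms, "ms")
    print("plausible with a 60 minute timeout:", consistent_with(60 * 60 * 1000, ms))
    print("plausible with a 30 second timeout:", consistent_with(30 * 1000, ms))
```

Under that assumption, a timeout after about 36 seconds of silence fits a 30 second session timeout but not a 60 minute one, which matches what the reporter describes.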



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
