aurora-issues mailing list archives

From "Zameer Manji (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1652) /maintenance endpoint can result in 500
Date Tue, 29 Mar 2016 18:18:25 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216543#comment-15216543
] 

Zameer Manji commented on AURORA-1652:
--------------------------------------

I've taken a look, and I don't think this is a regression introduced by that patch. I think
this method is suspect:

{noformat}
  private Multimap<String, String> getTasksByHosts(StoreProvider provider, Iterable<String> hosts) {
    ImmutableSet.Builder<IScheduledTask> drainingTasks = ImmutableSet.builder();
    drainingTasks.addAll(provider.getTaskStore().fetchTasks(Query.slaveScoped(hosts).active()));
    return Multimaps.transformValues(
        Multimaps.index(drainingTasks.build(), Tasks::scheduledToSlaveHost),
        Tasks::id);
  }
{noformat}

We fetch active tasks, which are tasks in the following states:
{noformat}
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.ASSIGNED);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.DRAINING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.KILLING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.PENDING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.PREEMPTING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.RESTARTING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.RUNNING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.STARTING);
    ACTIVE_STATES.add(org.apache.aurora.gen.ScheduleStatus.THROTTLED);
{noformat}

This includes PENDING tasks (like the task in the exception message I posted), but we then
index the multimap with the key function {{scheduledToSlaveHost}}. PENDING tasks don't have
hosts, so this would produce a null key in the multimap, which is consistent with the stack
trace pointing to this indexing operation.

I think instead we need to scope the query to tasks in states that have an associated slave.
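
A minimal sketch of the idea, not the actual patch. It uses only the JDK, so {{Task}} and {{tasksByHost}} are hypothetical stand-ins for {{IScheduledTask}} and {{getTasksByHosts}}, and {{Collectors.groupingBy}} stands in for Guava's {{Multimaps.index}} (both reject null keys). Note that it guards on the host being present rather than scoping the query by state, which is a slightly different technique from the one proposed above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TasksByHostSketch {
  /** Hypothetical stand-in for IScheduledTask: an id and a possibly-null host. */
  static final class Task {
    final String id;
    final String slaveHost; // null for tasks with no assigned host, e.g. PENDING

    Task(String id, String slaveHost) {
      this.id = id;
      this.slaveHost = slaveHost;
    }
  }

  /** Groups task ids by host, dropping host-less tasks so no key is ever null. */
  static Map<String, List<String>> tasksByHost(List<Task> tasks) {
    return tasks.stream()
        .filter(t -> t.slaveHost != null)
        .collect(Collectors.groupingBy(
            t -> t.slaveHost,
            TreeMap::new, // sorted keys, so the printed output is deterministic
            Collectors.mapping(t -> t.id, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<Task> tasks = new ArrayList<>();
    tasks.add(new Task("task-1", "host-a"));
    tasks.add(new Task("task-2", null)); // PENDING-like: no host yet
    tasks.add(new Task("task-3", "host-a"));
    System.out.println(tasksByHost(tasks)); // prints {host-a=[task-1, task-3]}
  }
}
```

Without the {{filter}} step, the host-less task would reach the grouping step with a null key and fail in the same way as the trace in the description.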

> /maintenance endpoint can result in 500
> ---------------------------------------
>
>                 Key: AURORA-1652
>                 URL: https://issues.apache.org/jira/browse/AURORA-1652
>             Project: Aurora
>          Issue Type: Story
>            Reporter: Zameer Manji
>
> On a build from master one can get this exception when checking the /maintenance endpoint:
> {noformat}
> W0329 17:44:03.181 [qtp790040713-114601, ServletHandler:631] /maintenance java.lang.NullPointerException:
null key in entry: null=IScheduledTask{assignedTask=IAssignedTask{taskId=m
> khutornenko-test-hello_service_4-241-b21561f9-1432-4028-be5a-95b163477469, slaveId=null,
slaveHost=null, task=ITaskConfig{job=IJobKey{role=mkhutornenko, environment=test, name=hell
> o_service_4}, owner=IIdentity{user=mkhutornenko}, isService=true, numCpus=0.1, ramMb=16,
diskMb=16, priority=0, maxTaskFailures=1, production=false, tier=null, constraints=[], requ
> estedPorts=[], taskLinks={}, contactEmail=null, executorConfig=IExecutorConfig{name=AuroraExecutor,
data={"environment": "test", "health_check_config": {"initial_interval_secs": 15
> .0, "health_checker": {"http": {"expected_response_code": 0, "endpoint": "/health", "expected_response":
"ok"}}, "interval_secs": 10.0, "timeout_secs": 1.0, "max_consecutive_failur
> es": 0}, "name": "hello_service_4", "service": true, "max_task_failures": 1, "cron_collision_policy":
"KILL_EXISTING", "enable_hooks": false, "cluster": "smf1-test", "task": {"proc
> esses": [{"daemon": false, "name": "hello", "ephemeral": false, "max_failures": 1, "min_duration":
5, "cmdline": "\n    while true; do\n      echo hello world\n      sleep 10\n
> done\n  ", "final": false}], "name": "hello", "finalization_wait": 30, "max_failures":
1, "max_concurrency": 0, "resources": {"disk": 16777216, "ram": 16777216, "cpu": 0.1}, "const
> raints": []}, "production": false, "role": "mkhutornenko", "lifecycle": {"http": {"graceful_shutdown_endpoint":
"/quitquitquit", "port": "health", "shutdown_endpoint": "/abortabort
> abort"}}, "priority": 0}}, metadata=[], container=IContainer{setField=MESOS, value=IMesosContainer{}}},
assignedPorts={}, instanceId=241}, status=PENDING, failureCount=0, taskEvent
> s=[ITaskEvent{timestamp=1458929745345, status=PENDING, message=null, scheduler=smf1-but-34-sr2.prod.twitter.com}],
ancestorId=null}
>         at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:31)
~[guava-19.0.jar:na]
>         at com.google.common.collect.ImmutableMultimap$Builder.put(ImmutableMultimap.java:167)
~[guava-19.0.jar:na]
>         at com.google.common.collect.ImmutableListMultimap$Builder.put(ImmutableListMultimap.java:149)
~[guava-19.0.jar:na]
>         at com.google.common.collect.Multimaps.index(Multimaps.java:1549) ~[guava-19.0.jar:na]
>         at com.google.common.collect.Multimaps.index(Multimaps.java:1496) ~[guava-19.0.jar:na]
>         at org.apache.aurora.scheduler.http.Maintenance.getTasksByHosts(Maintenance.java:79)
~[aurora-156.jar:na]
>         at org.apache.aurora.scheduler.http.Maintenance.lambda$getHosts$0(Maintenance.java:70)
~[aurora-156.jar:na]
>         at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:152)
~[aurora-156.jar:na]
>         at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
~[mybatis-guice-3.7.jar:3.7]
>         at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
~[commons-156.jar:na]
>         at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:561)
~[aurora-156.jar:na]
>         at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
~[aurora-156.jar:na]
>         at org.apache.aurora.scheduler.http.Maintenance.getHosts(Maintenance.java:59)
~[aurora-156.jar:na]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_72-Tw8r10b0]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
~[na:1.8.0_72-Tw8r10b0]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[na:1.8.0_72-Tw8r10b0]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_72-Tw8r10b0]
>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)
~[j
> ersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
~[jersey-server-1.19.jar:1.19]
>         at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)
~[jersey-servlet-1.19.jar:1.19]
>         at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)
~[jersey-servlet-1.19.jar:1.19]
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:927)
~[jersey-servlet-1.19.jar:1.19]
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:875)
~[jersey-servlet-1.19.jar:1.19]
>         at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:829)
~[jersey-servlet-1.19.jar:1.19]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at org.apache.aurora.scheduler.http.LeaderRedirectFilter.doFilter(LeaderRedirectFilter.java:72)
~[aurora-156.jar:na]
>         at org.apache.aurora.scheduler.http.AbstractFilter.doFilter(AbstractFilter.java:44)
~[aurora-156.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at org.apache.aurora.scheduler.http.HttpStatsFilter.doFilter(HttpStatsFilter.java:71)
~[aurora-156.jar:na]
>         at org.apache.aurora.scheduler.http.AbstractFilter.doFilter(AbstractFilter.java:44)
~[aurora-156.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:168)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
~[guice-servlet-3.0.jar:na]
>         at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) ~[guice-servlet-3.0.jar:na]
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
~[jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
[jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
[jetty-servlet-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:318)
[jetty-rewrite-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:437)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.Server.handle(Server.java:517) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308) [jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
[jetty-server-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261)
[jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95) [jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
[jetty-io-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
[jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
[jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
[jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
[jetty-util-9.3.6.v20151106.jar:9.3.6.v20151106]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_72-Tw8r10b0]
> I0329 17:44:03.181 [qtp790040713-114601, Slf4jRequestLog:60] 10.53.156.101 - - [29/Mar/2016:17:43:47
+0000] "GET //aurora-smf1-test.twitter.biz/maintenance HTTP/1.1" 500 2735
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
