aurora-issues mailing list archives

From "Zameer Manji (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1016) NullPointerException in PreemptorImpl
Date Wed, 14 Jan 2015 23:32:34 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277887#comment-14277887 ]

Zameer Manji commented on AURORA-1016:
--------------------------------------

Closer inspection points to suspicious code in {{CachedClusterState.java}}.
{noformat}
  private final Multimap<String, PreemptionVictim> victims =
      Multimaps.synchronizedMultimap(HashMultimap.<String, PreemptionVictim>create());

  @Override
  public Multimap<String, PreemptionVictim> getSlavesToActiveTasks() {
    return Multimaps.unmodifiableMultimap(victims);
  }

  @Subscribe
  public void taskChangedState(TaskStateChange stateChange) {
    synchronized (victims) {
      String slaveId = stateChange.getTask().getAssignedTask().getSlaveId();
      PreemptionVictim victim = PreemptionVictim.fromTask(stateChange.getTask().getAssignedTask());
      if (Tasks.SLAVE_ASSIGNED_STATES.contains(stateChange.getNewState())) {
        victims.put(slaveId, victim);
      } else {
        victims.remove(slaveId, victim);
      }
    }
  }
{noformat}

In the above code the {{HashMultimap}} accepts null keys, and a null key can propagate to the
preemptor code, where it can trigger the observed NPE. A null key would get in if a task is in
one of the {{SLAVE_ASSIGNED_STATES}} but has no slave ID, so that {{getSlaveId()}} returns null.
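
To illustrate the failure mode, here is a minimal sketch. It is not Aurora code: it assumes plain {{java.util}} collections as stand-ins (a {{HashMap}} in place of the {{HashMultimap}}, and {{Set.copyOf}}, available from Java 10, in place of the {{ImmutableSet}} builder), and the class name {{NullKeyDemo}} and the slave and task strings are made up. It also assumes the null key itself is what reaches the builder, which the stack trace suggests but does not prove.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class; stdlib stand-ins are used instead of Guava.
final class NullKeyDemo {

  // Stand-in for the victims multimap. Like HashMultimap, HashMap permits a null key.
  static Map<String, Set<String>> victimsWithNullSlave() {
    Map<String, Set<String>> victims = new HashMap<>();
    victims.computeIfAbsent("slave-1", k -> new HashSet<>()).add("task-a");
    // A task whose slave ID is null is stored without complaint.
    victims.computeIfAbsent(null, k -> new HashSet<>()).add("task-b");
    return victims;
  }

  // Stand-in for the immutable set builder: Set.copyOf rejects null elements.
  static boolean copyingKeysThrowsNpe(Map<String, Set<String>> victims) {
    try {
      Set.copyOf(victims.keySet());
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    Map<String, Set<String>> victims = victimsWithNullSlave();
    System.out.println("null key stored: " + victims.containsKey(null));
    System.out.println("immutable copy throws NPE: " + copyingKeysThrowsNpe(victims));
  }
}
```

The point of the sketch is that the write succeeds silently and the failure only shows up later, in a reader that copies the keys into a null-hostile collection.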

I think the first step in fixing this bug is to enforce, wherever possible, the invariant that
a task in one of the {{SLAVE_ASSIGNED_STATES}} always has a slave ID.
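
One possible way to enforce this at the point where the index is updated is sketched below. This is only an illustration under assumptions, not a proposed patch: {{VictimIndex}}, {{taskChanged}} and {{slaveIds}} are hypothetical names, tasks are reduced to plain strings, and failing fast with an {{IllegalStateException}} is just one option (logging and skipping the event would be another).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the slave-to-victims index.
final class VictimIndex {
  private final Map<String, Set<String>> victims = new HashMap<>();

  // slaveAssigned plays the role of SLAVE_ASSIGNED_STATES.contains(newState).
  synchronized void taskChanged(String slaveId, String taskId, boolean slaveAssigned) {
    if (slaveId == null) {
      if (slaveAssigned) {
        // Invariant violated: surface it here instead of storing a null key.
        throw new IllegalStateException(
            "Task " + taskId + " is in a slave-assigned state without a slave ID");
      }
      return; // The task was never indexed, so there is nothing to remove.
    }
    if (slaveAssigned) {
      victims.computeIfAbsent(slaveId, k -> new HashSet<>()).add(taskId);
    } else {
      Set<String> tasks = victims.get(slaveId);
      if (tasks != null && tasks.remove(taskId) && tasks.isEmpty()) {
        victims.remove(slaveId); // Drop slaves that no longer have victims.
      }
    }
  }

  // Safe to copy into a null-hostile set because null keys are never stored.
  synchronized Set<String> slaveIds() {
    return Set.copyOf(victims.keySet());
  }
}
```

With a guard like this, a violation is reported at the event that caused it rather than later inside the scheduling loop.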

> NullPointerException in PreemptorImpl
> -------------------------------------
>
>                 Key: AURORA-1016
>                 URL: https://issues.apache.org/jira/browse/AURORA-1016
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Zameer Manji
>            Assignee: Zameer Manji
>              Labels: twitter
>
> This appears in the logs of a scheduler that appears to not be preempting tasks.
> {noformat}
> W0114 20:57:59.565 THREAD149 org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.schedule: Task scheduling unexpectedly failed, will be retried
> java.lang.NullPointerException
>         at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:213)
>         at com.google.common.collect.ImmutableCollection$ArrayBasedBuilder.add(ImmutableCollection.java:339)
>         at com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:480)
>         at com.google.common.collect.ImmutableSet$Builder.add(ImmutableSet.java:456)
>         at com.google.common.collect.ImmutableCollection$Builder.addAll(ImmutableCollection.java:282)
>         at com.google.common.collect.ImmutableCollection$ArrayBasedBuilder.addAll(ImmutableCollection.java:360)
>         at com.google.common.collect.ImmutableSet$Builder.addAll(ImmutableSet.java:508)
>         at org.apache.aurora.scheduler.async.preemptor.PreemptorImpl.findPreemptionSlotFor(PreemptorImpl.java:321)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.maybePreemptFor(TaskScheduler.java:249)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.scheduleTask(TaskScheduler.java:220)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl$3.apply(TaskScheduler.java:192)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl$3.apply(TaskScheduler.java:189)
>         at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:608)
>         at org.apache.aurora.scheduler.storage.log.LogStorage$24.apply(LogStorage.java:605)
>         at org.apache.aurora.scheduler.storage.mem.MemStorage$3.apply(MemStorage.java:147)
>         at org.apache.aurora.scheduler.storage.mem.MemStorage$3.apply(MemStorage.java:144)
>         at org.apache.aurora.scheduler.storage.db.DbStorage.write(DbStorage.java:137)
>         at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
>         at org.apache.aurora.scheduler.storage.mem.MemStorage.write(MemStorage.java:144)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.storage.log.LogStorage.doInTransaction(LogStorage.java:605)
>         at org.apache.aurora.scheduler.storage.log.LogStorage.write(LogStorage.java:638)
>         at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.write(CallOrderEnforcingStorage.java:122)
>         at org.apache.aurora.scheduler.async.TaskScheduler$TaskSchedulerImpl.schedule(TaskScheduler.java:189)
>         at com.twitter.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:87)
>         at org.apache.aurora.scheduler.async.TaskGroups$1.schedule(TaskGroups.java:136)
>         at org.apache.aurora.scheduler.async.TaskGroups$2.run(TaskGroups.java:158)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
