aurora-issues mailing list archives

From "Mohit Jaggi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AURORA-1952) race condition in offers by agent id map (and potentially others) caused (probably) a crash
Date Fri, 29 Sep 2017 18:52:00 GMT
Mohit Jaggi created AURORA-1952:
-----------------------------------

             Summary: race condition in offers by agent id map (and potentially others) caused (probably) a crash
                 Key: AURORA-1952
                 URL: https://issues.apache.org/jira/browse/AURORA-1952
             Project: Aurora
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: 0.18.0
         Environment: nothing special
            Reporter: Mohit Jaggi
             Fix For: 0.18.0


The scheduler crashed here, due to duplicate keys in a map:
https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/preemptor/PendingTaskProcessor.java#L145

This is most likely a concurrency issue. [~wfarner] pointed out the following code:
I'm looking at this chunk here, where a concurrent map would not help.

      Optional<HostOffer> sameSlave = hostOffers.get(offer.getOffer().getAgentId());
      if (sameSlave.isPresent()) {
        // If there are existing offers for the slave, decline all of them so the master can
        // compact all of those offers into a single offer and send them back.
        LOG.info("Returning offers for " + offer.getOffer().getAgentId().getValue()
            + " for compaction.");
        decline(offer.getOffer().getId());
        removeAndDecline(sameSlave.get().getOffer().getId());
      } else {
        hostOffers.add(offer);
        // ... (remainder of the method not quoted in the report)
      }
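
Why a concurrent map alone would not help: `hostOffers.get(...)` and `hostOffers.add(offer)` are two separate operations, so two threads handling offers for the same agent can both see no existing offer and both add theirs. Making each call individually thread-safe does not make the pair atomic. The following is a minimal sketch under stated assumptions, not Aurora's code: it assumes, hypothetically, that offers are kept in a list of all outstanding offers plus a per-agent lookup, and the class `OfferIndex`, its nested `Offer` type, and the methods `addOffer` and `getOffers` are invented for illustration. Holding one lock across the lookup and both updates closes the window between the check and the insert.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Aurora's OfferManager/HostOffers implementation.
public class OfferIndex {
  /** Minimal stand-in for an offer: the agent it was made for and its offer ID. */
  public static final class Offer {
    public final String agentId;
    public final String offerId;

    public Offer(String agentId, String offerId) {
      this.agentId = agentId;
      this.offerId = offerId;
    }
  }

  // Both structures are guarded by "this" and only change together inside
  // synchronized methods, so they can never disagree about an agent.
  private final List<Offer> allOffers = new ArrayList<>();
  private final Map<String, Offer> offersByAgent = new HashMap<>();

  /**
   * Stores the offer if its agent has none outstanding. Otherwise removes the existing
   * offer and returns both offer IDs so the caller can decline them for compaction.
   * Returns an empty list when the offer was stored.
   */
  public synchronized List<String> addOffer(Offer offer) {
    // Check and update happen under the same lock: no other thread can interleave here.
    Offer sameAgent = offersByAgent.remove(offer.agentId);
    if (sameAgent != null) {
      allOffers.remove(sameAgent);
      return Arrays.asList(offer.offerId, sameAgent.offerId);
    }
    offersByAgent.put(offer.agentId, offer);
    allOffers.add(offer);
    return Collections.emptyList();
  }

  /** Snapshot copy of all outstanding offers, safe to iterate without holding the lock. */
  public synchronized List<Offer> getOffers() {
    return new ArrayList<>(allOffers);
  }
}
```

Returning the IDs to decline, rather than declining inside the synchronized method, is a deliberate choice in this sketch: it keeps the potentially slow decline calls out of the critical section.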

--------- logs ----------
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: Sep 28, 2017 6:09:00 PM com.google.common.util.concurrent.ServiceManager$ServiceListener failed
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: SEVERE: Service PreemptorService [FAILED] has failed in the RUNNING state.
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: java.lang.IllegalArgumentException: Multiple entries with same key: 1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102=HostOffer{offer=id {
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: value: "f7b84805-a0c5-4405-be77-f7f1b7110405-O56597202"
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: }
...
...
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: , hostAttributes=IHostAttributes{host=compute606-dca1.prod.uber.internal, attributes=[IAttribute{name=host, values=[compute606-dca1]}, IAttribute{name=rack, values=[as13]}, IAttribute{name=pod, values=[d]}, IAttribute{name=dedicated, values=[infra/cassandra]}], mode=NONE, slaveId=1ed038e0-a3ef-4476-adfd-70c86241c5f7-S102}}. To index multiple values under a key, use Multimaps.index.
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.collect.Maps.uniqueIndex(Maps.java:1251)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.collect.Maps.uniqueIndex(Maps.java:1208)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.lambda$run$0(PendingTaskProcessor.java:146)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.storage.db.DbStorage.read(DbStorage.java:147)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.mybatis.guice.transactional.TransactionalMethodInterceptor.invoke(TransactionalMethodInterceptor.java:101)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.storage.log.LogStorage.read(LogStorage.java:562)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.storage.CallOrderEnforcingStorage.read(CallOrderEnforcingStorage.java:113)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.preemptor.PendingTaskProcessor.run(PendingTaskProcessor.java:135)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.common.inject.TimedInterceptor.invoke(TimedInterceptor.java:83)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at org.apache.aurora.scheduler.preemptor.PreemptorModule$PreemptorService.runOneIteration(PreemptorModule.java:161)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.util.concurrent.AbstractScheduledService$ServiceDelegate$Task.run(AbstractScheduledService.java:188)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at com.google.common.util.concurrent.Callables$4.run(Callables.java:122)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: at java.lang.Thread.run(Thread.java:748)
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: E0928 18:09:00.316 [PreemptorService RUNNING, GuavaUtils$LifecycleShutdownListener:55] Service: PreemptorService [FAILED] failed unexpectedly. Triggering shutdown.
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 [qtp1000734462-3068369, Slf4jRequestLog:60] 10.187.28.19 - - [28/Sep/2017:18:09:00 +0000] "POST //10.188.43.6:8082/api HTTP/1.1" 200 95
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 [PreemptorService RUNNING, Lifecycle:84] Shutting down application
Sep 28 18:09:00 machine1163 aurora-scheduler[14266]: I0928 18:09:00.316 [PreemptorService RUNNING, ShutdownRegistry$ShutdownRegistryImpl:77] Executing 4 shutdown commands.
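
The stack trace shows the failure surfacing in `Maps.uniqueIndex`, called from `PendingTaskProcessor.lambda$run$0` (PendingTaskProcessor.java:146). Guava's `Maps.uniqueIndex` throws `IllegalArgumentException` as soon as two elements map to the same key, so a single agent with two outstanding offers is enough to fail the `PreemptorService`, which then triggered the scheduler shutdown seen above. The JDK-only sketch below mirrors the two behaviors named in the exception message. `IndexDemo`, `uniqueIndex` and `groupIndex` are hypothetical stand-ins for Guava's `Maps.uniqueIndex` and `Multimaps.index`, not their actual implementations, and offers are modeled as plain `agentId:offerId` strings for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration; not Guava's implementation of either method.
public class IndexDemo {
  /** Like Maps.uniqueIndex: one value per key, failing on the first duplicate key. */
  public static <K, V> Map<K, V> uniqueIndex(Iterable<V> values, Function<? super V, K> keyFn) {
    Map<K, V> index = new LinkedHashMap<>();
    for (V value : values) {
      K key = keyFn.apply(value);
      if (index.containsKey(key)) {
        // Same failure mode as the crash: a second value for an existing key is rejected.
        throw new IllegalArgumentException("Multiple entries with same key: " + key);
      }
      index.put(key, value);
    }
    return index;
  }

  /** Like Multimaps.index: collects every value under its key, tolerating duplicates. */
  public static <K, V> Map<K, List<V>> groupIndex(Iterable<V> values, Function<? super V, K> keyFn) {
    Map<K, List<V>> index = new LinkedHashMap<>();
    for (V value : values) {
      index.computeIfAbsent(keyFn.apply(value), k -> new ArrayList<>()).add(value);
    }
    return index;
  }
}
```

For example, indexing `S101:O1`, `S102:O2` and `S102:O3` by agent ID with `uniqueIndex` throws on the second `S102` offer, while `groupIndex` returns two offers under `S102`. Switching the preemptor to a grouping index would likely only mask the symptom, though: the duplicate offers would still be held, so the race in offer handling remains the probable root cause.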

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
