yunikorn-dev mailing list archives

From "Weiwei Yang (Jira)" <j...@apache.org>
Subject [jira] [Created] (YUNIKORN-176) schedulerCache might become inconsistent sometimes depending on the ordering of the events
Date Fri, 22 May 2020 04:51:00 GMT
Weiwei Yang created YUNIKORN-176:
------------------------------------

             Summary: schedulerCache might become inconsistent sometimes depending on the ordering of the events
                 Key: YUNIKORN-176
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-176
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: shim - kubernetes
            Reporter: Weiwei Yang
            Assignee: Weiwei Yang


We sometimes find nodes stuck in a pending state when working with the auto-scaler, because
some daemon set pods on them stay pending and are never scheduled.

The root cause is the following sequence of events:
 # the auto-scaler scales up a node
 # the daemon set controller creates a pod, e.g. for fluentd (it sets {{pod.spec.nodeName}} to "newly-added-host")
 # YK is notified by the pod informer: add pod
 # the pod is added to the cache (schedulerCache); since {{pod.spec.nodeName}} is set, a new {{nodeInfo}} is added for that node
 # the node informer is notified: add node
 # the node is added to the scheduler cache; the node entry already exists, so the call to SetNode is skipped
 # the scheduler tries to allocate the pod to the node
 # the predicates fail: NodeUnknownCondition (node x doesn't exist in schedulerCache)
 # the allocation always fails and the pod stays pending
 # since the daemon set pod cannot be started, the node status will be NotReady
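
The ordering problem in the steps above can be reproduced with a small toy model. The sketch below is not the YuniKorn shim code: every type and function name in it (Node, Pod, NodeInfo, SchedulerCache, AddNodeSkipIfExists, AddNodeAlwaysSet, NodeKnown) is hypothetical and simplified for illustration. It assumes the predicate treats a node as unknown when the cache entry has no node object attached, and it shows one possible order-independent variant, which is not necessarily the fix that was applied for this issue.

```go
package main

import "fmt"

// Node stands in for a Kubernetes node object (hypothetical, simplified).
type Node struct{ Name string }

// Pod stands in for a Kubernetes pod; NodeName mirrors pod.spec.nodeName.
type Pod struct{ Name, NodeName string }

// NodeInfo is a cache entry: the pods assigned to a node plus the node
// object, which stays nil until SetNode is called.
type NodeInfo struct {
	node *Node
	pods []*Pod
}

// SetNode attaches the node object to the cache entry.
func (ni *NodeInfo) SetNode(n *Node) { ni.node = n }

// SchedulerCache is a toy model of the cache, keyed by node name.
type SchedulerCache struct{ nodes map[string]*NodeInfo }

func NewSchedulerCache() *SchedulerCache {
	return &SchedulerCache{nodes: make(map[string]*NodeInfo)}
}

// AddPod creates a placeholder NodeInfo when the pod is already bound to a
// node that the cache has not seen yet (steps 3 and 4).
func (c *SchedulerCache) AddPod(p *Pod) {
	if p.NodeName == "" {
		return
	}
	ni, ok := c.nodes[p.NodeName]
	if !ok {
		ni = &NodeInfo{}
		c.nodes[p.NodeName] = ni
	}
	ni.pods = append(ni.pods, p)
}

// AddNodeSkipIfExists models the problematic path (step 6): an existing
// entry is taken as proof that the node is known, so SetNode is never called.
func (c *SchedulerCache) AddNodeSkipIfExists(n *Node) {
	if _, ok := c.nodes[n.Name]; ok {
		return
	}
	ni := &NodeInfo{}
	ni.SetNode(n)
	c.nodes[n.Name] = ni
}

// AddNodeAlwaysSet is one possible order-independent variant: it reuses the
// placeholder entry but always attaches the node object.
func (c *SchedulerCache) AddNodeAlwaysSet(n *Node) {
	ni, ok := c.nodes[n.Name]
	if !ok {
		ni = &NodeInfo{}
		c.nodes[n.Name] = ni
	}
	ni.SetNode(n)
}

// NodeKnown mimics the predicate check in step 8 (assumed behaviour).
func (c *SchedulerCache) NodeKnown(name string) bool {
	ni, ok := c.nodes[name]
	return ok && ni.node != nil
}

func main() {
	pod := &Pod{Name: "fluentd-abc", NodeName: "newly-added-host"}
	node := &Node{Name: "newly-added-host"}

	// Pod event arrives before the node event, as in the report.
	buggy := NewSchedulerCache()
	buggy.AddPod(pod)
	buggy.AddNodeSkipIfExists(node)
	fmt.Println("skip-if-exists, node known:", buggy.NodeKnown(node.Name))

	fixed := NewSchedulerCache()
	fixed.AddPod(pod)
	fixed.AddNodeAlwaysSet(node)
	fmt.Println("always-set, node known:", fixed.NodeKnown(node.Name))
}
```

In this model the skip-if-exists path leaves the placeholder entry without a node object, so the predicate keeps failing no matter how often the allocation is retried, while the always-set path ends in the same state regardless of which informer event arrives first.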




