yunikorn-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [incubator-yunikorn-core] wilfred-s commented on issue #89: Core allocation/reservation logic renovation
Date Tue, 18 Feb 2020 04:13:11 GMT
wilfred-s commented on issue #89: Core allocation/reservation logic renovation
URL: https://github.com/apache/incubator-yunikorn-core/pull/89#issuecomment-587269592
 
 
   New commits pushed with the smoke tests and further cleanup. To link this back to the
comments from @yangwwei:
   comment 1: (https://github.com/apache/incubator-yunikorn-core/pull/89#issuecomment-584913754)
remarks:
   1.  internal unreserve: fixed the issue found (commit 1)
   1.  tryAllocate correctly unreserves (commit 2)
   1. if the reserved allocation fails, try all nodes (commit 1)
   
   comment 2: (https://github.com/apache/incubator-yunikorn-core/pull/89#issuecomment-585361646)
remarks:
   1. Running the predicates for every allocation attempt, for each node, in every cycle will
cause a huge slowdown. The predicate check is only run if there are enough resources available
on the node. For example, in a 100 node cluster with 99 nodes that do not fit the ask, that is
1 predicate run. If we run the predicates before that resource check and let them lead us, we
would have run the predicates 100 times for the same alloc. Caching the predicate result is not
possible, as node usage can change and thus the predicate outcome would change. I think that 1)
is thus a no go.
   1. The score used is really basic at the moment. However, I could argue for or against any
score. A large node might have a longer average runtime per allocation (service type load)
and thus release resources less often. Without metrics we really cannot argue for one or the
other, or for a 3rd alternative.
   1. Yes, we need better metrics. I will follow up with a new jira.
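
To make the predicate cost in remark 1 concrete, here is a minimal sketch in Go. It is not the code from this PR: the `node` type, `countPredicateRuns` and the always-passing predicate are hypothetical names used only for illustration. It counts how often the predicate would be invoked for one ask when the cheap resource check runs first versus when the predicate runs first.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a scheduler node.
type node struct {
	name      string
	available int // free resource units on the node
}

// countPredicateRuns walks the nodes for a single ask. It returns how many
// times the predicate was invoked and the name of the node that was selected
// (empty if none). With fitFirst set, the resource check gates the predicate.
func countPredicateRuns(nodes []node, ask int, fitFirst bool, predicate func(node) bool) (int, string) {
	runs := 0
	for _, n := range nodes {
		if fitFirst && n.available < ask {
			continue // cheap check rejects the node, predicate is skipped
		}
		runs++
		if predicate(n) && n.available >= ask {
			return runs, n.name
		}
	}
	return runs, ""
}

// buildCluster creates 99 nodes that are too small and one that fits.
func buildCluster() []node {
	nodes := make([]node, 0, 100)
	for i := 0; i < 99; i++ {
		nodes = append(nodes, node{name: fmt.Sprintf("node-%d", i), available: 1})
	}
	return append(nodes, node{name: "node-99", available: 10})
}

func main() {
	alwaysPass := func(node) bool { return true } // assumed predicate outcome
	fitRuns, _ := countPredicateRuns(buildCluster(), 5, true, alwaysPass)
	predRuns, _ := countPredicateRuns(buildCluster(), 5, false, alwaysPass)
	fmt.Println("resource check first:", fitRuns, "predicate first:", predRuns)
}
```

In this sketch the fitting node is deliberately placed last, which is the worst case for the predicate-first order; a real predicate call is also far more expensive than the integer comparison used here.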
   
   For the test failures: I have seen a number of them and they are transient. The tests use
a manual scheduler (steps based on a counter). I think the manual scheduling in the smoke
tests is the cause of the issue. The duration of a scheduling cycle is short, and it is cut
even shorter when nothing needs to be done. We probably _waste_ scheduling cycles because
we have nothing to do. When events are processed later, we have no scheduling cycles left to
make progress.
   I am thinking about a better solution, or even using continuous scheduling in the smoke
tests.
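
The suspected failure mode can be sketched as a toy model in Go. This is an assumption about the mechanism, not the actual test harness: `simulate` and the `arrival` slice are hypothetical. A fixed number of manual steps is spent whether or not there is work, so an ask that becomes visible after the last step is never allocated.

```go
package main

import "fmt"

// simulate runs a hypothetical manual scheduler for a fixed number of steps.
// arrival[i] is the step at which ask i becomes visible to the scheduler and
// must be sorted in ascending order. Each step allocates at most one pending
// ask. It returns the number of asks allocated before the steps ran out.
func simulate(steps int, arrival []int) int {
	allocated, pending, next := 0, 0, 0
	for step := 0; step < steps; step++ {
		// Pick up every ask whose event has been processed by now.
		for next < len(arrival) && arrival[next] <= step {
			pending++
			next++
		}
		if pending == 0 {
			continue // wasted cycle: the step is consumed with nothing to do
		}
		pending--
		allocated++
	}
	return allocated
}

func main() {
	fmt.Println(simulate(3, []int{0, 1})) // both asks arrive in time
	fmt.Println(simulate(3, []int{2, 5})) // second ask arrives after the last step
}
```

A continuously running scheduler would not have a fixed step budget, which is why it avoids this particular race in the model.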
   


