spark-reviews mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [spark] vanzin opened a new pull request #26586: [SPARK-29950][k8s] Blacklist deleted executors in K8S with dynamic allocation.
Date Tue, 19 Nov 2019 00:28:30 GMT
vanzin opened a new pull request #26586: [SPARK-29950][k8s] Blacklist deleted executors in
K8S with dynamic allocation.
URL: https://github.com/apache/spark/pull/26586
 
 
   The issue here is that when Spark is downscaling the application and deletes
   a few pod requests that aren't needed anymore, it may race with the K8S
   scheduler, which may already be bringing up those executors. The executors may
   then have enough time to connect back to the driver and register, only to be
   deleted soon after. This wastes resources and causes misleading entries in the
   driver log.
   
   The change (ab)uses the blacklisting mechanism to consider the deleted excess
   pods as blacklisted, so that if they try to connect back, the driver will deny
   it.
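
   As a rough illustration of the idea only (not the actual Spark code), the
   sketch below models a driver that remembers the ids of executors whose pods
   it deleted while downscaling and refuses them if they connect later. The
   class and method names (DriverRegistry, delete_pending_pods, try_register)
   are hypothetical and do not correspond to Spark APIs.

```python
class DriverRegistry:
    """Toy model of the driver's executor bookkeeping (hypothetical names)."""

    def __init__(self):
        self.registered = set()   # executor ids the driver has accepted
        self.blacklisted = set()  # executor ids whose pods were deleted

    def delete_pending_pods(self, executor_ids):
        # Downscaling: the delete request may race with the K8S scheduler,
        # so remember the ids and refuse them if they show up later.
        self.blacklisted.update(executor_ids)

    def try_register(self, executor_id):
        # Deny registration for executors that were already deleted.
        if executor_id in self.blacklisted:
            return False
        self.registered.add(executor_id)
        return True
```

   In this model an executor whose pod was deleted never appears in the
   registered set, even if its pod came up before the deletion took effect.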
   
   It also changes the executor registration slightly, since even with the above
   change there were misleading logs. That was because the executor registration
   message was an RPC that always succeeded (bar network issues), so the executor
   would always try to send an unregistration message to the driver, which would
   then log several messages about not knowing anything about the executor. The
   change makes the registration RPC succeed or fail directly, instead of using
   the separate failure message that would lead to this issue.
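
   A minimal sketch of the registration flow after the change, under the
   assumption that a rejected registration surfaces to the executor as a
   failed call. All names here (Driver, RegistrationRejected, run_executor)
   are hypothetical stand-ins, not Spark's RPC classes.

```python
class RegistrationRejected(Exception):
    """Raised when the driver refuses an executor (hypothetical type)."""


class Driver:
    def __init__(self, blacklisted=()):
        self.blacklisted = set(blacklisted)
        self.registered = set()
        self.log = []

    def register_executor(self, executor_id):
        # The registration call itself fails; there is no separate
        # failure message sent after a "successful" reply.
        if executor_id in self.blacklisted:
            raise RegistrationRejected(f"executor {executor_id} is blacklisted")
        if executor_id in self.registered:
            raise RegistrationRejected(f"duplicate executor {executor_id}")
        self.registered.add(executor_id)

    def remove_executor(self, executor_id):
        # An unregistration for an id the driver never accepted is what
        # produced the misleading log entries.
        if executor_id not in self.registered:
            self.log.append(f"asked to remove unknown executor {executor_id}")
            return
        self.registered.discard(executor_id)


def run_executor(driver, executor_id):
    """Return True if the executor registered, False if it was rejected."""
    try:
        driver.register_executor(executor_id)
    except RegistrationRejected:
        # Rejected up front: exit without sending an unregistration message.
        return False
    return True
```

   Because the rejected executor never calls remove_executor in this sketch,
   the driver log stays free of "unknown executor" entries. Rejecting the
   duplicate id mirrors the behaviour that the standalone test suite had to
   be adjusted for.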
   
   Note the last change required some changes in a standalone test suite related
   to dynamic allocation, since it relied on the driver not throwing exceptions
   when a duplicate executor registration happened.
   
   Tested with existing unit tests, and on a live cluster with dynamic allocation enabled.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
