[ https://issues.apache.org/jira/browse/YARN-5731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated YARN-5731:
-----------------------------
    Attachment: YARN-5731.branch-2.8.004.patch

Attached the patch for branch-2.8 (004). Since branch-2.8 doesn't include the main fix, I consolidated the main fix and the addendum fix into the same patch.

> Preemption calculation is not accurate when reserved containers are present in queue.
> -------------------------------------------------------------------------------------
>
>                 Key: YARN-5731
>                 URL: https://issues.apache.org/jira/browse/YARN-5731
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.8.0
>            Reporter: Sunil G
>            Assignee: Wangda Tan
>             Fix For: 2.9.0, 3.0.0-beta1
>
>         Attachments: YARN-5731.001.patch, YARN-5731.002.patch, YARN-5731.addendum.003.patch, YARN-5731.addendum.004.patch, YARN-5731.branch-2.002.patch, YARN-5731-branch-2.8.001.patch, YARN-5731-branch-2.8.001.patch, YARN-5731.branch-2.8.004.patch
>
>
> The YARN Capacity Scheduler does not trigger preemption in the scenario below.
> Two queues, A and B, each have 50% capacity, 100% maximum capacity and a user limit factor of 2. The minimum container size is 1536MB and the total cluster resource is 40GB. Submit the first job, which needs 1536MB for the AM and 9 task containers of 4.5GB each, to queue A. The job gets 8 containers in total (AM 1536MB + 7 * 4.5GB = 33GB) and reserves one more container of 4.5GB. The cluster usage is shown as 93.8%, a figure that matches the 33GB allocated plus the 4.5GB reservation (37.5GB of 40GB).
> When the next job (1536MB for the AM and 9 task containers of 4.5GB each) is submitted to queue B, it hangs in the ACCEPTED state forever and the RM scheduler never triggers preemption. (RM UI Image 2 attached)
> Test Case:
> ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --queue A --executor-memory 4G --executor-cores 4 --num-executors 9 ../lib/spark-examples*.jar 1000000
> After a minute..
> ./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --queue B --executor-memory 4G --executor-cores 4 --num-executors 9 ../lib/spark-examples*.jar 1000000
> Credit to: [~Prabhu Joseph] for bug investigation and troubleshooting.
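
The figures in the quoted scenario can be checked with a short calculation. The sketch below is plain Python arithmetic, not scheduler code. All constant and function names are made up for illustration. The rounding helper assumes that container requests are normalized up to a multiple of the minimum container size, and the 409MB executor overhead used in the last step is an assumed value, chosen only to show how a 4G executor request could end up as a 4.5GB container.

```python
# Figures taken from the issue description (all sizes in MB).
CLUSTER_MB = 40 * 1024      # total cluster resource: 40 GB
MIN_ALLOC_MB = 1536         # minimum container size
QUEUE_CAPACITY = 0.50       # guaranteed share of queue A (and of queue B)
AM_MB = 1536                # application master container
TASK_MB = 4608              # one task container: 4.5 GB
RUNNING_TASKS = 7           # task containers actually allocated to job 1


def round_up_to_min_alloc(request_mb, min_alloc_mb=MIN_ALLOC_MB):
    """Round a request up to a multiple of the minimum allocation.

    Assumption: this mirrors how a request is normalized; it is not
    taken from the scheduler source.
    """
    return ((request_mb + min_alloc_mb - 1) // min_alloc_mb) * min_alloc_mb


def queue_a_usage():
    """Return the memory figures for queue A after the first job is running."""
    allocated = AM_MB + RUNNING_TASKS * TASK_MB   # containers really running
    reserved = TASK_MB                            # one reserved, not yet running
    guaranteed = int(CLUSTER_MB * QUEUE_CAPACITY)
    return {
        "allocated_mb": allocated,
        "reserved_mb": reserved,
        "guaranteed_mb": guaranteed,
        "over_guarantee_mb": allocated - guaranteed,
        "allocated_pct": 100.0 * allocated / CLUSTER_MB,
        "allocated_plus_reserved_pct": 100.0 * (allocated + reserved) / CLUSTER_MB,
    }


if __name__ == "__main__":
    for name, value in queue_a_usage().items():
        print(f"{name}: {value}")
    # Assumed overhead of 409 MB on top of --executor-memory 4G.
    print("executor container (MB):", round_up_to_min_alloc(4096 + 409))
```

Under these assumptions, queue A holds 33GB against a 20GB guarantee, so it is 13GB over its share while queue B uses nothing. Counting the reservation gives 37.5GB of 40GB, or 93.75%, which is consistent with the 93.8% usage quoted in the description.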