hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5936) when cpu strict mode is closed, yarn couldn't assure scheduling fairness between containers
Date Fri, 16 Dec 2016 19:52:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755361#comment-15755361 ]

Miklos Szegedi commented on YARN-5936:
--------------------------------------

Thank you for the reply, [~zhengchenyu]!
{quote}
First, the CPU bandwidth code in the Linux kernel adds many timers and extra function calls.
Secondly, limiting the utilization ratio leads to bad performance.
{quote}
I did an experiment with the following CPU-heavy app:
{code}
//a.c
int main() {
  int i;
  int j = 0;
  for (i = 0; i < 1000000000; ++i) {
    j++;
  }
  return j & 1;
}
{code}
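(The ./a.out used below is just this file compiled with the default output name, e.g. something like:)
{code}
gcc a.c    # compiles a.c into ./a.out (gcc's default output name)
{code}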
I ran ten copies of it in parallel on a single CPU in three configurations: all in a single shared cgroup, each in its own cgroup without throttling, and each in its own cgroup with CPU throttling enabled.
{code}
# case 1: all ten copies in the single shared cgroup
for j in `seq 1 10`; do export i=$j;sh -c 'time ./a.out&'; done
# case 2: one cgroup per process, throttling disabled (quota = -1)
for j in `seq 1 10`; do export i=$j;sh -c 'echo $$ >/cgroup/cpu/$i/tasks;echo -1 >/cgroup/cpu/$i/cpu.cfs_quota_us;time ./a.out&'; done
# case 3: one cgroup per process, throttled to 10000 us per CFS period
for j in `seq 1 10`; do export i=$j;sh -c 'echo $$ >/cgroup/cpu/$i/tasks;echo 10000 >/cgroup/cpu/$i/cpu.cfs_quota_us;time ./a.out&'; done
{code}
The average runtime was 24.7154 seconds in the first case (single shared cgroup), 24.6907 seconds in the second (per-process cgroups, no throttling), and 24.7469 seconds in the third (throttled). The difference is less than 0.25%, and a few more runs gave very similar numbers. Note that with the default cpu.cfs_period_us of 100000, a quota of 10000 gives each of the ten groups 10% of the single CPU, so the CPU stays fully subscribed and no loss in total throughput is expected.
This suggests to me that what you are seeing is a utilization drop caused by the container cgroup capping CPU usage, not an inefficiency in the Linux kernel.
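If it helps, one way to double-check this on your nodes is to look at the CFS bandwidth statistics the kernel exports per cgroup. A rough sketch, assuming the same /cgroup/cpu mount point as in my test:
{code}
# Print the quota/period settings and the throttling counters for each child cgroup.
# A growing nr_throttled / throttled_time means the group is simply hitting its
# configured quota, i.e. the slowdown is the limit itself, not kernel overhead.
for c in /cgroup/cpu/*/; do
  echo "== $c"
  cat $c/cpu.cfs_period_us $c/cpu.cfs_quota_us
  cat $c/cpu.stat    # nr_periods, nr_throttled, throttled_time (ns)
done
{code}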

> when cpu strict mode is closed, yarn couldn't assure scheduling fairness between containers
> -------------------------------------------------------------------------------------------
>
>                 Key: YARN-5936
>                 URL: https://issues.apache.org/jira/browse/YARN-5936
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.7.1
>         Environment: CentOS7.1
>            Reporter: zhengchenyu
>            Priority: Critical
>             Fix For: 2.7.1
>
>   Original Estimate: 1m
>  Remaining Estimate: 1m
>
> When using LinuxContainer, setting "yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage" to true can assure scheduling fairness through the cgroup CPU bandwidth control. But in our experience the cgroup CPU bandwidth limit leads to bad performance.
>     Without the cgroup CPU bandwidth limit, cpu.shares is our only way to assure scheduling fairness, but it is not completely effective. For example, take two containers with the same number of vcores (and therefore the same cpu.shares), one single-threaded and the other multi-threaded: the multi-threaded container will get far more CPU time. That is unreasonable!
>     Here is my test case. I submit two distributedshell applications; the two commands are below:
> {code}
> hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -shell_script ./run.sh -shell_args 10 -num_containers 1 -container_memory 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar -shell_script ./run.sh -shell_args 1 -num_containers 1 -container_memory 1024 -container_vcores 1 -master_memory 1024 -master_vcores 1 -priority 10
> {code}
>      Here is the CPU time of the two containers (from top):
> {code}
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 15448 yarn      20   0 9059592  28336   9180 S 998.7  0.1  24:09.30 java
> 15026 yarn      20   0 9050340  27480   9188 S 100.0  0.1   3:33.97 java
> 13767 yarn      20   0 1799816 381208  18528 S   4.6  1.2   0:30.55 java
>    77 root      rt   0       0      0      0 S   0.3  0.0   0:00.74 migration/1   
> {code}
>     We find that the CPU time of the multi-threaded container is about ten times that of the single-threaded one, even though the two containers have the same cpu.shares.
> notes:
> run.sh
> {code} 
> 	java -cp /home/yarn/loop.jar:$CLASSPATH loop.loop $1	
> {code} 
> loop.java
> {code} 
> package loop;
> public class loop {
> 	public static void main(String[] args) {
> 		// TODO Auto-generated method stub
> 		int loop = 1;
> 		if(args.length>=1) {
> 			System.out.println(args[0]);
> 			loop = Integer.parseInt(args[0]);
> 		}
> 		for(int i=0;i<loop;i++){
> 			System.out.println("start thread " + i);
> 			new Thread(new Runnable() {
> 				@Override
> 				public void run() {
> 					// TODO Auto-generated method stub
> 					int j=0;
> 					while(true){j++;}
> 				}
> 			}).start();
> 		}
> 	}
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


