hadoop-user mailing list archives

From Anfernee Xu <anfernee...@gmail.com>
Subject Re: In Yarn how to increase the number of concurrent applications for a queue
Date Wed, 10 Sep 2014 00:03:05 GMT
Sure, I can open a JIRA, but how do I do that? I went to

https://issues.apache.org/jira/browse/YARN/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel

but I did not see any link that leads to creating a new issue. Am I missing
something?

BTW, I found another interesting issue. All our jobs are uberized and we
have 2 queues (default and small). Jobs on the default queue are fine, but
jobs on the small queue run slowly by comparison. The major difference is
the time spent in job commit: as you can see from the log below, the user
logic finished at 05:28:06,984, the task then kept asking for permission to
commit for about 20 seconds, and only at 05:28:27,036 was it allowed to
commit. On the default queue this takes less than 1 second.

Do you have any idea what could cause this? Is it due to the restricted
resources (the small queue has only 10 nodes, whereas default has 100)?


2014-09-09 05:28:06,984 INFO [job-thread-8283272023] Job is Done
2014-09-09 05:28:06,985 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:06,987 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 0.0
2014-09-09 05:28:07,004 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.Task: Task:attempt_1410195300700_18702_m_000000_0
is done. And is in the process of committing
2014-09-09 05:28:07,028 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit-pending state
update from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:07,029 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1410195300700_18702_m_000000_0 TaskAttempt Transitioned from
RUNNING to COMMIT_PENDING
2014-09-09 05:28:07,029 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:07,029 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
attempt_1410195300700_18702_m_000000_0 given a go for committing the task
output.
2014-09-09 05:28:08,029 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:09,030 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:09,968 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:09,968 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:10,030 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:11,030 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:12,031 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:12,986 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:12,986 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:13,031 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:14,031 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:15,032 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:16,001 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:16,002 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:16,032 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:17,033 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:18,033 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:19,019 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:19,019 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:19,034 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:20,034 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:21,034 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:22,034 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:22,034 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:22,034 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:23,035 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:24,035 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:25,035 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:25,049 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:25,049 INFO [communication thread]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:26,035 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:27,036 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Commit go/no-go request
from attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:27,036 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Result of canCommit
for attempt_1410195300700_18702_m_000000_0:true
2014-09-09 05:28:27,036 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.Task: Task attempt_1410195300700_18702_m_000000_0
is allowed to commit now
2014-09-09 05:28:27,088 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Saved output of
task 'attempt_1410195300700_18702_m_000000_0' to
hdfs://slc02knk.us.oracle.com:55310/tmp/thirdeye/Publish-28305282698003/_temporary/1/task_1410195300700_18702_m_000000
2014-09-09 05:28:27,104 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Status update from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:27,104 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt
attempt_1410195300700_18702_m_000000_0 is : 1.0
2014-09-09 05:28:27,105 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.TaskAttemptListenerImpl: Done acknowledgement from
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:27,105 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.Task: Task
'attempt_1410195300700_18702_m_000000_0' done.
2014-09-09 05:28:27,107 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1410195300700_18702_m_000000_0 TaskAttempt Transitioned from
COMMIT_PENDING to SUCCESS_CONTAINER_CLEANUP
2014-09-09 05:28:27,107 INFO [uber-SubtaskRunner]
org.apache.hadoop.mapred.LocalContainerLauncher: Processing the event
EventType: CONTAINER_REMOTE_CLEANUP for container
container_1410195300700_18702_01_000001 taskAttempt
attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:27,111 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl:
attempt_1410195300700_18702_m_000000_0 TaskAttempt Transitioned from
SUCCESS_CONTAINER_CLEANUP to SUCCEEDED
2014-09-09 05:28:27,124 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: Task succeeded with
attempt attempt_1410195300700_18702_m_000000_0
2014-09-09 05:28:27,126 INFO [AsyncDispatcher event handler]
org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl:
task_1410195300700_18702_m_000000 Task Transitioned from RUNNING to
SUCCEEDED
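
The commit wait in the log above can be measured directly from the two
timestamps. A minimal Python sketch, assuming only the log4j-style timestamp
format seen in the excerpt; the helper names are made up for illustration:

```python
from datetime import datetime

# Timestamps copied from the log excerpt above.
JOB_DONE = "2014-09-09 05:28:06,984"        # "Job is Done"
COMMIT_ALLOWED = "2014-09-09 05:28:27,036"  # "is allowed to commit now"

# Date, time, then a comma followed by milliseconds.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp such as '2014-09-09 05:28:06,984'."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps."""
    return (parse_log_time(end) - parse_log_time(start)).total_seconds()


commit_wait = seconds_between(JOB_DONE, COMMIT_ALLOWED)
print(f"commit wait: {commit_wait:.3f} s")
```

The same helper can be pointed at the matching lines of a default-queue job
to compare the two queues like for like.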


On Tue, Sep 9, 2014 at 10:58 AM, Arun Murthy <acm@hortonworks.com> wrote:

> Thanks for digging into this. Mind opening a jira to discuss further? Much
> appreciated.
>
> Arun
>
> On Mon, Sep 8, 2014 at 7:15 PM, Anfernee Xu <anfernee.xu@gmail.com> wrote:
>
>> It turned out not to be a configuration issue: a worker thread that submits
>> jobs to YARN was blocked. See the thread dump below.
>>
>> "pool-1-thread-160" id=194 idx=0x30c tid=886 prio=5 alive, blocked,
>> native_blocked
>>     -- Blocked trying to get lock:
>> org/apache/hadoop/ipc/Client$Connection@0x1059d0c60[thin lock]
>>     at __lll_lock_wait+36(:0)@0x340260d594
>>     at tsSleep+399(threadsystem.c:83)@0x2b2356e5da80
>>     at jrockit/vm/Threads.sleep(I)V(Native Method)
>>     at jrockit/vm/Locks.waitForThinRelease(Locks.java:955)[optimized]
>>     at
>> jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1083)[optimized]
>>     at
>> jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
>>     at
>> org/apache/hadoop/ipc/Client$Connection.addCall(Client.java:400)[inlined]
>>     at
>> org/apache/hadoop/ipc/Client$Connection.access$2500(Client.java:314)[inlined]
>>     at
>> org/apache/hadoop/ipc/Client.getConnection(Client.java:1393)[optimized]
>>     at org/apache/hadoop/ipc/Client.call(Client.java:1318)[inlined]
>>     at org/apache/hadoop/ipc/Client.call(Client.java:1300)[inlined]
>>     at
>> org/apache/hadoop/ipc/ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)[optimized]
>>     at
>> $Proxy21.getJobReport(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportRequestProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportResponseProto;(Unknown
>> Source)
>>     at
>> org/apache/hadoop/mapreduce/v2/api/impl/pb/client/MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)[optimized]
>>     at
>> sun/reflect/GeneratedMethodAccessor79.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown
>> Source)[optimized]
>>     at
>> sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)[optimized]
>>     at java/lang/reflect/Method.invoke(Method.java:597)[inlined]
>>     at
>> org/apache/hadoop/mapred/ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)[inlined]
>>     at
>> org/apache/hadoop/mapred/ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)[optimized]
>>     ^-- Holding lock:
>> org/apache/hadoop/mapred/ClientServiceDelegate@0x10087d788[biased lock]
>>     at
>> org/apache/hadoop/mapred/YarnRunner.getJobStatus(TIEYarnRunner.java:522)[optimized]
>>     at org/apache/hadoop/mapreduce/Job$1.run(Job.java:314)[inlined]
>>     at org/apache/hadoop/mapreduce/Job$1.run(Job.java:311)[inlined]
>>     at
>> jrockit/vm/AccessController.doPrivileged(AccessController.java:254)[inlined]
>>     at javax/security/auth/Subject.doAs(Subject.java:396)[inlined]
>>     at
>> org/apache/hadoop/security/UserGroupInformation.doAs(UserGroupInformation.java:1491)[inlined]
>>     at
>> org/apache/hadoop/mapreduce/Job.updateStatus(Job.java:311)[optimized]
>>     ^-- Holding lock: org/apache/hadoop/mapreduce/Job@0x100522fb8[biased
>> lock]
>>     at org/apache/hadoop/mapreduce/Job.isComplete(Job.java:599)
>>     at org/apache/hadoop/mapreduce/Job.waitForCompletion(Job.java:1294)
>>
>> The lock was held by
>>
>> "pool-1-thread-10" id=44 idx=0xb4 tid=736 prio=5 alive, sleeping,
>> native_waiting
>>     at pthread_cond_timedwait@@GLIBC_2.3.2+288(:0)@0x340260b1c0
>>     at eventTimedWaitNoTransitionImpl+46(event.c:93)@0x2b2356cc741f
>>     at
>> syncWaitForSignalNoTransition+133(synchronization.c:51)@0x2b2356e5a096
>>     at syncWaitForSignal+189(synchronization.c:85)@0x2b2356e5a1ae
>>     at vmtSleep+165(signaling.c:197)@0x2b2356e35ef6
>>     at JVM_Sleep+188(jvmthreads.c:119)@0x2b2356d6bb7d
>>     at java/lang/Thread.sleep(J)V(Native Method)
>>     at
>> org/apache/hadoop/ipc/Client$Connection.handleConnectionFailure(Client.java:778)[optimized]
>>     at
>> org/apache/hadoop/ipc/Client$Connection.setupConnection(Client.java:566)[optimized]
>>     ^-- Holding lock: org/apache/hadoop/ipc/Client$Connection@0x1059d0c60
>> [recursive]
>>     at
>> org/apache/hadoop/ipc/Client$Connection.setupIOstreams(Client.java:642)[optimized]
>>     ^-- Holding lock: org/apache/hadoop/ipc/Client$Connection@0x1059d0c60[thin
>> lock]
>>     at
>> org/apache/hadoop/ipc/Client$Connection.access$2600(Client.java:314)[inlined]
>>     at
>> org/apache/hadoop/ipc/Client.getConnection(Client.java:1399)[optimized]
>>     at org/apache/hadoop/ipc/Client.call(Client.java:1318)[inlined]
>>     at org/apache/hadoop/ipc/Client.call(Client.java:1300)[inlined]
>>     at
>> org/apache/hadoop/ipc/ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)[optimized]
>>     at
>> $Proxy21.getJobReport(Lcom/google/protobuf/RpcController;Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportRequestProto;)Lorg/apache/hadoop/mapreduce/v2/proto/MRServiceProtos$GetJobReportResponseProto;(Unknown
>> Source)
>>     at
>> org/apache/hadoop/mapreduce/v2/api/impl/pb/client/MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:133)[optimized]
>>     at
>> sun/reflect/GeneratedMethodAccessor79.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;(Unknown
>> Source)[optimized]
>>     at
>> sun/reflect/DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)[optimized]
>>     at java/lang/reflect/Method.invoke(Method.java:597)[inlined]
>>     at
>> org/apache/hadoop/mapred/ClientServiceDelegate.invoke(ClientServiceDelegate.java:317)[inlined]
>>     at
>> org/apache/hadoop/mapred/ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:416)[optimized]
>>     ^-- Holding lock:
>> org/apache/hadoop/mapred/ClientServiceDelegate@0x1012c34f8[biased lock]
>>     at
>> org/apache/hadoop/mapred/YarnRunner.getJobStatus(TIEYarnRunner.java:522)[optimized]
>>     at org/apache/hadoop/mapreduce/Job$1.run(Job.java:314)[inlined]
>>     at org/apache/hadoop/mapreduce/Job$1.run(Job.java:311)[inlined]
>>     at
>> jrockit/vm/AccessController.doPrivileged(AccessController.java:254)[inlined]
>>     at javax/security/auth/Subject.doAs(Subject.java:396)[inlined]
>>     at
>> org/apache/hadoop/security/UserGroupInformation.doAs(UserGroupInformation.java:1491)[inlined]
>>     at
>> org/apache/hadoop/mapreduce/Job.updateStatus(Job.java:311)[optimized]
>>     ^-- Holding lock: org/apache/hadoop/mapreduce/Job@0x1016e05a8[biased
>> lock]
>>     at org/apache/hadoop/mapreduce/Job.isComplete(Job.java:599)
>>     at org/apache/hadoop/mapreduce/Job.waitForCompletion(Job.java:1294)
>>
>> You can see that the thread holding the lock is sleeping and that the
>> calling method is Connection.handleConnectionFailure(), so I checked our
>> log file and realized the connection failure occurs because the history
>> server is not available. In my case I did not start the history server at
>> all, because it is not needed (I disabled log aggregation). So my question
>> is: why does the job client still try to talk to the history server even
>> though log aggregation is disabled?
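
In the dump above, both threads contend on the same Client$Connection object
(0x1059d0c60), even though they hold different ClientServiceDelegate and Job
locks. The pattern, one thread sleeping between retries while it still holds
a lock that another thread needs, can be reproduced in a few lines. This is a
Python analogy and not Hadoop code; every name and the 0.3 s sleep are
invented for illustration:

```python
import threading
import time

# Stands in for the monitor of the shared Client$Connection object.
connection_lock = threading.Lock()
holder_has_lock = threading.Event()
waited = {}


def holder(retry_sleep: float) -> None:
    """Like the thread in handleConnectionFailure(): sleeps between
    connection retries without releasing the connection lock."""
    with connection_lock:
        holder_has_lock.set()
        time.sleep(retry_sleep)


def caller() -> None:
    """Like the thread blocked in addCall(): needs the same lock."""
    holder_has_lock.wait()  # make sure the holder got the lock first
    start = time.monotonic()
    with connection_lock:
        waited["seconds"] = time.monotonic() - start


threads = [
    threading.Thread(target=holder, args=(0.3,)),  # hypothetical retry sleep
    threading.Thread(target=caller),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"caller waited {waited['seconds']:.2f} s for the lock")
```

The caller does no work of its own, yet it is delayed for roughly as long as
the holder sleeps, which is the behaviour the dump shows.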
>>
>> Thanks
>>
>>
>>
>> On Mon, Sep 8, 2014 at 3:57 AM, Arun Murthy <acm@hortonworks.com> wrote:
>>
>>> How many nodes do you have in your cluster?
>>>
>>> Also, could you share the CapacityScheduler initialization logs for each
>>> queue, such as:
>>>
>>> 2014-08-14 15:14:23,835 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
>>> Initialized queue: unfunded: capacity=0.5, absoluteCapacity=0.5,
>>> usedResources=<memory:0, vCores:0>, usedCapacity=0.0,
>>> absoluteUsedCapacity=0.0, numApps=0, numContainers=0
>>> 2014-08-14 15:14:23,840 INFO
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Initializing default
>>> capacity = 0.5 [= (float) configuredCapacity / 100 ]
>>> asboluteCapacity = 0.5 [= parentAbsoluteCapacity * capacity ]
>>> maxCapacity = 1.0 [= configuredMaxCapacity ]
>>> absoluteMaxCapacity = 1.0 [= 1.0 maximumCapacity undefined,
>>> (parentAbsoluteMaxCapacity * maximumCapacity) / 100 otherwise ]
>>> userLimit = 100 [= configuredUserLimit ]
>>> userLimitFactor = 1.0 [= configuredUserLimitFactor ]
>>> maxApplications = 5000 [= configuredMaximumSystemApplicationsPerQueue or
>>> (int)(configuredMaximumSystemApplications * absoluteCapacity)]
>>> maxApplicationsPerUser = 5000 [= (int)(maxApplications * (userLimit /
>>> 100.0f) * userLimitFactor) ]
>>> maxActiveApplications = 1 [= max((int)ceil((clusterResourceMemory /
>>> minimumAllocation) * maxAMResourcePerQueuePercent * absoluteMaxCapacity),1)
>>> ]
>>> maxActiveAppsUsingAbsCap = 1 [= max((int)ceil((clusterResourceMemory /
>>> minimumAllocation) *maxAMResourcePercent * absoluteCapacity),1) ]
>>> maxActiveApplicationsPerUser = 1 [= max((int)(maxActiveApplications *
>>> (userLimit / 100.0f) * userLimitFactor),1) ]
>>> usedCapacity = 0.0 [= usedResourcesMemory / (clusterResourceMemory *
>>> absoluteCapacity)]
>>> absoluteUsedCapacity = 0.0 [= usedResourcesMemory /
>>> clusterResourceMemory]
>>> maxAMResourcePerQueuePercent = 0.1 [= configuredMaximumAMResourcePercent
>>> ]
>>> minimumAllocationFactor = 0.87506104 [= (float)(maximumAllocationMemory
>>> - minimumAllocationMemory) / maximumAllocationMemory ]
>>> numContainers = 0 [= currentNumContainers ]
>>> state = RUNNING [= configuredState ]
>>> acls = SUBMIT_APPLICATIONS: ADMINISTER_QUEUE:  [= configuredAcls ]
>>> nodeLocalityDelay = 0
>>>
>>>
>>> Then, look at values for maxActiveAppsUsingAbsCap &
>>> maxActiveApplicationsPerUser. That should help debugging.
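
The formulas printed in the LeafQueue log above can be evaluated by hand. A
minimal Python sketch that follows the bracketed formulas as they appear in
the log text, assuming a hypothetical cluster of 8192 MB with a 1024 MB
minimum allocation (neither number comes from this thread) together with the
0.1 AM resource percent, capacities and user limits shown in the sample log:

```python
import math


def max_active_applications(cluster_memory_mb, minimum_allocation_mb,
                            max_am_resource_percent, absolute_capacity):
    """max(ceil((clusterResourceMemory / minimumAllocation)
               * maxAMResourcePercent * absoluteCapacity), 1)"""
    slots = cluster_memory_mb / minimum_allocation_mb
    return max(math.ceil(slots * max_am_resource_percent * absolute_capacity), 1)


def max_active_applications_per_user(max_active, user_limit_percent,
                                     user_limit_factor):
    """max(int(maxActiveApplications * (userLimit / 100.0)
               * userLimitFactor), 1)"""
    return max(int(max_active * (user_limit_percent / 100.0) * user_limit_factor), 1)


# Hypothetical cluster size, chosen only to make the arithmetic concrete.
CLUSTER_MEMORY_MB = 8192
MINIMUM_ALLOCATION_MB = 1024

# Values taken from the sample log: maxAMResourcePerQueuePercent = 0.1,
# absoluteMaxCapacity = 1.0, absoluteCapacity = 0.5, userLimit = 100,
# userLimitFactor = 1.0.
max_active = max_active_applications(
    CLUSTER_MEMORY_MB, MINIMUM_ALLOCATION_MB, 0.1, 1.0)
max_active_abs_cap = max_active_applications(
    CLUSTER_MEMORY_MB, MINIMUM_ALLOCATION_MB, 0.1, 0.5)
per_user = max_active_applications_per_user(max_active, 100, 1.0)

print(max_active, max_active_abs_cap, per_user)
```

With these inputs all three values come out as 1, the same as in the sample
log, although many other input combinations would also give 1. Plugging in
the real cluster memory and minimum allocation shows how far the AM resource
percent has to move before a queue admits a second application.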
>>>
>>> thanks,
>>> Arun
>>>
>>>
>>> On Sun, Sep 7, 2014 at 9:37 AM, Anfernee Xu <anfernee.xu@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm running my cluster on Hadoop 2.2.0 with the CapacityScheduler. All
>>>> my jobs are uberized and run in 2 queues: one queue takes the majority
>>>> of the capacity (90%), the other takes 10%. What I found is that on the
>>>> small queue only one job runs at any given time. I tried tweaking the
>>>> properties below, but no luck so far. Could you shed some light on this?
>>>>
>>>>  <property>
>>>>     <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
>>>>     <value>1.0</value>
>>>>     <description>
>>>>       Maximum percent of resources in the cluster which can be used to run
>>>>       application masters i.e. controls number of concurrent running
>>>>       applications.
>>>>     </description>
>>>>   </property>
>>>>
>>>>
>>>>  <property>
>>>>     <name>yarn.scheduler.capacity.root.queues</name>
>>>>     <value>default,small</value>
>>>>     <description>
>>>>       The queues at the this level (root is the root queue).
>>>>     </description>
>>>>   </property>
>>>>
>>>>  <property>
>>>>     <name>yarn.scheduler.capacity.root.small.maximum-am-resource-percent</name>
>>>>     <value>1.0</value>
>>>>   </property>
>>>>
>>>>
>>>>  <property>
>>>>     <name>yarn.scheduler.capacity.root.small.user-limit</name>
>>>>     <value>1</value>
>>>>   </property>
>>>>
>>>>  <property>
>>>>     <name>yarn.scheduler.capacity.root.default.capacity</name>
>>>>     <value>88</value>
>>>>     <description>Default queue target capacity.</description>
>>>>   </property>
>>>>
>>>>
>>>>   <property>
>>>>     <name>yarn.scheduler.capacity.root.small.capacity</name>
>>>>     <value>12</value>
>>>>     <description>Default queue target capacity.</description>
>>>>   </property>
>>>>
>>>>  <property>
>>>>     <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
>>>>     <value>88</value>
>>>>     <description>
>>>>       The maximum capacity of the default queue.
>>>>     </description>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>yarn.scheduler.capacity.root.small.maximum-capacity</name>
>>>>     <value>12</value>
>>>>     <description>Maximum queue capacity.</description>
>>>>   </property>
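
One sanity check on a configuration like the one above is that the capacities
of the queues listed under root add up to 100, which as far as I know the
CapacityScheduler of this era requires. A minimal Python sketch using only the
standard library; the embedded XML repeats three properties from the message,
and the `<configuration>` wrapper and helper names are assumptions made for
the example:

```python
import xml.etree.ElementTree as ET

# Three properties copied from the configuration above, wrapped in a
# <configuration> root element (the wrapper is assumed).
CONFIG_XML = """
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,small</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>88</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.small.capacity</name>
    <value>12</value>
  </property>
</configuration>
"""


def load_properties(xml_text: str) -> dict:
    """Return a {name: value} mapping for every <property> element."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name").strip(): prop.findtext("value").strip()
        for prop in root.iter("property")
    }


def child_capacities(props: dict, parent: str = "root") -> dict:
    """Map each child queue of `parent` to its configured capacity."""
    prefix = f"yarn.scheduler.capacity.{parent}"
    queues = [q.strip() for q in props[f"{prefix}.queues"].split(",")]
    return {q: float(props[f"{prefix}.{q}.capacity"]) for q in queues}


capacities = child_capacities(load_properties(CONFIG_XML))
total = sum(capacities.values())
print(capacities, "total:", total)
```

Here the total is 100, so the split itself is consistent; note that it is
88/12 rather than the rounded 90/10 mentioned in the text.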
>>>>
>>>>
>>>> Thanks
>>>>
>>>> --
>>>> --Anfernee
>>>>
>>>
>>>
>>>
>>> --
>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or entity
>>> to which it is addressed and may contain information that is confidential,
>>> privileged and exempt from disclosure under applicable law. If the reader
>>> of this message is not the intended recipient, you are hereby notified that
>>> any printing, copying, dissemination, distribution, disclosure or
>>> forwarding of this communication is strictly prohibited. If you have
>>> received this communication in error, please contact the sender immediately
>>> and delete it from your system. Thank You.
>>
>>
>>
>>
>> --
>> --Anfernee
>>
>
>
>
> --
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
--Anfernee
