hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4844) Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource
Date Fri, 17 Jun 2016 05:12:05 GMT

    [ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335418#comment-15335418 ]

Wangda Tan commented on YARN-4844:
----------------------------------

[~kasha],

bq. getMemory is deprecated, but getVirtualCores is not
The reason I only updated getMemory is that it is the real problem. In the near future, virtualCores
is not likely to go beyond the max value of int. Considering the size of the patch, I only updated getMemory.

bq. getMemory is deprecated and recommends using getMemorySize, but getMemorySize is unstable.
Seems like the users are stuck between rock and a hard place?
I was thinking that, since this is the first release of the new API, we could probably still update it. I'm
open to changing it to Evolving or even Stable if you think that is required.
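
For illustration only, here is a minimal sketch of the deprecate-and-replace pattern being discussed. ResourceSketch is a hypothetical stand-in, not the real o.a.h.y.api.records.Resource, and clamping in the old accessor is an assumption made for this sketch rather than a statement of what the patch does.

```java
// Hypothetical stand-in for a resource record; NOT the real Hadoop class.
class ResourceSketch {
    private final long memoryMb; // stored as 64-bit, assumed to be in MB

    ResourceSketch(long memoryMb) {
        this.memoryMb = memoryMb;
    }

    /**
     * Old 32-bit accessor, kept so existing callers still compile.
     * Assumption for this sketch: clamp instead of wrapping to a negative value.
     */
    @Deprecated
    public int getMemory() {
        return (int) Math.min(memoryMb, (long) Integer.MAX_VALUE);
    }

    /** New 64-bit accessor that callers are asked to migrate to. */
    public long getMemorySize() {
        return memoryMb;
    }

    public static void main(String[] args) {
        ResourceSketch small = new ResourceSketch(4096L);
        ResourceSketch huge = new ResourceSketch(3_072_000_000L);
        System.out.println(small.getMemory() + " / " + small.getMemorySize());
        System.out.println(huge.getMemory() + " / " + huge.getMemorySize());
    }
}
```

In the real code base the stability of the new accessor would be expressed with the Hadoop classification annotations; they are left out here to keep the sketch self-contained.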

bq. Is the recommendation to use the long version for everything - individual resource-requests
and variables that are used to capture aggregates? If yes, shouldn't we update all current
usages to the long version?
I've tried to update most of them, except for a few APIs (like mapreduce.JobStatus). getMemory is
used by YARN/MR 1k+ times, so I believe there are places I missed. I can address them before the release
of 2.8.

bq. Also, do you think we can get this in 2.9 instead so we can be sure other stuff doesn't
break?
I would prefer to leave it in 2.8. This is a real problem that we have seen in a couple of cases,
and basically the client can do nothing except restart services. I've tried to build several YARN
downstream projects such as Spark/Slider/Tez against this patch, and all of them can be built
with the API fixes: https://issues.apache.org/jira/secure/attachment/12810580/YARN-4844-branch-2.8.addendum.2.patch
Considering there are still 15+ pending blockers and critical issues for 2.8, it will take at least
a few weeks to finish 2.8, so we can test more downstream projects if you want.

bq. Also, noticed that some of the helper methods in Resources seem to using getMemorySize
for calculations but typecasting to int as in this example:
I will double-check them, as well as the issues you found in YARN-5077. I plan to create a new
JIRA to address these issues instead of overloading this one.
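
A minimal sketch of the kind of cast in question, assuming memory values are in MB. The helper names below are hypothetical and are not the actual methods in Resources; they only show why computing in long and then casting to int brings the overflow back.

```java
// Hypothetical helpers; NOT the actual methods in the Resources utility class.
class NarrowingCastSketch {
    /** Computes in long but truncates the result to int, dropping the high 32 bits. */
    static int scaleTruncating(long memoryMb, int factor) {
        return (int) (memoryMb * factor);
    }

    /** Keeps the result in long end to end. */
    static long scale(long memoryMb, int factor) {
        return memoryMb * factor;
    }

    /** For call sites that must stay int: fail loudly instead of wrapping. */
    static int scaleChecked(long memoryMb, int factor) {
        return Math.toIntExact(Math.multiplyExact(memoryMb, (long) factor));
    }

    public static void main(String[] args) {
        long perContainerMb = 3L * 1024; // 3G per container, expressed in MB
        int containers = 1_000_000;
        System.out.println("long result:      " + scale(perContainerMb, containers));
        System.out.println("truncated to int: " + scaleTruncating(perContainerMb, containers));
        try {
            scaleChecked(perContainerMb, containers);
        } catch (ArithmeticException e) {
            System.out.println("checked variant rejected the value: " + e.getMessage());
        }
    }
}
```

The checked variant is one possible option for places that cannot move to long right away; whether throwing is acceptable there depends on the caller.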



> Add getMemorySize/getVirtualCoresSize to o.a.h.y.api.records.Resource
> ---------------------------------------------------------------------
>
>                 Key: YARN-4844
>                 URL: https://issues.apache.org/jira/browse/YARN-4844
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>             Fix For: 2.8.0
>
>         Attachments: YARN-4844-branch-2.8.0016_.patch, YARN-4844-branch-2.8.addendum.2.patch,
YARN-4844-branch-2.addendum.1_.patch, YARN-4844-branch-2.addendum.2.patch, YARN-4844.1.patch,
YARN-4844.10.patch, YARN-4844.11.patch, YARN-4844.12.patch, YARN-4844.13.patch, YARN-4844.14.patch,
YARN-4844.15.patch, YARN-4844.16.branch-2.patch, YARN-4844.16.patch, YARN-4844.2.patch, YARN-4844.3.patch,
YARN-4844.4.patch, YARN-4844.5.patch, YARN-4844.6.patch, YARN-4844.7.patch, YARN-4844.8.branch-2.patch,
YARN-4844.8.patch, YARN-4844.9.branch, YARN-4844.9.branch-2.patch
>
>
> We use int32 for memory now. If a cluster has 10k nodes and each node has 210G memory, we
will get a negative total cluster memory.
> Another case that overflows int32 even more easily: we add all pending resources of running
apps to the cluster's total pending resources. If a problematic app requires too many resources
(let's say 1M+ containers, each of them 3G), int32 will not be enough.
> Even if we can cap each app's pending request, we cannot handle the case where there are
many running apps, each of which has a capped but still significant amount of pending resources.
> So we may need to add getMemoryLong/getVirtualCoreLong to o.a.h.y.api.records.Resource.
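
The arithmetic behind the first case can be checked with a small sketch. It assumes memory is counted in MB, so 210G is taken as 210 * 1024; the class and method names are made up for the example.

```java
// Illustrative only; the class and method names are made up for this example.
class ClusterMemoryOverflow {
    static final int NODES = 10_000;
    static final int MB_PER_NODE = 210 * 1024; // 210G per node, assumed to be stored in MB

    /** Sums per-node memory in a 32-bit accumulator; wraps past Integer.MAX_VALUE. */
    static int totalMemoryInt() {
        int total = 0;
        for (int i = 0; i < NODES; i++) {
            total += MB_PER_NODE;
        }
        return total;
    }

    /** Same sum with a 64-bit accumulator. */
    static long totalMemoryLong() {
        long total = 0L;
        for (int i = 0; i < NODES; i++) {
            total += MB_PER_NODE;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Integer.MAX_VALUE: " + Integer.MAX_VALUE);
        System.out.println("int total:         " + totalMemoryInt());
        System.out.println("long total:        " + totalMemoryLong());
    }
}
```

The true total is 2,150,400,000 MB, just above Integer.MAX_VALUE (2,147,483,647), so the 32-bit sum comes out negative while the 64-bit sum stays correct.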



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

