hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4844) Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
Date Mon, 25 Apr 2016 22:18:13 GMT

    [ https://issues.apache.org/jira/browse/YARN-4844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257158#comment-15257158 ]

Wangda Tan commented on YARN-4844:
----------------------------------

bq. ...Given the debate about the extent of the changes we want to make, can we put a patch
that changes the int32 to int64, adds getMemoryLong with a Private annotation (so that we can
make changes later if we wish) and only fixes the pending memory check that was added in 2.8?...
I agree the size of the patch looks scary :-p. However, if you look into the patch, the changes
are all very simple fixes, and I don't think they will cause many issues. You may feel better
once I have fixed all the Jenkins issues.
I considered fixing only the pending resource calculation, but it looks hard to me: the
calculation of pending resources uses ResourceCalculator/ResourceUsage, and ResourceCalculator
and the related static methods of Resources are used everywhere in the RM.
Marking get___Long as @Private sounds like a good idea to me. Pending resources have not been
exposed to applications via the Java API yet; they are currently only exposed through the REST
API, which the patch already fixes.
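To make the proposal concrete, here is a minimal Java sketch of a Resource-like record whose fields are widened to int64, with new 64-bit getters marked private-audience and the old int getter kept for compatibility. Everything in it is hypothetical: ResourceSketch is not the real o.a.h.y.api.records.Resource, getVirtualCoresLong() is an illustrative name, the local @Private annotation only stands in for Hadoop's InterfaceAudience.Private so the example compiles on its own, and clamping the legacy getter to Integer.MAX_VALUE is one possible choice, not necessarily what the patch does.

```java
// Hypothetical, simplified stand-in for o.a.h.y.api.records.Resource with int64 fields.
class ResourceSketch {
    private final long memoryMb; // memory in MB, widened to int64
    private final long vcores;   // virtual cores, widened as well (optional per the issue)

    ResourceSketch(long memoryMb, long vcores) {
        this.memoryMb = memoryMb;
        this.vcores = vcores;
    }

    // New 64-bit accessors, marked @Private so their signatures can still change later.
    @Private
    long getMemoryLong() {
        return memoryMb;
    }

    @Private
    long getVirtualCoresLong() {
        return vcores;
    }

    // Legacy 32-bit accessor kept for existing callers. Assumption of this sketch:
    // values that do not fit are clamped to Integer.MAX_VALUE instead of wrapping negative.
    @Deprecated
    int getMemory() {
        return (int) Math.min(memoryMb, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        ResourceSketch cluster = new ResourceSketch(2_150_400_000L, 320_000L);
        System.out.println(cluster.getMemoryLong()); // 2150400000
        System.out.println(cluster.getMemory());     // 2147483647 (clamped)
    }
}

// Local stand-in for Hadoop's InterfaceAudience.Private, so the sketch is self-contained.
@interface Private {}
```

Widening the storage while keeping an int accessor lets existing callers compile unchanged, while the @Private marker signals that new code should not rely on the 64-bit getters' exact shape yet.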

Thoughts?

> Upgrade fields of o.a.h.y.api.records.Resource from int32 to int64
> ------------------------------------------------------------------
>
>                 Key: YARN-4844
>                 URL: https://issues.apache.org/jira/browse/YARN-4844
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: api
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: YARN-4844.1.patch, YARN-4844.2.patch
>
>
> We use int32 for memory now. If a cluster has 10k nodes and each node has 210G of memory,
> the total cluster memory overflows and becomes negative.
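The overflow in this first scenario is easy to reproduce. This Java sketch assumes memory is counted in MB (as YARN's Resource memory is) and reads 210G as 210 × 1024 MB; the class and method names are illustrative only.

```java
class ClusterMemoryOverflow {
    static final int NODES = 10_000;
    static final int MB_PER_NODE = 210 * 1024; // 215,040 MB per node

    // int32 arithmetic: the product exceeds 2^31 - 1 and silently wraps around.
    static int totalMemoryInt() {
        return NODES * MB_PER_NODE;
    }

    // int64 arithmetic: widen one operand before multiplying.
    static long totalMemoryLong() {
        return (long) NODES * MB_PER_NODE;
    }

    public static void main(String[] args) {
        System.out.println(totalMemoryInt());  // -2144567296
        System.out.println(totalMemoryLong()); // 2150400000
    }
}
```

The true total, 2,150,400,000 MB, is only slightly above Integer.MAX_VALUE (2,147,483,647), which is why a cluster of this size is already enough to trigger the problem.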
> Another case that overflows int32 even more easily: we add the pending resources of all
> running apps to the cluster's total pending resources. If a problematic app requests too many
> resources (say 1M+ containers, each with 3G of memory), int32 will not be enough.
> Even if we cap each app's pending request, we cannot handle the case where many apps are
> running, each capped but still holding a significant amount of pending resources.
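The pending-resource scenarios behave the same way when totals are accumulated in an int. This Java sketch compares an int32 and an int64 accumulator for one app with 1,000,000 pending 3G containers, and for a hypothetical set of 20 apps each capped at 50,000 pending containers. The cap and app count are made-up numbers, chosen so that each app's total fits in int32 but the sum does not; memory is again assumed to be in MB.

```java
import java.util.Arrays;

class PendingMemoryOverflow {
    static final int CONTAINER_MB = 3 * 1024; // 3G per container, in MB

    // Sums pending memory (MB) across apps with an int32 accumulator, as a
    // 32-bit Resource field would: both the product and the running sum can wrap.
    static int totalPendingInt(int[] pendingContainersPerApp) {
        int total = 0;
        for (int containers : pendingContainersPerApp) {
            total += containers * CONTAINER_MB;
        }
        return total;
    }

    // Same sum with an int64 accumulator; widening before the multiply avoids the wrap.
    static long totalPendingLong(int[] pendingContainersPerApp) {
        long total = 0L;
        for (int containers : pendingContainersPerApp) {
            total += (long) containers * CONTAINER_MB;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] oneHugeApp = {1_000_000};      // one problematic app
        int[] manyCappedApps = new int[20];  // hypothetical: 20 running apps
        Arrays.fill(manyCappedApps, 50_000); // hypothetical per-app cap of 50,000 containers

        System.out.println(totalPendingInt(oneHugeApp));      // -1222967296
        System.out.println(totalPendingLong(oneHugeApp));     // 3072000000
        System.out.println(totalPendingInt(manyCappedApps));  // -1222967296
        System.out.println(totalPendingLong(manyCappedApps)); // 3072000000
    }
}
```

Capping keeps every per-app value (153,600,000 MB here) within int32, yet the cluster-wide sum still wraps, which is the case a per-app cap cannot handle.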
> So we may need to upgrade the int32 memory field (possibly including v-cores as well) to
> int64 to avoid integer overflow.
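Widening to int64 removes the practical risk, but a defensive accumulator can still be useful. This Java sketch sums per-node memory in a long with Math.addExact, so even an int64 overflow throws ArithmeticException instead of wrapping silently, and it checks before narrowing a total back to int for any caller that still expects 32 bits. The class and helper names are hypothetical; Math.addExact and Math.toIntExact are standard JDK methods (Java 8+).

```java
import java.util.Arrays;

class Int64Accumulation {
    // Sums per-node memory (MB) in int64; Math.addExact throws on overflow instead of wrapping.
    static long sumMemoryMb(long[] perNodeMb) {
        long total = 0L;
        for (long mb : perNodeMb) {
            total = Math.addExact(total, mb);
        }
        return total;
    }

    // True if the value can be narrowed to int32 without losing information.
    static boolean fitsInInt32(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long[] nodes = new long[10_000];
        Arrays.fill(nodes, 210L * 1024); // 10k nodes with 210G each, in MB
        long total = sumMemoryMb(nodes);
        System.out.println(total);              // 2150400000
        System.out.println(fitsInInt32(total)); // false

        try {
            Math.toIntExact(total); // checked narrowing back to int32
        } catch (ArithmeticException e) {
            System.out.println("total does not fit in int32: " + e.getMessage());
        }
    }
}
```

Failing loudly on overflow is a design choice of this sketch; code that must never throw on a metrics path might prefer clamping instead.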



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
