hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-660) Improve AMRMClient with cookies and matching requests
Date Sat, 11 May 2013 22:35:16 GMT

    [ https://issues.apache.org/jira/browse/YARN-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655384#comment-13655384
] 

Bikas Saha commented on YARN-660:
---------------------------------

bq. Coming back to current issues at hand. Correct me if I am wrong, but it looks like these
StoredContainerRequests are only useful if I have a Task in my job which needs a container
and the returned container will only be used by that task?
I am afraid that is wrong. Every scheduler has a bunch of tasks that it needs to allocate.
It makes requests for containers. When containers are returned then it chooses which tasks
to assign to those containers. That is the generic use case for schedulers. StoredContaineRequest
is unrelated to it. 
AMRMClientImpl/AMRMClientAsync are already built for the use case specified. A bunch of requests
can be made and when containers are allocated then the user can get them and do its second
pass of scheduling.
bq.The current test-case is clearly not doing enough to explain how it is usable - we can
implement a simple job which has tasks and needs containers inside this test-case and then
illustrate the library usage.
Please read the code in TestAMRMClient.testAMRMClient() and the code in the DistributedShell
example app. Both create an RM, register an application, request containers, make allocations
and deallocations and then unregister the application. I believe the code clearly shows how
all the api's can be used for the full lifecycle of an application. The new tests only test
for newly added storage feature and bug fixes. I could add more test code to go through the
same lifecycle operations when using StoreContainerRequest.

It might be useful if you can go through all the comments in YARN-103, YARN-277 and YARN-417.
It may help in understanding why the code/apis are the way they are. 7 contributors were involved
in those jiras in writing and reviewing the code and I am confident in the technical and stylistic
quality of the code as it stands. That being said, there is always room for improvement if
you find issues.

Let me try to explain the contribution made by this jira so that we all can be on the same
page.

When schedulers receive container allocations then they need to figure out which of their
requests matched those containers so that they can remove those requests from the pending
requests. Containers can match requests at node/rack/* level and hence the library itself
cannot determine which pending request to remove. It needs to be told by the scheduler which
request to remove. All of this already works if the scheduler can find matching requests for
a container on its own or when requests were made at only at * priority (for locality other
than * the RM needs client help to do book-keeping so that it does not end up allocating containers
at unwanted locality).

This jira takes up the problem of helping schedulers find matching requests for allocated
containers. Since the library is the entity that translates request to the RM, it would be
best to let the library provide matching containers. If the scheduler maintains its own separate
mapping information then its logic may diverge from the library's. So the main contribution
of the jira is to store requests made via addContainerRequest() and retrieve matching requests
via getMatchingRequests(). Stored containers must be removed or else the library will end
up holding onto a lot of memory. If ContainerRequest is not constrained to a container count
of 1 then doing this add and removal book-keeping becomes difficult to implement in a water-tight
manner. That is why StoredContainerRequest was created which only constrains the container
count to 1. Now the API becomes elegant and intuitive. addContainerRequest() to add requests.
getMatchingRequests() to get all matching requests. Pick a container from matching requests
and call removeContainerRequest() with it to remove it. That is all that there is to it. There
are some other minor fixes to AMRMClientAsync. We are on the same page wrt this being a needed
functionality.
The cookie added to StoredContainerRequest is very useful IMO but it strictly does not have
to be there. Users can extend StoreContainerRequest to do the same thing and most users will
IMO and so I added it there.
StoredContainerRequest or any extensions of ContainerRequest can be passed into the library.
Doing it without using the help of generics allows for the possibility of writing non-type
checked code as well as mixing different types of ContainerRequests. Besides, it will require
users to make ugly unsafe casts in their code for which the compiler will generate warnings.
While it may be fine for internal YARN code to have 100's of suppressed unchecked cast warnings,
we should not force external user code to have to live with that. They may have their own
coding guidelines. This makes the use of generics essential for this user-facing library.
I can attach the patch with the non-generic version of the library and compare the number
of compiler warnings in both patches.

Hopefully this clarifies and explains the changes in this patch. Please feel free to make
comments on bugs and technical/logical incorrectness issues in the patch.
                
> Improve AMRMClient with cookies and matching requests
> -----------------------------------------------------
>
>                 Key: YARN-660
>                 URL: https://issues.apache.org/jira/browse/YARN-660
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>    Affects Versions: 2.0.5-beta
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>             Fix For: 2.0.5-beta
>
>         Attachments: YARN-660.1.patch, YARN-660.2.patch, YARN-660.3.patch, YARN-660.4.patch
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message