hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bikas Saha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-744) Race condition in ApplicationMasterService.allocate .. It might process same allocate request twice resulting in additional containers getting allocated.
Date Tue, 19 Nov 2013 00:21:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825989#comment-13825989
] 

Bikas Saha commented on YARN-744:
---------------------------------

Better name?
{code}
+    AllocateResponseLock res = responseMap.get(applicationAttemptId);
{code}

reuse throwApplicationAttemptDoesNotExistInCacheException() in registerApplicationMaster()?

use InvalidApplicationMasterRequestException or a new specific exception instead of generic
RPCUtil.throwRemoteException()?
{code}
+  private void throwApplicationAttemptDoesNotExistInCacheException(
+      ApplicationAttemptId appAttemptId) throws YarnException {
+    String message = "Application doesn't exist in cache "
+        + appAttemptId;
+    LOG.error(message);
+    throw RPCUtil.getRemoteException(message);
+  }
{code}

The new logic is not the same as the old one. If the app is no longer in the cache then it
would send a resync response. Now it will send a regular response instead of a resync response.
{code}
-      // before returning response, verify in sync
-      AllocateResponse oldResponse =
-          responseMap.put(appAttemptId, allocateResponse);
-      if (oldResponse == null) {
-        // appAttempt got unregistered, remove it back out
-        responseMap.remove(appAttemptId);
-        String message = "App Attempt removed from the cache during allocate"
-            + appAttemptId;
-        LOG.error(message);
-        return resync;
-      }
-
+      res.setAllocateResponse(allocateResponse);
       return allocateResponse;
{code}

> Race condition in ApplicationMasterService.allocate .. It might process same allocate
request twice resulting in additional containers getting allocated.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-744
>                 URL: https://issues.apache.org/jira/browse/YARN-744
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Bikas Saha
>            Assignee: Omkar Vinit Joshi
>            Priority: Minor
>         Attachments: MAPREDUCE-3899-branch-0.23.patch, YARN-744-20130711.1.patch, YARN-744-20130715.1.patch,
YARN-744-20130726.1.patch, YARN-744.1.patch, YARN-744.patch
>
>
> Looks like the lock taken in this is broken. It takes a lock on lastResponse object and
then puts a new lastResponse object into the map. At this point a new thread entering this
function will get a new lastResponse object and will be able to take its lock and enter the
critical section. Presumably we want to limit one response per app attempt. So the lock could
be taken on the ApplicationAttemptId key of the response map object.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Mime
View raw message