hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-2983) NPE possible in ClientRMService#getQueueInfo
Date Sun, 21 Dec 2014 11:58:13 GMT

     [ https://issues.apache.org/jira/browse/YARN-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Varun Saxena updated YARN-2983:
-------------------------------
    Description: 
While going through the code to check YARN-2978, I found an issue.
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first
collect application attempts from the scheduler and then look up the apps in a {{ConcurrentHashMap}}
in {{RMContext}}. Although each individual operation (get, put, remove, etc.) on a {{ConcurrentHashMap}}
is thread-safe, a series of {{ConcurrentHashMap#get}} calls (say, in a for loop) is not atomic
as a whole.
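
The point about a series of lookups can be illustrated with a minimal stand-alone sketch. It does not use any YARN classes: {{StaleSnapshotDemo}}, {{countMissing}} and the string keys are hypothetical names chosen for illustration, and the removal is done inline (rather than from a second thread) only to make the interleaving deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; names are illustrative and not part of YARN.
class StaleSnapshotDemo {

  /**
   * Takes a snapshot of the keys, removes one entry, then looks up every
   * snapshot key again. Returns how many lookups came back null.
   */
  static int countMissing(ConcurrentHashMap<String, String> apps, String removedKey) {
    // Step 1: snapshot of the ids (stands in for the list fetched from the scheduler).
    List<String> snapshot = new ArrayList<>(apps.keySet());

    // Step 2: an entry disappears between the snapshot and the lookups.
    // In a real server this would be another thread; here it is inlined.
    apps.remove(removedKey);

    // Step 3: each get() is thread-safe on its own, but may return null
    // for a key that was present when the snapshot was taken.
    int missing = 0;
    for (String id : snapshot) {
      if (apps.get(id) == null) {
        missing++;
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    ConcurrentHashMap<String, String> apps = new ConcurrentHashMap<>();
    apps.put("app_1", "RUNNING");
    apps.put("app_2", "FINISHED");
    System.out.println("null lookups: " + countMissing(apps, "app_2"));
  }
}
```

Dereferencing the result of any of those null lookups without a check is what would produce the NPE.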

For instance, in the code below, we call {{rmContext.getRMApps()#get}} in a loop. {{ConcurrentHashMap#get}}
returns null if the key does not exist, but there is no null check inside the for loop before
the returned value, rmApp, is dereferenced. Although all the application attempts for the queue
are fetched just above the for loop, this block of code is not synchronized, so another thread
may remove the RMApp from the ConcurrentHashMap at the same time. This can happen when an app
finishes and the number of completed apps exceeds the configured
{{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop; otherwise an NPE can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
      throws YarnException {
  .....
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
        scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ......
}
{code}
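
A minimal sketch of the proposed null check, written as a self-contained example rather than as a patch. {{AppRecord}}, {{createReport}} and {{collectReports}} are stand-ins assumed for illustration (they play the roles of {{RMApp}}, {{createAndGetApplicationReport}} and the loop in {{getQueueInfo}}); the actual fix may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in types; none of these are real YARN classes.
class NullSafeLookup {

  /** Plays the role of RMApp in this sketch. */
  static final class AppRecord {
    final String id;

    AppRecord(String id) {
      this.id = id;
    }

    String createReport() {
      return "report:" + id;
    }
  }

  /**
   * Builds reports for the given ids, skipping any id whose entry has
   * already been removed from the map by the time it is looked up.
   */
  static List<String> collectReports(List<String> appIds, Map<String, AppRecord> apps) {
    List<String> reports = new ArrayList<>(appIds.size());
    for (String appId : appIds) {
      AppRecord app = apps.get(appId);
      // The entry may have been removed since the ids were collected.
      if (app != null) {
        reports.add(app.createReport());
      }
    }
    return reports;
  }

  public static void main(String[] args) {
    Map<String, AppRecord> apps = new ConcurrentHashMap<>();
    apps.put("app_1", new AppRecord("app_1"));
    // "app_2" is in the id list but no longer in the map.
    List<String> ids = Arrays.asList("app_1", "app_2");
    System.out.println(collectReports(ids, apps));
  }
}
```

Skipping the missing entry keeps the response limited to apps that still exist; whether a removed app should instead be reported in some other way is a separate design question not settled here.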

  was:
While going through code for checking YARN-2978 , found one issue. 
During construction of {{GetQueueInfoResponse}} in {{ClientRMService#getQueueInfo}}, we first
collect application attempts from scheduler and then get apps from a {{ConcurrentHashMap}}
in {{RMContext}}. Although the operation(get/put/remove,etc) itself on a ConcurrentHashMap
is thread-safe, but a series of multiple {{ConcurrentHashMap#get}} (say, in a for loop) is
not. 

For instance, in code below, we are calling rmContext.getRMApps()#get in a loop. Now a ConcurrentHashMap#get
can return null if the key doesnt exist. But there is no null check inside this for loop before
dereferencing the value returned i.e. rmApp. Although all the applicationattempts  have been
fetched for the queue just above the for loop, but as this block of code is not synchronized,
there is a possibility that another thread may delete RMApp from the ConcurrentHashMap at
the same time. This can happen when an app finishes/completes and number of completed apps
exceed the config  {{yarn.resourcemanager.max-completed-applications}}.
I think there should be a null check inside this for loop, otherwise a NPE can occur.

{code:title=ClientRMService#getQueueInfo}
public GetQueueInfoResponse getQueueInfo(GetQueueInfoRequest request)
      throws YarnException {
  .....
  if (request.getIncludeApplications()) {
    List<ApplicationAttemptId> apps =
    scheduler.getAppsInQueue(request.getQueueName());
    appReports = new ArrayList<ApplicationReport>(apps.size());
    for (ApplicationAttemptId app : apps) {
      RMApp rmApp = rmContext.getRMApps().get(app.getApplicationId());
      appReports.add(rmApp.createAndGetApplicationReport(null, true));
    }
  }
  ......
}
{code}


> NPE possible in ClientRMService#getQueueInfo
> --------------------------------------------
>
>                 Key: YARN-2983
>                 URL: https://issues.apache.org/jira/browse/YARN-2983
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.1
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
