hadoop-yarn-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-6403) Invalid local resource request can raise NPE and make NM exit
Date Wed, 29 Mar 2017 14:43:41 GMT

    [ https://issues.apache.org/jira/browse/YARN-6403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15947251#comment-15947251 ]

Jason Lowe commented on YARN-6403:
----------------------------------

Thanks for the patch!

This patch changes the client code but not the server code.  A client without the fix,
or a malicious client, can still construct a malformed protobuf that is missing the
resource location.  At a minimum, the server needs to validate the request.  The client-side
change is nice to have but technically not necessary to fix the issue.
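
To illustrate the kind of server-side check meant here, below is a minimal sketch, not the actual
NodeManager code. The class LocalResourceValidation, its nested Resource type, and the method
findMissingLocation are hypothetical stand-ins for the deserialized request; the idea is only that
the server inspects every local resource and rejects the request before a null location can reach
the dispatcher thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: names here are illustrative, not YARN APIs.
class LocalResourceValidation {

    /** Stand-in for a deserialized local resource; location may be null if the client omitted it. */
    static final class Resource {
        final String location;

        Resource(String location) {
            this.location = location;
        }
    }

    /** Returns the key of the first local resource that has no location, if any. */
    static Optional<String> findMissingLocation(Map<String, Resource> localResources) {
        for (Map.Entry<String, Resource> entry : localResources.entrySet()) {
            Resource resource = entry.getValue();
            if (resource == null || resource.location == null) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Resource> localResources = new LinkedHashMap<>();
        localResources.put("ok", new Resource("hdfs://nn/path/app.jar"));
        localResources.put("test", new Resource(null)); // malformed entry

        // Reject the request up front instead of failing later in the dispatcher.
        Optional<String> bad = findMissingLocation(localResources);
        if (bad.isPresent()) {
            System.out.println("Rejecting request: local resource '"
                + bad.get() + "' has no location");
        }
    }
}
```

In a real server the rejection would presumably surface as an error returned to the caller of the
start-container request rather than a printed message.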

Nit: Speaking of the client-side change, I think NullPointerException is more appropriate
to throw in this case.  That's what the generated protobuf code already throws when trying
to set protobuf fields to null.
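
As a rough sketch of that client-side behavior, assuming a hypothetical ResourceBuilder class in
place of the real record implementation, the setter can fail fast with a NullPointerException via
the standard java.util.Objects.requireNonNull:

```java
import java.util.Objects;

// Hypothetical stand-in for the client-side record class; not a YARN API.
class ResourceBuilder {
    private String location;

    /** Rejects a null location immediately with a NullPointerException. */
    ResourceBuilder setLocation(String location) {
        this.location = Objects.requireNonNull(location, "resource location must not be null");
        return this;
    }

    String getLocation() {
        return location;
    }

    public static void main(String[] args) {
        try {
            new ResourceBuilder().setLocation(null);
        } catch (NullPointerException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```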


> Invalid local resource request can raise NPE and make NM exit
> -------------------------------------------------------------
>
>                 Key: YARN-6403
>                 URL: https://issues.apache.org/jira/browse/YARN-6403
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.8.0
>            Reporter: Tao Yang
>         Attachments: YARN-6403.001.patch
>
>
> We recently hit this problem in our testing environment. The app that caused it added an
> invalid local resource request (one with no location) to the ContainerLaunchContext, like this:
> {code}
>     localResources.put("test", LocalResource.newInstance(location,
>         LocalResourceType.FILE, LocalResourceVisibility.PRIVATE, 100,
>         System.currentTimeMillis()));
>     ContainerLaunchContext amContainer =
>         ContainerLaunchContext.newInstance(localResources, environment,
>           vargsFinal, null, securityTokens, acls);
> {code}
> The actual value of location was null, although the app did not expect that. This mistake
> caused several NMs to exit with the NPE below, and they could not restart until the NM
> recovery dirs were deleted.
> {code}
> FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourceRequest.<init>(LocalResourceRequest.java:46)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:711)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl$RequestResourcesTransition.transition(ContainerImpl.java:660)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>         at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>         at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:1320)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:88)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1293)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:1286)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
>         at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:110)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> The NPE occurred while creating a LocalResourceRequest instance for the invalid resource request.
> {code}
>   public LocalResourceRequest(LocalResource resource)
>       throws URISyntaxException {
>     this(resource.getResource().toPath(),  //NPE occurred here
>         resource.getTimestamp(),
>         resource.getType(),
>         resource.getVisibility(),
>         resource.getPattern());
>   }
> {code}
> We can't currently guarantee the validity of a local resource request, but we can avoid damaging
> the cluster. Perhaps we can verify the resource in both ContainerLaunchContext and LocalResourceRequest?
> Please feel free to give your suggestions.

