hadoop-yarn-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1771) many getFileStatus calls made from node manager for localizing a public distributed cache resource
Date Mon, 03 Mar 2014 22:02:22 GMT

    [ https://issues.apache.org/jira/browse/YARN-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918625#comment-13918625 ]

Chris Douglas commented on YARN-1771:
-------------------------------------

bq. Orthogonal to this we have been discussing adding a FileStatus[] getFileStatus(Path f)
API that returns FileStatus for each path component of f in a single RPC.

Symlinks might be awkward to support, but that discussion is for a separate ticket. Do you
have a JIRA ref?
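
For reference, this is the set of paths such a per-component call would have to cover for the example resource in the issue description. It is a minimal sketch in plain Java, not Hadoop code: `PathComponents.of` is a hypothetical helper, it assumes an absolute, already-normalized path, and it does nothing about symlinks.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists the root and every successive prefix of a path. */
final class PathComponents {
    private PathComponents() {}

    /**
     * Returns "/", then each longer prefix, ending with the full path.
     * Assumes an absolute, normalized path; symlinks are not resolved.
     */
    static List<String> of(String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + absolutePath);
        }
        List<String> components = new ArrayList<>();
        components.add("/");
        StringBuilder prefix = new StringBuilder();
        for (String part : absolutePath.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty token before the leading slash
            }
            prefix.append('/').append(part);
            components.add(prefix.toString());
        }
        return components;
    }
}
```

For `/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar` this yields five paths (the root, three directories, and the jar), i.e. the distinct `src` values in the audit excerpt, which a single batched RPC would return together.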

bq. So I think we need some kind of access check, either as the requesting user or explicit
access checks like it does today, to avoid a malicious client obtaining access to private
files via the NM.

An HDFS "nobody" account?

A cache would probably be correct in almost all cases, though. Since the check is only performed
when the resource is localized, there could be cases where the filesystem is never in the
cached state, but those are rare (and, as Sandy points out, already possible in the current design).
To attack the cache, the writer would need to take an unprotected directory, change its permissions,
then populate it with private data (whose attributes are guessable). Expiring entries after short
intervals and not populating the cache with failed localization attempts could help reduce
the attack's effectiveness.
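
A rough sketch of those two mitigations, short expiry and no entries for failed attempts, in plain Java. Everything here is an assumption for illustration: the class name `ExpiringPublicCheckCache`, its methods, and the injected clock are hypothetical and are not NodeManager code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical cache of "this path passed the public check", with a short TTL. */
final class ExpiringPublicCheckCache {
    private final ConcurrentHashMap<String, Long> expiryByPath = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clockMillis; // injected so expiry can be tested

    ExpiringPublicCheckCache(long ttlMillis, LongSupplier clockMillis) {
        this.ttlMillis = ttlMillis;
        this.clockMillis = clockMillis;
    }

    /** True only if the path was recorded as public and the entry has not expired. */
    boolean isKnownPublic(String path) {
        Long expiresAt = expiryByPath.get(path);
        if (expiresAt == null) {
            return false; // never cached: caller must do the real check
        }
        if (clockMillis.getAsLong() >= expiresAt) {
            expiryByPath.remove(path, expiresAt); // drop the stale entry
            return false;
        }
        return true;
    }

    /** Records a localization outcome; failures are never cached. */
    void recordLocalization(String path, boolean succeeded) {
        if (succeeded) {
            expiryByPath.put(path, clockMillis.getAsLong() + ttlMillis);
        } else {
            expiryByPath.remove(path); // a failure also evicts any earlier success
        }
    }
}
```

A miss always falls back to the real permission check, so the cache can only skip name node calls for paths that recently passed; the TTL bounds how long a permission change can go unnoticed.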

> many getFileStatus calls made from node manager for localizing a public distributed cache resource
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-1771
>                 URL: https://issues.apache.org/jira/browse/YARN-1771
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Sangjin Lee
>            Assignee: Sangjin Lee
>            Priority: Critical
>
> We're observing that getFileStatus calls are putting a fair amount of load on the name node as part of checking public-ness when localizing a resource that belongs in the public cache.
> We see 7 getFileStatus calls made for each of these resources. We should look into reducing the number of calls to the name node. One example:
> {noformat}
> 2014-02-27 18:07:27,351 INFO audit: ... cmd=getfileinfo	src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
> 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo	src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
> 2014-02-27 18:07:27,352 INFO audit: ... cmd=getfileinfo	src=/tmp/temp-887708724/tmp883330348 ...
> 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo	src=/tmp/temp-887708724 ...
> 2014-02-27 18:07:27,353 INFO audit: ... cmd=getfileinfo	src=/tmp ...
> 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo	src=/	 ...
> 2014-02-27 18:07:27,354 INFO audit: ... cmd=getfileinfo	src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
> 2014-02-27 18:07:27,355 INFO audit: ... cmd=open	src=/tmp/temp-887708724/tmp883330348/foo-0.0.44.jar ...
> {noformat}
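
The count in the excerpt above can be checked mechanically. The following is a small sketch in plain Java that tallies `cmd=getfileinfo` entries per `src`; `AuditTally` is a hypothetical helper, and the pattern assumes only the `cmd=...` and `src=...` fields as they appear in the excerpt, separated by whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts getfileinfo audit entries per source path. */
final class AuditTally {
    // Matches "cmd=getfileinfo<whitespace>src=<path>"; other fields are ignored.
    private static final Pattern GETFILEINFO =
            Pattern.compile("cmd=getfileinfo\\s+src=(\\S+)");

    private AuditTally() {}

    /** Returns path -> number of getfileinfo lines, in first-seen order. */
    static Map<String, Integer> countGetFileInfo(Iterable<String> auditLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : auditLines) {
            Matcher m = GETFILEINFO.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Run over the eight lines of the excerpt, this counts seven `getfileinfo` entries: three for the jar itself and one for each of its four ancestors, with the `open` line ignored.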



--
This message was sent by Atlassian JIRA
(v6.2#6252)
