hadoop-yarn-issues mailing list archives

From "Daniel Templeton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3637) Handle localization sym-linking correctly at the YARN level
Date Tue, 17 Jan 2017 18:45:26 GMT

    [ https://issues.apache.org/jira/browse/YARN-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826581#comment-15826581 ]

Daniel Templeton commented on YARN-3637:
----------------------------------------

Thanks for the patch, [~ctrezzo].  A couple of comments to get this thing rolling again:

* When you create the new {{URI}}, it would be cleaner to let the {{URI}} constructor add the '#' for you. What about {{new URI(pathURI.getScheme(), pathURI.getSchemeSpecificPart(), resourceName);}}?
* When you throw an exception for a malformed URL, it would be good to add a message to the
{{YarnException}} to give some context.
* It might be better to overload the {{use()}} method instead of replacing it.
* In the test code, I love that you added failure messages, but it's better to keep the tests as {{assertEquals()}} calls and pass in a message that gives some context.
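A minimal Java sketch of the first two points, assuming the resource path is a plain {{java.net.URI}}. {{ResourceNameException}} is a hypothetical stand-in for {{YarnException}} so the example compiles without Hadoop on the classpath, and the HDFS path is made up. The three-argument {{URI(scheme, ssp, fragment)}} constructor appends the '#' itself and quotes any illegal characters, so no string concatenation is needed to build the fragment.

```java
import java.net.URI;
import java.net.URISyntaxException;

class FragmentUriSketch {

    // Hypothetical stand-in for YarnException so the sketch has no Hadoop dependency.
    static class ResourceNameException extends Exception {
        ResourceNameException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Returns a copy of pathUri whose fragment is resourceName. The URI
    // constructor inserts the '#' and quotes illegal characters itself.
    static URI withResourceName(URI pathUri, String resourceName)
            throws ResourceNameException {
        try {
            return new URI(pathUri.getScheme(), pathUri.getSchemeSpecificPart(), resourceName);
        } catch (URISyntaxException e) {
            // Attach context so the caller knows which resource and name failed.
            throw new ResourceNameException(
                "Could not attach resource name '" + resourceName + "' to " + pathUri, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical shared cache path, for illustration only.
        URI path = new URI("hdfs://nn:8020/sharedcache/checksum1/job.jar");
        // prints hdfs://nn:8020/sharedcache/checksum1/job.jar#foo.jar
        System.out.println(withResourceName(path, "foo.jar"));
    }
}
```

Keeping the original single-argument method and adding an overload that takes the resource name would let existing callers continue to work while new callers opt in to the fragment.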


> Handle localization sym-linking correctly at the YARN level
> -----------------------------------------------------------
>
>                 Key: YARN-3637
>                 URL: https://issues.apache.org/jira/browse/YARN-3637
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Chris Trezzo
>            Assignee: Chris Trezzo
>         Attachments: YARN-3637-trunk.001.patch
>
>
> The shared cache needs to handle resource sym-linking at the YARN layer. Currently, we let the application layer (e.g. MapReduce) handle this, but it is probably better for all applications if it is handled transparently.
> Here is the scenario:
> Imagine two separate jars (with different checksums) that have the same name, job.jar.
> They are stored in the shared cache as two separate resources:
> checksum1/job.jar
> checksum2/job.jar
> A new application tries to use both of these resources, but internally refers to them by different names:
> foo.jar maps to checksum1
> bar.jar maps to checksum2
> When the shared cache returns the paths to the resources, both resources have the same name (job.jar). Because of this, when the resources are localized, one of them clobbers the other: both symlinks in the container_id directory get the same name (job.jar) even though they point to two separate resource directories.
> Originally we tackled this in the MapReduce client by using the fragment portion of the resource URL. This, however, seems like something that should be solved at the YARN layer.
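A small Java sketch of the clobbering scenario, assuming the symlink name is taken from the URI fragment when one is present and from the last path component otherwise. That rule is modeled on the fragment-based approach the description says the MapReduce client uses; it is not taken from the YARN code, and the HDFS paths are hypothetical. Without fragments both resources claim the link name job.jar; with foo.jar and bar.jar as fragments the names no longer collide.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SymlinkNameSketch {

    // Assumed rule: fragment if present, otherwise the file name at the end of the path.
    static String linkName(URI resource) {
        String fragment = resource.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = resource.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Returns every link name claimed by more than one resource.
    static List<String> collisions(List<URI> resources) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (URI r : resources) {
            counts.merge(linkName(r), 1, Integer::sum);
        }
        List<String> clashes = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                clashes.add(e.getKey());
            }
        }
        return clashes;
    }

    public static void main(String[] args) throws URISyntaxException {
        // Two different jars that share the file name job.jar.
        List<URI> plain = List.of(
            new URI("hdfs://nn:8020/sharedcache/checksum1/job.jar"),
            new URI("hdfs://nn:8020/sharedcache/checksum2/job.jar"));
        // Same jars, each carrying the name the application uses for it.
        List<URI> named = List.of(
            new URI("hdfs://nn:8020/sharedcache/checksum1/job.jar#foo.jar"),
            new URI("hdfs://nn:8020/sharedcache/checksum2/job.jar#bar.jar"));
        System.out.println("without fragments: " + collisions(plain)); // [job.jar]
        System.out.println("with fragments:    " + collisions(named)); // []
    }
}
```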



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


