hadoop-yarn-issues mailing list archives

From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-7712) Add ability to ignore timestamps in localized files
Date Thu, 11 Jan 2018 05:31:04 GMT

    [ https://issues.apache.org/jira/browse/YARN-7712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16321724#comment-16321724 ]

Miklos Szegedi commented on YARN-7712:

Thank you for the reply, [~chris.douglas]. The scenario is mainly for testing and demonstrating
the REST API behavior for future users.
Here is the current sequence of steps when launching an AM through the REST API:
1. The client has to upload a dependency to localize to HDFS
2. The client has to grab the timestamp from HDFS
3. The client runs a job through the REST API specifying the localized file with the timestamp
The client can run a job faster and with less effort with the suggested change:
1. The client has to upload a jar to HDFS
2. The client runs a job through the REST API specifying the localized file with the timestamp ignored
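
The current three-step flow can be sketched in Python roughly as follows. This is only a sketch under assumptions: the JSON field names follow the ResourceManager REST submission format as I understand it and should be checked against the documentation of the Hadoop version in use. The helper names, the sample path, the application id and the sample FileStatus values are made up for illustration. No network calls are made; step 2 is represented by a dict shaped like a WebHDFS GETFILESTATUS response.

```python
import json


def build_local_resource(hdfs_uri, status_response):
    """Build one local resource from a GETFILESTATUS-shaped response.

    Hypothetical helper: copies the length and modification time that
    the client currently has to fetch from HDFS before submitting.
    """
    file_status = status_response["FileStatus"]
    return {
        "resource": hdfs_uri,
        "type": "FILE",
        "visibility": "APPLICATION",
        "size": file_status["length"],
        "timestamp": file_status["modificationTime"],
    }


def build_submission(app_id, name, command, key, resource):
    """Build the JSON body for an application submission (assumed layout)."""
    return {
        "application-id": app_id,
        "application-name": name,
        "application-type": "YARN",
        "am-container-spec": {
            "local-resources": {"entry": [{"key": key, "value": resource}]},
            "commands": {"command": command},
        },
    }


# Step 1 (upload to HDFS) is assumed to have happened already.
# Step 2: sample values standing in for the GETFILESTATUS round trip.
sample_status = {"FileStatus": {"length": 1024, "modificationTime": 1515648664000}}

# Step 3: the body the client would POST to the ResourceManager.
sample_resource = build_local_resource("hdfs://namenode/tmp/app.jar", sample_status)
sample_body = build_submission(
    "application_0000000000000_0001", "demo", "sleep 10", "app.jar", sample_resource
)
print(json.dumps(sample_body, indent=2))
```

With the suggested change, the round trip in step 2 would no longer be needed just to obtain the timestamp. How the request marks the timestamp as ignored is not shown here.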
In my opinion, the timestamp specification requirement has multiple issues.
1. It does not provide security. The client gets the failing timestamp in the error message
2. It is an annoyance in basic clusters and testing scenarios, especially for REST API users
3. The user can protect consistency by restricting the directory it uploads to
4. The additional hop adds latency that is unnecessary in cases 2 and 3
5. If I had to design a way to use timestamps to protect consistency, I would
  a) make sure that time is trusted in the cluster and that the modification timestamp is trusted in HDFS
  b) grab a launch timestamp {{tl}} (or a desired minimum timestamp) when the client starts
and place it in ContainerLaunchContext, just as it is done now
  c) verify at localization time that the file modification time {{tm}} is earlier than the launch
or any other specified timestamp: {{tm < tl}}.
  This would ensure the same level of consistency without additional latency for REST users,
for example those using Python.
6. The PathHandle that you suggested is a better option, I admit.
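
The check described in item 5 c) can be sketched as follows. This is a minimal illustration, not NodeManager code: the function name is hypothetical, and both timestamps are assumed to be in milliseconds since the epoch, taken from clocks that are trusted as in 5 a).

```python
def passes_launch_time_check(modification_time_ms, launch_time_ms):
    """Return True if the file was last modified before the launch timestamp.

    modification_time_ms plays the role of tm (read from HDFS at
    localization time); launch_time_ms plays the role of tl (recorded by
    the client at start and carried in the launch context).
    """
    return modification_time_ms < launch_time_ms


# A file modified before the client started passes the check.
print(passes_launch_time_check(1515648000000, 1515648664000))

# A file modified after the launch timestamp is rejected.
print(passes_launch_time_check(1515649000000, 1515648664000))
```

The client does not need to query HDFS for the exact modification time here; it only records its own start time.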

> Add ability to ignore timestamps in localized files
> ---------------------------------------------------
>                 Key: YARN-7712
>                 URL: https://issues.apache.org/jira/browse/YARN-7712
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Miklos Szegedi
>            Assignee: Miklos Szegedi
> YARN currently requires and checks the timestamp of localized files and fails if the timestamp of the
file on HDFS does not match the one requested. This JIRA adds the ability to ignore the
timestamp at the request of the client.

