hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13712) S3A open to avoid needless HEAD on the successful execution path
Date Tue, 12 Sep 2017 09:39:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162733#comment-16162733 ]

Steve Loughran commented on HADOOP-13712:

We're not going to add any special APIs for opening files in S3A: they end up needing maintenance
and carry an expectation that they won't get deleted. So -1 to that. But, with the {{createFile()}}
builder API, there's always the ability to provide hints when a file is opened.

Two hints to consider here are (a) the length, and (b) what the initial read position is going to be.
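
As a rough illustration of what passing those two hints might look like, here is a minimal sketch. {{OpenHints}} and all of its methods are hypothetical names invented for this example; they are not part of any Hadoop API. The sketch also assumes that a caller-supplied length is what would let the store skip the HEAD.

```java
import java.util.OptionalLong;

/** Hypothetical hint holder for an open call; not part of any Hadoop API. */
final class OpenHints {
  private final OptionalLong length;
  private final long initialPos;

  private OpenHints(OptionalLong length, long initialPos) {
    this.length = length;
    this.initialPos = initialPos;
  }

  static Builder builder() {
    return new Builder();
  }

  /** Assumption: if the caller declared the length, no HEAD is needed to learn it. */
  boolean canSkipHead() {
    return length.isPresent();
  }

  OptionalLong length() {
    return length;
  }

  long initialPos() {
    return initialPos;
  }

  static final class Builder {
    private OptionalLong length = OptionalLong.empty();
    private long initialPos = 0;

    Builder length(long len) {
      if (len < 0) {
        throw new IllegalArgumentException("negative length: " + len);
      }
      this.length = OptionalLong.of(len);
      return this;
    }

    Builder initialPos(long pos) {
      if (pos < 0) {
        throw new IllegalArgumentException("negative position: " + pos);
      }
      this.initialPos = pos;
      return this;
    }

    OpenHints build() {
      // Reject a start position that lies beyond the declared length.
      if (length.isPresent() && initialPos > length.getAsLong()) {
        throw new IllegalArgumentException("initial position beyond declared length");
      }
      return new OpenHints(length, initialPos);
    }
  }
}
```

A caller which already knows the file size from a listing could then write {{OpenHints.builder().length(1024).initialPos(512).build()}}; with no hints set, {{canSkipHead()}} stays false and the existing behaviour would apply.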

We could also consider a "lazy-check" option which skips the existence check until any initial

With S3Guard around, the cost of {{getFileStatus()}} is lower, so I'm less worried about the cost
of that initial HEAD; now I'm more worried about the complexity of the codebase.

But at the same time, it's interesting to consider what could be done to speed up unguarded stores.

(Also, I've been thinking about whether, alongside HADOOP-13282, we should collect/use the etag
of a file from first open (or at least, first seek()) to detect and react to file updates:
we could identify when a file changed & fail. S3Guard doesn't currently track those etags.)
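
A minimal sketch of that etag idea, under stated assumptions: {{EtagTracker}} is a hypothetical class made up for illustration, not existing S3A code, and it assumes every GET response exposes the object's etag as a string. It remembers the first etag seen and fails on any later mismatch.

```java
import java.util.Objects;

/** Hypothetical tracker: remembers the etag seen on first open/seek. */
final class EtagTracker {
  private String firstEtag; // null until the first response has been observed

  /**
   * Record the etag of a response. The first call stores it; later calls
   * fail if the etag differs, meaning the file changed while it was open.
   */
  void observe(String etag) {
    Objects.requireNonNull(etag, "etag");
    if (firstEtag == null) {
      firstEtag = etag;
    } else if (!firstEtag.equals(etag)) {
      throw new IllegalStateException(
          "file changed while open: etag " + firstEtag + " -> " + etag);
    }
  }
}
```

An input stream could call {{observe()}} after each GET it issues. Whether to fail, reopen, or just warn on a mismatch is a policy choice this sketch does not settle; it simply fails.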

Anyway, that's not a conclusive answer except for a "-1 to any new public API". Have a look
at the new builder API for file opening and see if you can find a way to do it there, and we
can think about it.

> S3A open to avoid needless HEAD on the successful execution path
> ----------------------------------------------------------------
>                 Key: HADOOP-13712
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13712
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
> S3A's open() operation does a {{getFileStatus()}} check to see if a file is not a directory
> before opening with a GET. That initial check will take up at least one HEAD request if the
> file is present, more if it isn't.
> As the GET itself performs the existence check, it is needless. A successful GET of a
> path which doesn't end in "/" means a file was there. The only reason a getFileStatus call
> is needed is to choose which error message to display if the path isn't there: is it an FNFE
> or is it path-is-directory.
> Proposed: reorder the code to do the GET; only if that fails, fall back to getFileStatus()
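
The reordering proposed in the description can be sketched as follows. This is not the S3A implementation: {{FakeStore}} is a hypothetical in-memory stand-in for an object store which counts requests, {{LazyOpen.open()}} is an invented helper, and the directory probe is a simplified substitute for {{getFileStatus()}}. The only point is that the success path costs a single GET.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an object store; counts requests. */
final class FakeStore {
  private final Map<String, byte[]> objects = new HashMap<>();
  int gets = 0;   // number of GET calls issued
  int probes = 0; // number of status probes issued

  void put(String key, byte[] data) {
    objects.put(key, data);
  }

  /** GET: returns the object bytes, or null if no such object exists. */
  byte[] get(String key) {
    gets++;
    return objects.get(key);
  }

  /** Status probe: true if any object lives under key + "/". */
  boolean probeIsDirectory(String key) {
    probes++;
    String prefix = key + "/";
    for (String k : objects.keySet()) {
      if (k.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}

final class LazyOpen {
  /** GET first; probe only on failure, to choose the error message. */
  static byte[] open(FakeStore store, String path) {
    byte[] data = store.get(path);
    if (data != null) {
      return data; // success path: one GET, no probe
    }
    // Failure path only: work out which error to report.
    if (store.probeIsDirectory(path)) {
      throw new UncheckedIOException(
          new FileNotFoundException("Path is a directory: " + path));
    }
    throw new UncheckedIOException(
        new FileNotFoundException("No such file: " + path));
  }
}
```

The exceptions are wrapped in {{UncheckedIOException}} only to keep the sketch short; real filesystem code would throw the checked {{FileNotFoundException}} directly.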

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
