hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
Date Thu, 02 Nov 2017 11:13:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235561#comment-16235561 ]

Steve Loughran commented on HADOOP-13282:
-----------------------------------------

Also, as it saves one GET of path + "/" plus one LIST, it saves ~$0.009 when discovering that
a file doesn't exist. We could switch to it across the internal bits of our code which only look
for the existence of a file, where the presence of a directory is considered as much a failure
as no file.

> S3 blob etags to be made visible in status/getFileChecksum() calls
> ------------------------------------------------------------------
>
>                 Key: HADOOP-13282
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13282
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13282-001.patch
>
>
> If the etags of blobs were exported via {{getFileChecksum()}}, it'd be possible to probe
> for a blob being in sync with a local file. Distcp could use this to decide whether to skip
> a file or not.
> Now, there's a problem there: distcp needs the source and dest filesystems to implement the
> same checksum algorithm. It'd only work out of the box if you were copying between S3 instances.
> There are also quirks with encryption and multipart: [s3 docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html].
> At the very least, it's something which could be used when indexing the FS, to check for changes
> later.
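
A rough sketch of the "in sync with a local file" probe described above, using only the JDK. It is not the patch attached to this issue and does not call any Hadoop or AWS API: the class {{EtagProbe}}, its methods and the {{Result}} enum are hypothetical names, and the etag is assumed to have been fetched already by some other means. It relies on the etag of a single-part, unencrypted upload being the hex MD5 of the object data. Anything that is not a plain 32-character hex string (for example a multipart etag with a "-N" suffix) is reported as UNKNOWN rather than guessed at.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical helper: compares a local file against an already-fetched S3 etag. */
final class EtagProbe {

    /**
     * MATCH: local MD5 equals the etag.
     * MISMATCH: etag looks like an MD5 but differs. With SSE-KMS or SSE-C the
     * etag is not an MD5 of the data, so this does not prove the content differs.
     * UNKNOWN: etag is missing or not a plain MD5 (e.g. multipart upload).
     */
    enum Result { MATCH, MISMATCH, UNKNOWN }

    /** Hex-encoded MD5 of everything readable from the stream. */
    static String md5Hex(InputStream in) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform
            throw new IllegalStateException(e);
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder(32);
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Compare a local file with a remote etag string (surrounding quotes allowed). */
    static Result compare(Path localFile, String remoteEtag) throws IOException {
        if (remoteEtag == null) {
            return Result.UNKNOWN;
        }
        // The ETag header value is normally wrapped in double quotes; strip them.
        String etag = remoteEtag.replace("\"", "").trim().toLowerCase(Locale.ROOT);
        // Only a bare 32-hex-digit value is treated as a candidate MD5.
        if (!etag.matches("[0-9a-f]{32}")) {
            return Result.UNKNOWN;
        }
        try (InputStream in = Files.newInputStream(localFile)) {
            return md5Hex(in).equals(etag) ? Result.MATCH : Result.MISMATCH;
        }
    }
}
```

A caller such as a copy tool could skip the upload only on MATCH and fall back to its normal behaviour for MISMATCH and UNKNOWN, which keeps the encryption and multipart quirks from causing a file to be skipped wrongly.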



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


