hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13282) S3 blob etags to be made visible in status/getFileChecksum() calls
Date Tue, 21 Nov 2017 17:58:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran updated HADOOP-13282:
------------------------------------
    Attachment: HADOOP-13282-002.patch

HADOOP-13282: etag support for s3a.
* Move the EtagChecksum class into a new fs.store package in hadoop common, for use by other stores.
* Add tests there on its core equality/round-trip operations.
* Add a set of ITests for the S3A use. One of these tests is skipped if the FS is known to be encrypted, in case the bucket returns different etags there. To support this, added a getter for the S3AFS encryption algorithm.

With these etags, you can assume that if an object's etag changes, the object is different. You cannot safely use matching etags to conclude that two objects, especially ones in different stores, are identical.
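That asymmetric contract (a changed etag proves difference; equality is only meaningful within one store, and an absent etag proves nothing) is what the equality/round-trip tests exercise. It can be sketched with a tiny standalone model; the class name and internals below are illustrative only, not the actual fs.store EtagChecksum code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative model of an etag-based checksum: two checksums compare
// equal only when both carry a non-empty etag and the bytes match.
// An empty/missing etag is never equal to anything, including itself,
// so "no etag" can never be misread as "unchanged".
class EtagModel {
    private final byte[] bytes;

    EtagModel(String etag) {
        this.bytes = etag == null
                ? new byte[0]
                : etag.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EtagModel)) {
            return false;
        }
        EtagModel other = (EtagModel) o;
        // Equality requires a real etag on this side and matching bytes.
        return bytes.length > 0 && Arrays.equals(bytes, other.bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```

Under this rule a caller can treat "etags equal" as "object unchanged" within one store, while any mismatch, or any missing etag, must be treated as "possibly changed" and trigger a re-copy or re-index.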

(Note this patch reorders all the imports in ITestS3AMiscOperations. They had got out of order, and as it's a rarely-patched, low-conflict file, I've taken the chance to fix it.)

Tested: S3 London with encryption turned on; S3 Ireland without.

> S3 blob etags to be made visible in status/getFileChecksum() calls
> ------------------------------------------------------------------
>
>                 Key: HADOOP-13282
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13282
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13282-001.patch, HADOOP-13282-002.patch
>
>
> If the etags of blobs were exported via {{getFileChecksum()}}, it'd be possible to probe
> for a blob being in sync with a local file. Distcp could use this to decide whether to skip
> a file or not.
> Now, there's a problem there: distcp needs the source and dest filesystems to implement the
> same checksum algorithm, so it'd only work out of the box if you were copying between S3
> instances. There are also quirks with encryption and multipart uploads:
> [s3 docs|http://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html].
> At the very least, it's something which could be used when indexing the FS, to check for
> changes later.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

