hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15625) S3A input stream to use etags to detect changed source files
Date Tue, 05 Mar 2019 23:36:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785037#comment-16785037 ]

Steve Loughran commented on HADOOP-15625:

With patch 012, I think we're pretty much done. Here are my comments, which I've done in my IDE
and attached as a diff to be applied to the 012 patch, and as a -013 patch for yetus to play

I like this. I set up a bucket with versioning on and reran the tests. I think I'll need to do
that for a while to see that I'm happy with it (+ with etag checking on by default, that happens
on every

* nice to see the change in S3ARetryPolicy. It would have failed anyway, but it is good to
see the explicit decision.

h2. Constants

rename "fs.s3a.change.detection.versionrequired" to "fs.s3a.change.detection.version.required"

h3. S3AInstrumentation

L646. Make versionMismatches private (checkstyle, inevitably my fault)

h3. S3AFileSystem

mark getChangeDetectionPolicy as {{@VisibleForTesting}}

h3. index.md

- some typos, `` markup and text changes.

h3. troubleshooting_s3a.md

- add `` round code
- a typo
- add `` round code
- cut down wide lines in the stack traces and convert tabs to spaces there too.

h3. ITestS3ARemoteFileChanged

unset the bucket options in case someone has set a per-bucket override. Added a method to
S3ATestUtils here.
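
To illustrate the idea of unsetting per-bucket overrides before a test sets its own values, here is a minimal sketch. It assumes the S3A convention that `fs.s3a.bucket.<bucket>.<option>` overrides `fs.s3a.<option>`. The class and method names are hypothetical (the comment above does not name the method added to S3ATestUtils), and a plain `Map` stands in for a Hadoop `Configuration` so the example stays self-contained.

```java
import java.util.Map;

/**
 * Hypothetical helper, NOT the method from the patch: clears per-bucket
 * overrides of S3A options from a key/value configuration.
 */
final class BucketOverrides {
    static final String FS_S3A_PREFIX = "fs.s3a.";
    static final String FS_S3A_BUCKET_PREFIX = "fs.s3a.bucket.";

    private BucketOverrides() {
    }

    /**
     * Remove the per-bucket form of each option, leaving the base option
     * untouched so the test can set it explicitly afterwards.
     *
     * @param conf    configuration as a mutable map (stand-in for Configuration)
     * @param bucket  bucket name, e.g. "example"
     * @param options full base option names, each starting with "fs.s3a."
     */
    static void clearBucketOverrides(Map<String, String> conf,
                                     String bucket,
                                     String... options) {
        for (String option : options) {
            if (!option.startsWith(FS_S3A_PREFIX)) {
                throw new IllegalArgumentException(
                    "Not an fs.s3a. option: " + option);
            }
            // "fs.s3a.foo" -> "fs.s3a.bucket.<bucket>.foo"
            String suffix = option.substring(FS_S3A_PREFIX.length());
            conf.remove(FS_S3A_BUCKET_PREFIX + bucket + "." + suffix);
        }
    }
}
```

A test setup would call something like `BucketOverrides.clearBucketOverrides(conf, bucketName, "fs.s3a.change.detection.version.required")` before applying the values the test case needs.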

> S3A input stream to use etags to detect changed source files
> ------------------------------------------------------------
>                 Key: HADOOP-15625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>            Priority: Major
>         Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, HADOOP-15625-002.patch,
HADOOP-15625-003.patch, HADOOP-15625-004.patch, HADOOP-15625-005.patch, HADOOP-15625-006.patch,
HADOOP-15625-007.patch, HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch,
HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch
> S3A input stream doesn't handle changing source files any better than the other cloud
store connectors. Specifically: it doesn't notice the file has changed, caches the length from
startup, and whenever a seek triggers a new GET, you may get one of: old data, new data, and
even perhaps go from new data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with S3Guard,
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.
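
The three steps in the issue description can be sketched roughly as follows. This is an illustrative sketch only, not the code from any of the attached patches: the class name `EtagChangeTracker` and its methods are hypothetical, and skipping the check when a response carries no etag is an assumption made here for simplicity.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of etag-based change detection for one open file.
 * Not the implementation from the HADOOP-15625 patches.
 */
final class EtagChangeTracker {
    private final String path;
    // Etag cached from the first HEAD/GET response; null until one is seen.
    private String expectedEtag;

    EtagChangeTracker(String path) {
        this.path = path;
    }

    /**
     * Call with the etag of every HEAD/GET response for this file.
     * The first non-null etag is cached; later responses are compared to it.
     *
     * @throws IOException if the etag differs from the cached one,
     *                     i.e. the remote file changed during the read
     */
    void processResponse(String etag) throws IOException {
        if (etag == null) {
            // Assumption: nothing to compare if the store returned no etag.
            return;
        }
        if (expectedEtag == null) {
            expectedEtag = etag;            // step 1: cache the first etag
        } else if (!expectedEtag.equals(etag)) {
            // steps 2 and 3: verify, and fail loudly on a mismatch
            throw new IOException("Remote file " + path
                + " changed during read: expected etag " + expectedEtag
                + " but got " + etag);
        }
    }

    /** @return the cached etag, or null if no response has been seen yet. */
    String getExpectedEtag() {
        return expectedEtag;
    }
}
```

An input stream would hold one tracker per open file and call `processResponse` after each request, so that a reopen triggered by a seek fails with an `IOException` instead of silently mixing old and new data.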
