hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15625) S3A input stream to use etags/version number to detect changed source files
Date Wed, 13 Mar 2019 15:42:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791820#comment-16791820
] 

Steve Loughran commented on HADOOP-15625:
-----------------------------------------

bq. Re: troubleshooting_s3a.md and the tab removal 

tabs aren't allowed in our source tree, but stack traces from java have them, hitting the
"convert tabs to spaces" button on the IDE is part of the workflow of adding them to our docs.
Nothing important

> S3A input stream to use etags/version number to detect changed source files
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-15625
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15625
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Ben Roling
>            Priority: Major
>         Attachments: HADOOP--15625-006.patch, HADOOP-15625-001.patch, HADOOP-15625-002.patch,
HADOOP-15625-003.patch, HADOOP-15625-004.patch, HADOOP-15625-005.patch, HADOOP-15625-006.patch,
HADOOP-15625-007.patch, HADOOP-15625-008.patch, HADOOP-15625-009.patch, HADOOP-15625-010.patch,
HADOOP-15625-011.patch, HADOOP-15625-012.patch, HADOOP-15625-013-delta.patch, HADOOP-15625-013.patch,
HADOOP-15625-014.patch, HADOOP-15625-015.patch, HADOOP-15625-015.patch, HADOOP-15625-016.patch
>
>
> S3A input stream doesn't handle changing source files any better than the other cloud
store connectors. Specifically: it doesn't noticed it has changed, caches the length from
startup, and whenever a seek triggers a new GET, you may get one of: old data, new data, and
even perhaps go from new data to old data due to eventual consistency.
> We can't do anything to stop this, but we could detect changes by
> # caching the etag of the first HEAD/GET (we don't get that HEAD on open with S3Guard,
BTW)
> # on future GET requests, verify the etag of the response
> # raise an IOE if the remote file changed during the read.
> It's a more dramatic failure, but it stops changes silently corrupting things.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message