hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14468) S3Guard: make short-circuit getFileStatus() configurable
Date Fri, 02 Jun 2017 18:40:04 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035211#comment-16035211 ]

Aaron Fabbri commented on HADOOP-14468:

I created this JIRA to follow up on [your comment|https://issues.apache.org/jira/browse/HADOOP-13345?focusedCommentId=16019741&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16019741]
 and the discussion about failing fast in the read path when a file is not visible in S3.

I'm not 100% convinced we want this but it could be useful for:

1. Failing fast on open() instead of when we later read the stream.
2. A "safe mode" or fallback that can be enabled.  When the new parameter is set to false, we
could collect stats whenever the MetadataStore differs from S3, which would be interesting,
e.g. "S3 / metastore length differs" or "visible in metastore but not S3".
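
The mismatch stats in item 2 could be gathered roughly as in the sketch below. It is only an illustration: MismatchStats, its enum values, and record() are hypothetical names and not part of S3AFileSystem or the MetadataStore API. An empty Optional stands for "not visible" in that store.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Hadoop: counts how often the
// MetadataStore and S3 disagree about a file.
final class MismatchStats {
    enum Mismatch { LENGTH_DIFFERS, IN_METASTORE_NOT_S3, IN_S3_NOT_METASTORE }

    private final Map<Mismatch, Long> counts = new EnumMap<>(Mismatch.class);

    /**
     * Compare the file length reported by each store and count any
     * disagreement. An empty Optional means the file is not visible there.
     */
    Optional<Mismatch> record(Optional<Long> metastoreLen, Optional<Long> s3Len) {
        Mismatch m = null;
        if (metastoreLen.isPresent() && !s3Len.isPresent()) {
            m = Mismatch.IN_METASTORE_NOT_S3;
        } else if (!metastoreLen.isPresent() && s3Len.isPresent()) {
            m = Mismatch.IN_S3_NOT_METASTORE;
        } else if (metastoreLen.isPresent()
                && !metastoreLen.get().equals(s3Len.get())) {
            m = Mismatch.LENGTH_DIFFERS;
        }
        if (m != null) {
            counts.merge(m, 1L, Long::sum);
        }
        return Optional.ofNullable(m);
    }

    long count(Mismatch m) {
        return counts.getOrDefault(m, 0L);
    }

    public static void main(String[] args) {
        MismatchStats stats = new MismatchStats();
        stats.record(Optional.of(100L), Optional.of(100L)); // stores agree
        stats.record(Optional.of(100L), Optional.of(42L));  // length differs
        stats.record(Optional.of(100L), Optional.empty());  // missing in S3
        for (Mismatch m : Mismatch.values()) {
            System.out.println(m + " = " + stats.count(m));
        }
    }
}
```

A real version would presumably feed these counters into the existing S3A instrumentation rather than a private map; that wiring is left out here.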

In general we do not support a mixed mode where some clients use S3Guard and others do not;
it is not safe.  However, if there is a well-known path where only an external process (e.g.
ETL) is dropping files for ingest, it may be nice to support that narrower case.
I think the existing behavior, where list checks both S3 and the MetadataStore, is sufficient
without this change though.

> S3Guard: make short-circuit getFileStatus() configurable
> --------------------------------------------------------
>                 Key: HADOOP-14468
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14468
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
> Currently, when S3Guard is enabled, getFileStatus() will skip S3 if it gets a result
> from the MetadataStore (e.g. DynamoDB) first.
> I would like to add a new parameter {{fs.s3a.metadatastore.getfilestatus.authoritative}}
> which, when true, keeps the current behavior.  When false, S3AFileSystem will check both S3
> and the MetadataStore.
> I'm not sure yet if we want to have this behavior the same for all callers of getFileStatus(),
> or if we only want to check both S3 and MetadataStore for some internal callers such as open().
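
A minimal sketch of how the proposed flag might behave, using in-memory maps as stand-ins for the MetadataStore and S3. GetFileStatusSketch and its methods are hypothetical and not the S3AFileSystem code. The description does not say which result wins when the two stores disagree, so this sketch assumes S3 decides whenever the flag is false.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model, not the real S3AFileSystem. Each map goes from a
// path to a file length; a missing key means "not visible" in that store.
final class GetFileStatusSketch {
    private final Map<String, Long> metadataStore = new HashMap<>();
    private final Map<String, Long> s3 = new HashMap<>();
    // Models fs.s3a.metadatastore.getfilestatus.authoritative.
    private final boolean authoritative;
    // Counts simulated S3 requests so the short-circuit is observable.
    int s3Lookups = 0;

    GetFileStatusSketch(boolean authoritative) {
        this.authoritative = authoritative;
    }

    void putMetadata(String path, long length) {
        metadataStore.put(path, length);
    }

    void putS3(String path, long length) {
        s3.put(path, length);
    }

    /** Returns the file length, or empty if the file is treated as missing. */
    Optional<Long> getFileLength(String path) {
        Long fromStore = metadataStore.get(path);
        if (authoritative && fromStore != null) {
            // Current behavior: a MetadataStore hit skips S3 entirely.
            return Optional.of(fromStore);
        }
        // Flag is false, or the MetadataStore had no entry: ask S3 too.
        // Assumption: S3 decides here, so a file that is only in the
        // MetadataStore is reported missing at this point (fail fast).
        s3Lookups++;
        return Optional.ofNullable(s3.get(path));
    }

    public static void main(String[] args) {
        GetFileStatusSketch shortCircuit = new GetFileStatusSketch(true);
        shortCircuit.putMetadata("/ingest/part-0000", 1024L);
        System.out.println(shortCircuit.getFileLength("/ingest/part-0000")
                + " after " + shortCircuit.s3Lookups + " S3 lookups");

        GetFileStatusSketch checkBoth = new GetFileStatusSketch(false);
        checkBoth.putMetadata("/ingest/part-0000", 1024L);
        System.out.println(checkBoth.getFileLength("/ingest/part-0000")
                + " after " + checkBoth.s3Lookups + " S3 lookups");
    }
}
```

Applying the check only to internal callers such as open() would amount to passing the flag per call instead of fixing it in the constructor.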

