hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13651) S3Guard: S3AFileSystem Integration with MetadataStore
Date Mon, 10 Oct 2016 22:48:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563782#comment-15563782 ]

Aaron Fabbri edited comment on HADOOP-13651 at 10/10/16 10:47 PM:
------------------------------------------------------------------

Minor status update, since this JIRA has had a long gestation period: I'm working on it now.
So far I have code for:

- New config keys: {{fs.s3a.metadatastore.authoritative}} and {{fs.s3a.metadatastore.impl}}.
- getFileStatus()
- listStatus()
- rename()
- delete()
- mkdirs()
- copyFromLocalFile()
- copyFile()
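A minimal sketch of how the two config keys might be read. Assumptions: a plain {{Map<String, String>}} stands in for Hadoop's {{Configuration}}; {{MetadataStoreConfig}} and the fallback value {{NullMetadataStore}} are hypothetical names; an unset or blank impl key falls back to the no-op store, and authoritative mode defaults to off. It illustrates the idea only and is not the patch's code.

```java
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Hadoop's Configuration class.
final class MetadataStoreConfig {
    static final String IMPL_KEY = "fs.s3a.metadatastore.impl";
    static final String AUTHORITATIVE_KEY = "fs.s3a.metadatastore.authoritative";
    // Placeholder fallback (assumption): no configured store means the no-op store.
    static final String NULL_STORE = "NullMetadataStore";

    final String impl;
    final boolean authoritative;

    private MetadataStoreConfig(String impl, boolean authoritative) {
        this.impl = impl;
        this.authoritative = authoritative;
    }

    static MetadataStoreConfig from(Map<String, String> conf) {
        String rawImpl = conf.get(IMPL_KEY);
        String impl = rawImpl == null ? "" : rawImpl.trim();
        if (impl.isEmpty()) {
            impl = NULL_STORE; // unset or blank: fall back to the no-op store
        }
        // Assumed default: non-authoritative, so listings would still consult S3.
        String rawAuth = conf.get(AUTHORITATIVE_KEY);
        boolean authoritative = rawAuth != null && Boolean.parseBoolean(rawAuth.trim());
        return new MetadataStoreConfig(impl, authoritative);
    }
}
```

Keeping the fallback in one place means callers never have to handle a "no store configured" case separately.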

What remains for this JIRA:
- create(): I'm figuring out the OutputStream plumbing now.
- More testing.
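One way the create() plumbing could notify the filesystem is a stream wrapper that fires a callback exactly once, after the underlying stream closes successfully. {{NotifyingOutputStream}} is a hypothetical name, and the {{Runnable}} stands in for a hook such as {{S3AFileSystem.writeFinished()}}; this is a sketch under those assumptions, not the actual plumbing.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: run a callback once when a stream from create() is closed.
final class NotifyingOutputStream extends FilterOutputStream {
    private final Runnable onClose; // stands in for a writeFinished()-style hook
    private boolean closed = false;

    NotifyingOutputStream(OutputStream out, Runnable onClose) {
        super(out);
        this.onClose = onClose;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // avoid FilterOutputStream's byte-at-a-time default
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // close() may legally be called more than once
        }
        closed = true;
        super.close(); // flush and close the underlying stream first
        onClose.run(); // reached only if the underlying close did not throw
    }
}
```

Running the callback after {{super.close()}} means a failed upload is never recorded as a completed write.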

What I'd like to do as separate JIRAs (because I favor smaller code reviews):
- Delete tracking.
- Retries (i.e. an eventual-consistency retry policy). I'd love to see this in isolation, since it is non-trivial.

I'm inserting TODO comments as I go at key locations for those two items.

Interesting things about my approach so far:

I'm trying to minimize changes to {{S3AFileSystem}}:
   - diff stat so far: {quote}
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java | 116 ++++++++++++++++++++++++++++++------
{quote}
   - I introduce {{S3Guard}}, a MetadataStore/S3A helper ("glue") class that, so far, is just a collection of static helper functions.
   - I introduce {{NullMetadataStore}}, a no-op metadata store. The goal was to simplify the S3AFileSystem changes (always call the MetadataStore, regardless of whether it is a no-op), but I also like that it further clarifies {{MetadataStore}} semantics. It turns out S3AFileSystem still sometimes wants to know whether there is no MetadataStore, so it can avoid allocating things that aren't needed. That seems like an OK tradeoff, but I'll let folks comment when I post the v1 patch.
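A minimal sketch of the null-object idea and the allocation tradeoff described above. Assumptions: the {{MetadataStore}} contract is reduced to a hypothetical two-method interface over path and status strings, {{InMemoryMetadataStore}} exists only for contrast, and {{isNullMetadataStore()}} / {{recordWrite()}} are hypothetical helper names, not the patch's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, heavily simplified contract: path -> status string.
interface MetadataStore {
    String get(String path); // null when the store has no entry for path
    void put(String path, String status);
}

// No-op store: accepts every call and remembers nothing.
final class NullMetadataStore implements MetadataStore {
    @Override public String get(String path) { return null; }
    @Override public void put(String path, String status) { }
}

// Hypothetical in-memory store, only here to contrast with the no-op one.
final class InMemoryMetadataStore implements MetadataStore {
    private final Map<String, String> entries = new HashMap<>();
    @Override public String get(String path) { return entries.get(path); }
    @Override public void put(String path, String status) { entries.put(path, status); }
}

// Static "glue" helpers in the spirit of S3Guard; method names are hypothetical.
final class S3Guard {
    private S3Guard() { }

    static boolean isNullMetadataStore(MetadataStore ms) {
        return ms instanceof NullMetadataStore;
    }

    // Record a completed write. The status is built lazily, so with the
    // no-op store the status object is never constructed at all.
    static void recordWrite(MetadataStore ms, String path, Supplier<String> status) {
        if (isNullMetadataStore(ms)) {
            return;
        }
        ms.put(path, status.get());
    }
}
```

Callers can always go through {{S3Guard}} without branching, while the single {{instanceof}} check captures the "sometimes wants to know there is no store" case in one place.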

I'm trying to keep PathMetadata simple: either you have a PathMetadata, including an S3AFileStatus, or you don't. There are some spots where it would be convenient to just record "this path exists, but we don't have metadata yet" (e.g. create() -> OutputStream.close() -> S3AFileSystem.writeFinished(); at that point I don't have a FileStatus), but that would complicate the S3AFileSystem logic. We'll see.
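A minimal sketch of the "either you have a PathMetadata or you don't" invariant. Assumptions: {{SimpleFileStatus}} is a hypothetical stand-in for S3AFileStatus, and {{PathMetadataIndex}} is a hypothetical lookup table; the point is only that a PathMetadata cannot exist without a status, so a lookup has exactly two outcomes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for S3AFileStatus: only the fields this sketch needs.
final class SimpleFileStatus {
    final String path;
    final long length;
    final boolean isDirectory;

    SimpleFileStatus(String path, long length, boolean isDirectory) {
        this.path = path;
        this.length = length;
        this.isDirectory = isDirectory;
    }
}

// A PathMetadata always carries a full status; there is deliberately no
// "path exists but status unknown" state.
final class PathMetadata {
    private final SimpleFileStatus status;

    PathMetadata(SimpleFileStatus status) {
        this.status = Objects.requireNonNull(status, "PathMetadata requires a file status");
    }

    SimpleFileStatus getFileStatus() {
        return status;
    }
}

// Hypothetical lookup table: a path either has complete metadata or no entry.
final class PathMetadataIndex {
    private final Map<String, PathMetadata> entries = new HashMap<>();

    void put(PathMetadata md) {
        entries.put(md.getFileStatus().path, md);
    }

    Optional<PathMetadata> get(String path) {
        return Optional.ofNullable(entries.get(path));
    }
}
```

The cost of this choice is exactly the writeFinished() case: until a status is available, the path simply has no entry.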




> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
>                 Key: HADOOP-13651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
