hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14098) AliyunOSS: improve the performance of object metadata operation
Date Thu, 19 Oct 2017 17:55:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16211442#comment-16211442
] 

Aaron Fabbri commented on HADOOP-14098:
---------------------------------------

I made efforts to keep the MetadataStore part of S3Guard a separate layer that other filesystems
could use.

In S3Guard, we use a MetadataStore as a trailing log of metadata edits, which guards against
list/stat inconsistency. The MetadataStore interface is also designed to be used as a limited-lifetime
cache of FileStatus objects that is demand-loaded and does not need to contain metadata for
all files in the underlying FS (e.g. implementations may cache only recently seen entries).
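
To make the cache role concrete, here is a minimal Java sketch of a demand-loaded, limited-lifetime cache. It is not the MetadataStore API: the class ExpiringStatusCache, its Entry type (a stand-in for FileStatus), the backing lookup function, and the TTL parameter are all hypothetical names chosen for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical demand-loaded, limited-lifetime metadata cache (not the Hadoop API). */
class ExpiringStatusCache {
  /** Stand-in for FileStatus: a file length plus the time it was cached. */
  static final class Entry {
    final long length;
    final long cachedAtMillis;

    Entry(long length, long cachedAtMillis) {
      this.length = length;
      this.cachedAtMillis = cachedAtMillis;
    }
  }

  private final Map<String, Entry> entries = new ConcurrentHashMap<>();
  // Assumed slow call to the underlying store; returns empty if the path is absent.
  private final Function<String, Optional<Long>> backingLookup;
  private final long ttlMillis;
  private final LongSupplier clock;

  ExpiringStatusCache(Function<String, Optional<Long>> backingLookup,
                      long ttlMillis, LongSupplier clock) {
    this.backingLookup = backingLookup;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Returns the cached length if still fresh, otherwise loads it on demand. */
  Optional<Long> getLength(String path) {
    long now = clock.getAsLong();
    Entry cached = entries.get(path);
    if (cached != null && now - cached.cachedAtMillis < ttlMillis) {
      return Optional.of(cached.length);
    }
    Optional<Long> loaded = backingLookup.apply(path);
    if (loaded.isPresent()) {
      entries.put(path, new Entry(loaded.get(), now));
    } else {
      entries.remove(path); // drop any stale entry for a path that is gone
    }
    return loaded;
  }

  /** Number of paths currently cached; only paths that were asked for appear here. */
  int size() {
    return entries.size();
  }
}
```

The cache only ever holds paths that callers have requested, so it never needs a full listing of the underlying FS, and an expired entry simply triggers a fresh lookup.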

Some pointers to get started:

1. The [MetadataStore|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStore.java]
interface.  Note there are zero imports of S3A code there.  It does live in hadoop-tools/hadoop-aws,
but I expected us to move it to a common place as soon as another FileSystem uses it.  (Note:
there used to be S3A-specific code for empty directory behavior, but that has been fixed
and removed from the MetadataStore layer.)

2. A local (in-memory) implementation of the interface is [LocalMetadataStore|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java].
 This is not for production use at this time: the goal is to have an easy-to-run reference
implementation for tests.  It is not a perfect implementation, but it is small (<500 LOC)
and supports the authoritative directory listing bit (which the Dynamo implementation does not
yet support).  You could use this as a test implementation for integrating with a FileSystem.

3. There is a nice set of contract tests that validate that an implementation (i.e. a different
back end) of MetadataStore works correctly.  The base test class is [MetadataStoreTestBase|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/MetadataStoreTestBase.java].
 If you wanted to develop a new back end you could essentially use this for test-driven development.
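
As a rough illustration of the contract-test idea in item 3, the sketch below writes the checks once against an interface and lets each back end plug in through a factory method. None of this is taken from MetadataStoreTestBase: TinyStore, TinyStoreContract, MapStore, and MapStoreContract are hypothetical names, and the checks are plain Java rather than JUnit so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal store interface, standing in for a metadata back end. */
interface TinyStore {
  void put(String path, long length);

  Long get(String path); // null when the path is absent

  void delete(String path);
}

/** Contract checks written once; each back end only supplies createStore(). */
abstract class TinyStoreContract {
  abstract TinyStore createStore();

  /** Runs every contract check against a fresh store instance. */
  void runAll() {
    putThenGetReturnsValue();
    deleteRemovesEntry();
  }

  void putThenGetReturnsValue() {
    TinyStore store = createStore();
    store.put("/a", 7L);
    check(Long.valueOf(7L).equals(store.get("/a")), "get after put");
  }

  void deleteRemovesEntry() {
    TinyStore store = createStore();
    store.put("/a", 7L);
    store.delete("/a");
    check(store.get("/a") == null, "get after delete");
  }

  static void check(boolean ok, String what) {
    if (!ok) {
      throw new AssertionError("contract violated: " + what);
    }
  }
}

/** One back end under test: a plain in-memory map. */
class MapStore implements TinyStore {
  private final Map<String, Long> data = new HashMap<>();

  @Override
  public void put(String path, long length) {
    data.put(path, length);
  }

  @Override
  public Long get(String path) {
    return data.get(path);
  }

  @Override
  public void delete(String path) {
    data.remove(path);
  }
}

/** Binds the shared contract to the in-memory back end. */
class MapStoreContract extends TinyStoreContract {
  @Override
  TinyStore createStore() {
    return new MapStore();
  }
}
```

Calling `new MapStoreContract().runAll()` exercises the in-memory back end. A new back end would add one more small subclass and could be developed until the same checks pass.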

I agree with Steve L that this is a large amount of work.  A major caveat of the approach is
the lack of transactions around updating the two sources of truth (the FS and the MetadataStore).
 This means things can get out of sync when failures happen. The cost of transactions is prohibitive
with the backend we use (Dynamo), so our strategy for dealing with this is to (1) use soft state
(entries in the MetadataStore are expired via a scheduled prune CLI command) and (2) have
good CLI tools for detecting and fixing any inconsistencies.  Another feature of the design
is that it is always safe to delete all the data in the MetadataStore. Think of it as a cache
flush that can be used to clear inconsistencies in the worst case.  (Also: deleting *some*
of the data may or may not be safe, depending on the implementation of the MetadataStore.)
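
The soft-state strategy can be sketched as follows. This is a simplified model under stated assumptions, not the S3Guard prune implementation: SoftStateTable and its methods are hypothetical, and each entry is reduced to a path plus a last-updated timestamp.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical soft-state table: entries may be pruned by age or dropped entirely. */
class SoftStateTable {
  // Path mapped to the time (in milliseconds) its metadata was last written.
  private final Map<String, Long> lastUpdatedMillis = new ConcurrentHashMap<>();

  void record(String path, long nowMillis) {
    lastUpdatedMillis.put(path, nowMillis);
  }

  /**
   * Removes entries last updated before the cutoff and returns how many were
   * removed. The count is exact only when no other thread writes concurrently.
   */
  int prune(long cutoffMillis) {
    int before = lastUpdatedMillis.size();
    lastUpdatedMillis.values().removeIf(updated -> updated < cutoffMillis);
    return before - lastUpdatedMillis.size();
  }

  /** Full flush: safe in this model because the underlying FS remains the source of truth. */
  void clear() {
    lastUpdatedMillis.clear();
  }

  boolean contains(String path) {
    return lastUpdatedMillis.containsKey(path);
  }

  int size() {
    return lastUpdatedMillis.size();
  }
}
```

A scheduled job would call `prune` with a cutoff such as "now minus the retention period", while `clear` corresponds to the worst-case cache flush described above.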



> AliyunOSS: improve the performance of object metadata operation
> ---------------------------------------------------------------
>
>                 Key: HADOOP-14098
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14098
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 3.0.0-alpha2
>            Reporter: Genmao Yu
>            Assignee: Genmao Yu
>
> Open this JIRA to research and address the potential request performance issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

