hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15038) Abstract MetadataStore in S3Guard into a common module.
Date Mon, 04 Dec 2017 14:13:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276822#comment-16276822 ]

Steve Loughran commented on HADOOP-15038:

This is something we should bring up on the common-dev list. 

# hadoop-cloud-core sounds nice
# I've been doing some work on cloudup locally (including a 2.7.x build); I'll need to submit
a new patch
# I've also been doing some other small patches factoring out common code from stores (HADOOP-14943),
where again, this stuff can be shared

Essentially: we've been copying and pasting stuff between versions, and it's reached the limits
of maintenance. See [https://hortonworks.com/blog/history-apache-hadoops-support-amazon-s3/]
for the illustration of that pasting.

Things I'd like to see in there

* the core "mimic a filesystem" functions
* standard statistic collection & names
* retry logic of S3A.Invoker
* Support for marshalling login secrets as filesystem delegation tokens (see HADOOP-14556).
This is needed for users to submit their own credentials & encryption keys to shared query
* Any CLI utility for listing/viewing things (see "hadoop s3guard bucket-info")
* Any CLI utility we can do for diagnostics. Your support team will love you for this.
* Any more integration tests we can do beyond the basic abstract contract & distcp tests.
We now have a variant of the [Hadoop MR protocol test suite|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitProtocol.java]
 and [MR Job|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/commit/AbstractITCommitMRJob.java].

One thing to consider, though: the Ozone store under HDFS could ultimately be looking at some of this
stuff too. That means hadoop-common.jar is the right place to put it,
at least until there's a compelling reason to split it out. (Except: do that, and things start
to depend on it, and splitting it out becomes impossible...)


BTW, if you haven't noticed, I've got a module designed to do some integration with and testing
under Apache Spark: https://github.com/hortonworks-spark/cloud-integration . At some point
I hope to submit it to Apache Bahir; for now it's a bit too unstable.

> Abstract MetadataStore in S3Guard into a common module.
> -------------------------------------------------------
>                 Key: HADOOP-15038
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15038
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs
>    Affects Versions: 3.0.0-beta1
>            Reporter: Genmao Yu
> Open this JIRA to discuss if we should move {{MetadataStore}} in {{S3Guard}} into a common
> Based on this work, other filesystem or object store can implement their own metastore
> for optimization (known issues like consistency problem and metadata operation performance).
> [~stevel@apache.org] and other guys have done many base and great works in {{S3Guard}}. It
> is very helpful to start work. I did some perf test in HADOOP-14098, and started related work
> for Aliyun OSS.  Indeed there are still works to do for {{S3Guard}}, like metadata cache inconsistent
> with S3 and so on. It also will be a problem for other object store. However, we can do these
> works in parallel.
> [~stevel@apache.org] [~fabbri] [~drankye] Any suggestion is appreciated.

