hadoop-common-dev mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-16355) ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store
Date Fri, 07 Jun 2019 18:30:01 GMT
Mingliang Liu created HADOOP-16355:
--------------------------------------

             Summary: ZookeeperMetadataStore: Use Zookeeper as S3Guard backend store
                 Key: HADOOP-16355
                 URL: https://issues.apache.org/jira/browse/HADOOP-16355
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs
            Reporter: Mingliang Liu


When S3Guard was proposed, there were several valid reasons to choose DynamoDB as its
default backend store: 1) it integrates seamlessly with the AWS ecosystem, e.g. the client library;
2) it is a managed web service with little operational overhead, and it is highly available and
highly scalable; 3) it is performant, with single-digit millisecond latency; 4) the approach was
already proven by Netflix's S3mper (no longer actively maintained) and EMRFS (closed source and
restricted usage). Because the store is pluggable, {{MetadataStore}} can also be implemented on
other backend stores, in addition to the existing null and in-memory local ones, without
changing its semantics.

Here we propose {{ZookeeperMetadataStore}}, which uses ZooKeeper as the S3Guard backend store. Its
main motivation is to provide a new MetadataStore option which:
 # can be easily integrated, as ZooKeeper is heavily used in the Hadoop community
 # offers acceptable performance, as both the client and the ZooKeeper ensemble are usually "local"
in a Hadoop cluster (ZK/HBase/Hive etc)
 # removes the DynamoDB dependency

Obviously, not all use cases will prefer this to the default DynamoDB store. For example, ZK might
not scale well if there are dozens of S3 buckets and each has millions of objects.

Our use case targets HBase storing HFiles on S3 instead of HDFS. A complete solution for
HBase on S3 needs both HBOSS (see HBASE-22149), to recover the atomicity of metadata operations
such as rename, and S3Guard, for consistent enumeration of and access to object store bucket
metadata. We would like to use ZooKeeper as the backend store for both.
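
To make "consistent enumeration" concrete, here is a rough sketch of one way object paths could be
laid out as znodes so that a directory listing becomes a children lookup. It is only an illustration
under stated assumptions: the class name ZnodeLayoutSketch, the /s3guard root and the layout itself are
hypothetical and not part of this issue, and an in-memory sorted set stands in for ZooKeeper, so neither
the ZooKeeper client API nor the real {{MetadataStore}} interface is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Illustrative only: an in-memory stand-in for a ZooKeeper namespace.
// This is neither the S3Guard MetadataStore API nor the ZooKeeper client API.
final class ZnodeLayoutSketch {
    // Hypothetical root znode; the issue does not specify a layout.
    static final String ROOT = "/s3guard";

    // Sorted set of every znode path created so far.
    private final TreeSet<String> znodes = new TreeSet<>();

    // Map a bucket and object key to a znode path: /s3guard/<bucket>/<key>.
    static String toZnodePath(String bucket, String key) {
        String trimmed = key.replaceAll("^/+|/+$", "");
        return trimmed.isEmpty() ? ROOT + "/" + bucket : ROOT + "/" + bucket + "/" + trimmed;
    }

    // Record an entry together with all of its ancestors (parent znodes must exist).
    void put(String bucket, String key) {
        String path = toZnodePath(bucket, key);
        for (int i = path.indexOf('/', 1); i != -1; i = path.indexOf('/', i + 1)) {
            znodes.add(path.substring(0, i));
        }
        znodes.add(path);
    }

    // List the direct children of a directory, analogous to a getChildren call.
    List<String> listChildren(String bucket, String dirKey) {
        String prefix = toZnodePath(bucket, dirKey) + "/";
        List<String> children = new ArrayList<>();
        for (String p : znodes.tailSet(prefix)) {
            if (!p.startsWith(prefix)) {
                break; // left the contiguous range of paths under this directory
            }
            if (p.indexOf('/', prefix.length()) == -1) {
                children.add(p.substring(prefix.length())); // direct child only
            }
        }
        return children;
    }

    public static void main(String[] args) {
        ZnodeLayoutSketch store = new ZnodeLayoutSketch();
        store.put("hbase-bucket", "data/default/t1/hfile-a");
        store.put("hbase-bucket", "data/default/t1/hfile-b");
        System.out.println(store.listChildren("hbase-bucket", "data/default/t1"));
    }
}
```

A real implementation would also have to store per-path metadata in each znode and cope with znode
size and child-count limits, which is where the scaling concern mentioned above comes in.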



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


