hadoop-hdfs-issues mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7240) Object store in HDFS
Date Fri, 05 Jun 2015 08:06:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574099#comment-14574099 ]

Jitendra Nath Pandey commented on HDFS-7240:

The call started with a high-level description of object stores, the motivations, and the design
approach as covered in the architecture document.
The following points were discussed in detail:
   # 3-level namespace with storage volumes, buckets and keys vs. 2-level namespace with buckets
and keys
      #* Storage volumes are created by admins and provide admin controls such as quotas. Buckets
are created and managed by users.
        Since HDFS doesn't have a separate notion of user accounts as in S3 or Azure, a storage
volume allows admins to set policies.
      #* The argument in favor of the 2-level scheme was that organizations typically have very
few buckets and users organize their data within the buckets. The admin controls can be set
at the bucket level.
   # Is it exactly the S3 API? It would be good to be able to easily migrate from S3 to Ozone.
      #* The storage volume concept does not exist in S3. In Azure, the account is part of the
URL; Ozone URLs look similar to Azure's, with a storage volume in place of the account name.
      #* We will publish a more detailed spec including headers, authorization semantics, etc.
We will try to follow S3 closely.
   # HTTP/2
      #* There is already a jira in Hadoop for HTTP/2. We should evaluate supporting HTTP/2
as well.
   # OzoneFileSystem: a Hadoop file system implementation on top of Ozone, similar to S3FileSystem.
      #* It will not support rename.
      #* This was only briefly mentioned.
   # Storage Container Implementation
      #* Storage container replication must be efficient; replication by enumerating key-object
pairs would be far too slow. RocksDB is a promising choice because it supports live replication,
i.e. replicating a store while it is being written. The architecture document discussed
leveldbjni; RocksDB is similar, and provides additional features and a Java binding as well.
      #* If a datanode dies and some of its containers lag in generation stamp, those containers
will be discarded. Since containers are much larger than typical HDFS blocks, this is a lot
more wasteful, so an important optimization is to allow stale containers to catch up to the
current state.
      #* To support a large range of object sizes, a hybrid model may be needed: store small
objects in RocksDB, but store large objects as files, with their file paths in RocksDB.
      #* Colin suggested using Linux sparse files.
      #* We are working on a prototype.
   # Ordered listing with read-after-write semantics might be an important requirement. Under
a hash-partitioning scheme that would require consistent secondary indexes; otherwise, range
partitioning should be used. This needs to be investigated.
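To make the hybrid small/large object model above concrete, here is a minimal sketch. It is not Ozone code: a plain in-memory map stands in for RocksDB, and the 1 KB inline threshold, the record tags, and the file-naming scheme are all illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hybrid model sketch: small values are stored inline in the key-value
// store; large values are written to a file and only the file path is
// stored under the key. A HashMap stands in for RocksDB, and the
// threshold is a hypothetical illustrative value.
class HybridObjectStore {
    static final int INLINE_THRESHOLD = 1024;       // hypothetical cutoff
    final Map<String, byte[]> kv = new HashMap<>(); // stand-in for RocksDB
    final Path dataDir;                             // directory for large objects

    HybridObjectStore(Path dataDir) {
        this.dataDir = dataDir;
    }

    void put(String key, byte[] value) throws Exception {
        if (value.length <= INLINE_THRESHOLD) {
            kv.put(key, record('I', value));        // 'I' = inline value
        } else {
            Path f = dataDir.resolve(Integer.toHexString(key.hashCode()));
            Files.write(f, value);                  // large object as a file
            kv.put(key, record('F',                 // 'F' = file-path record
                f.toString().getBytes(StandardCharsets.UTF_8)));
        }
    }

    byte[] get(String key) throws Exception {
        byte[] rec = kv.get(key);
        if (rec == null) return null;
        byte[] body = new byte[rec.length - 1];
        System.arraycopy(rec, 1, body, 0, body.length);
        return rec[0] == 'I'
            ? body                                  // inline: body is the value
            : Files.readAllBytes(                   // file: body is the path
                Path.of(new String(body, StandardCharsets.UTF_8)));
    }

    // Prepend a one-byte tag so a reader can tell inline records from paths.
    static byte[] record(char tag, byte[] body) {
        byte[] out = new byte[body.length + 1];
        out[0] = (byte) tag;
        System.arraycopy(body, 0, out, 1, body.length);
        return out;
    }
}
```

The one-byte tag is just one way to distinguish the two record kinds; a real design would also have to handle deletion of the backing file and crash consistency between the two writes.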

I will follow up on these points and update the design doc.
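The ordered-listing trade-off above can be illustrated with a toy sketch (again, not Ozone code; the partition count and the first-character range split are arbitrary assumptions). With hash placement, keys that are adjacent in sort order land on different partitions, so an ordered listing must merge across every partition; with range placement, each partition owns a contiguous key range, so an ordered listing is just a sequential scan of the partitions in range order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy comparison of hash vs. range partitioning for ordered key listing.
class PartitionListing {
    static final int PARTITIONS = 4;

    // Hash placement: partition chosen by hash; sort order is not preserved
    // across partitions, so listing in order requires a global merge.
    static int hashPartition(String key) {
        return Math.floorMod(key.hashCode(), PARTITIONS);
    }

    // Range placement: partition chosen by a contiguous key band
    // (here a crude first-letter split, purely for illustration).
    static int rangePartition(String key) {
        return Math.min((key.charAt(0) - 'a') * PARTITIONS / 26, PARTITIONS - 1);
    }

    // Ordered listing over range partitions: each partition is already
    // sorted (TreeMap), and partitions cover disjoint increasing ranges,
    // so simple concatenation yields a globally sorted listing.
    static List<String> orderedList(List<TreeMap<String, byte[]>> parts) {
        List<String> out = new ArrayList<>();
        for (TreeMap<String, byte[]> p : parts) {
            out.addAll(p.keySet());
        }
        return out;
    }
}
```

This is the intuition behind the note above: hash partitioning would need a consistent secondary index (an ordered view maintained alongside the hashed data) to serve listings, while range partitioning gets ordered listing for free at the cost of harder load balancing.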

It was a great discussion with many valuable points raised. Thanks to everyone who attended.

> Object store in HDFS
> --------------------
>                 Key: HDFS-7240
>                 URL: https://issues.apache.org/jira/browse/HDFS-7240
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jitendra Nath Pandey
>            Assignee: Jitendra Nath Pandey
>         Attachments: Ozone-architecture-v1.pdf
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a generic storage
layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the
storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode storage, but
independent of namespace metadata.
> I will soon update with a detailed design document.

This message was sent by Atlassian JIRA
