hadoop-hdfs-issues mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8888) Support volumes in HDFS
Date Wed, 19 Aug 2015 03:42:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702393#comment-14702393 ]

Sanjay Radia commented on HDFS-8888:

There are several motivations for introducing Volumes to HDFS.

Simplify management and implementation
* Volumes make the management of some HDFS features simpler: Quotas, Encryption, and Snapshots
can become volume properties rather than properties of individual directories. As a unit of
management, Volumes also offer strong isolation of security settings.
* Volumes can simplify the implementation of some of these features. For example, if we don’t
allow renaming across a volume boundary, then the implementation of Snapshots becomes easier.
Will customers accept this restriction? Won’t some apps like Hive have to change, since they
rename from temp to final destination? Recall that we disallow renames across encryption zones,
and customers have found that acceptable. Further, we changed Hive to deal with this restriction.
* Volumes can also simplify the management of datasets. For example, one can associate
other policies with volumes, such as backup policies across DR zones that are set up per
volume.
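
To make the rename restriction concrete, here is a minimal sketch in Java of a check that rejects renames crossing a volume boundary. Everything in it is an assumption for illustration: the class VolumeRenameCheck, its methods, and the idea of identifying a volume by its root path are hypothetical and are not part of HDFS or of this Jira. It also assumes that volume roots are not nested and that no volume is rooted at "/".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: none of these names exist in HDFS.
final class VolumeRenameCheck {
    // Volume roots such as "/warehouse"; assumed non-nested and never "/".
    private final List<String> volumeRoots;

    VolumeRenameCheck(List<String> volumeRoots) {
        this.volumeRoots = new ArrayList<>(volumeRoots);
    }

    // Returns the root of the volume containing the path, or empty if none does.
    Optional<String> volumeOf(String path) {
        for (String root : volumeRoots) {
            // Match the root itself or anything below it, but not a sibling
            // that merely shares a prefix (e.g. "/warehousex").
            if (path.equals(root) || path.startsWith(root + "/")) {
                return Optional.of(root);
            }
        }
        return Optional.empty();
    }

    // A rename is allowed only if source and destination are in the same volume.
    // Paths outside every volume are treated here as one implicit volume.
    boolean renameAllowed(String src, String dst) {
        return volumeOf(src).equals(volumeOf(dst));
    }
}
```

Under this sketch, an app that stages output in a temp directory inside the destination volume can still rename into place, while a rename from a temp directory in a different volume is rejected and has to become a copy.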

Isn’t it more flexible to have features like encryption and snapshots on arbitrary directories?
Having a car with independent steering for each wheel is more flexible, but steering 2 wheels
together makes a car easier to control. Volumes, while restricting the granularity, will simplify
management and also the implementation.

*Relation to Federation*
How are volumes related to Federation? Currently in federation, each NN has a single volume.
This Jira will allow each NN to have multiple volumes. Volumes add to the Federation model:
one can distribute/load balance volumes across NNs. Further, this allows N+K failover, especially
when we add partial namespace caching (HDFS-XXXX). (More on this later.)

Other things to explore with Volumes (outside the scope of this Jira)
* Each volume could have its own RW lock within the NN. This would improve parallelism
within the NN without much additional effort.
* Each volume could have its own image/journal to allow relocation of a volume to another
NN (see federation).
* Associate storage policies with a volume, such that the volume is backed by the same storage.
These semantics allow new features like co-located data.
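
The per-volume RW lock idea above could look roughly like the following Java sketch. It is only an illustration under stated assumptions: the class VolumeLocks and its methods are hypothetical, volumes are keyed by a plain string, and nothing here reflects how the NN actually manages its locking.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch: one read/write lock per volume instead of one shared lock.
final class VolumeLocks {
    private final ConcurrentMap<String, ReentrantReadWriteLock> locks =
        new ConcurrentHashMap<>();

    // Returns the lock for a volume, creating it on first use.
    ReentrantReadWriteLock lockFor(String volume) {
        return locks.computeIfAbsent(volume, v -> new ReentrantReadWriteLock());
    }

    // Runs a read-only operation; readers of the same volume may overlap.
    <T> T withReadLock(String volume, Supplier<T> op) {
        ReentrantReadWriteLock lock = lockFor(volume);
        lock.readLock().lock();
        try {
            return op.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // Runs a mutating operation; only this volume is blocked while it runs.
    <T> T withWriteLock(String volume, Supplier<T> op) {
        ReentrantReadWriteLock lock = lockFor(volume);
        lock.writeLock().lock();
        try {
            return op.get();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, a writer in one volume does not block readers or writers in another. Operations that span volumes would need a lock ordering rule, which is one more reason the rename restriction discussed earlier is convenient.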

> Support volumes in HDFS
> -----------------------
>                 Key: HDFS-8888
>                 URL: https://issues.apache.org/jira/browse/HDFS-8888
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haohui Mai
> There are multiple types of zones (e.g., snapshottable directories, encryption zones,
directories with quotas) which are conceptually close to namespace volumes in traditional
file systems.
> This jira proposes to introduce the concept of volume to simplify the implementation
of snapshots and encryption zones.

This message was sent by Atlassian JIRA
