hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10285) Storage Policy Satisfier in Namenode
Date Thu, 14 Apr 2016 02:22:25 GMT
Uma Maheswara Rao G created HDFS-10285:
------------------------------------------

             Summary: Storage Policy Satisfier in Namenode
                 Key: HDFS-10285
                 URL: https://issues.apache.org/jira/browse/HDFS-10285
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: datanode, namenode
    Affects Versions: 2.7.2
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G


Heterogeneous storage in HDFS introduced the concept of storage policies. A policy can
be set on a directory or file to express the user's preference for where the physical blocks are stored.
When the user sets the storage policy before writing data, the blocks take advantage
of that policy and are placed accordingly.

If the user sets the storage policy after the file has been written and completed, the blocks will
already have been written with the default storage policy (that is, on DISK). The user then has to run the ‘Mover
tool’ explicitly, specifying all such file names as a list. In some distributed-system
scenarios (e.g. HBase) it is difficult to collect all the files and run the tool, because different
nodes can write files separately and the files can have different paths.

Another scenario is rename. When the user renames a file from a directory with one effective
storage policy (inherited from the parent directory) into a directory with a different effective
storage policy, the inherited policy is not copied from the source. The file therefore takes its
policy from the destination parent directory. The rename is just a metadata change in the Namenode;
the physical blocks still remain placed according to the source storage policy.
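
The rename case can be illustrated with a small toy model. This is only a sketch, not HDFS code: the class RenameSketch and all of its methods are hypothetical, only the immediate parent directory is consulted for inheritance, and "DEFAULT" is a placeholder for whatever the default policy is. COLD and ALL_SSD are used purely as example policy names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of policy inheritance across rename; not HDFS code. */
class RenameSketch {
    // Policy explicitly set on a directory (directory path -> policy name).
    final Map<String, String> dirPolicy = new HashMap<>();
    // Policy under which each file's blocks were physically written.
    final Map<String, String> blockPlacement = new HashMap<>();

    /** Effective policy: inherited from the immediate parent, else a placeholder default. */
    String effectivePolicy(String file) {
        String parent = file.substring(0, file.lastIndexOf('/'));
        return dirPolicy.getOrDefault(parent, "DEFAULT");
    }

    /** Writing places blocks according to the policy in effect at write time. */
    void write(String file) {
        blockPlacement.put(file, effectivePolicy(file));
    }

    /** Rename is metadata only: the recorded block placement moves unchanged. */
    void rename(String src, String dst) {
        blockPlacement.put(dst, blockPlacement.remove(src));
    }

    /** True when the blocks no longer match the file's effective policy. */
    boolean needsSatisfying(String file) {
        return !effectivePolicy(file).equals(blockPlacement.get(file));
    }
}
```

In this model, a file written under a directory set to ALL_SSD and then renamed into a directory set to COLD reports COLD as its effective policy while its blocks are still recorded as ALL_SSD, so needsSatisfying returns true until something moves the blocks.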

So tracking all such file names, which depend on business logic, across distributed
nodes (e.g. region servers) and then running the Mover tool could be difficult for admins.

The proposal here is to provide an API in the Namenode itself to trigger storage policy
satisfaction. A daemon thread inside the Namenode should track such calls and pass them on to the
Datanodes (DNs) as movement commands.
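
As a rough illustration of the shape of this proposal, the following sketch shows a trigger call that only records a request, and a daemon thread that later turns each request into a movement command. Everything here is an assumption made for illustration: SatisfierSketch, satisfyStoragePolicy, processNext and the "MOVE_BLOCKS" string are hypothetical names, and the command list stands in for whatever mechanism would actually deliver commands to Datanodes. The detailed design is deferred to the design document.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch only; these are not real HDFS classes or method names. */
class SatisfierSketch {
    // Paths whose blocks should be checked against their effective storage policy.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    // Stand-in for movement commands handed to Datanodes.
    private final List<String> commands = new CopyOnWriteArrayList<>();
    private Thread worker;

    /** The trigger call: records the request and returns immediately. */
    void satisfyStoragePolicy(String path) {
        pending.add(path);
    }

    /** Turns one tracked request into a (simulated) movement command. */
    private void process(String path) {
        commands.add("MOVE_BLOCKS " + path);
    }

    /** Non-blocking single step; returns false when nothing is queued. */
    boolean processNext() {
        String path = pending.poll();
        if (path == null) {
            return false;
        }
        process(path);
        return true;
    }

    /** Starts the daemon thread that drains the queue until interrupted. */
    void start() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    process(pending.take()); // blocks until a request arrives
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // exit quietly on shutdown
            }
        }, "storage-policy-satisfier-sketch");
        worker.setDaemon(true);
        worker.start();
    }

    List<String> issuedCommands() {
        return commands;
    }
}
```

The point of the queue is that callers such as region servers only name the path and return; they do not need to collect file lists or run a separate tool, and the Namenode-side thread does the tracking.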

I will post a detailed design document soon.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
