hadoop-hdfs-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
Date Wed, 09 Aug 2017 16:50:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120227 ]

Anoop Sam John commented on HDFS-10285:
---------------------------------------

HBase can benefit from this feature. The scenario is as below.
HBase allows WAL files to be kept on low-latency devices using the HSM feature (ALL_SSD/ONE_SSD etc.). There is a directory for keeping all active WALs, and we configure the storage policy on that directory. After some time a WAL file becomes inactive, as all of its data eventually gets flushed into HFiles, and we archive it. There is an archive directory, and the archive op is done via a rename of the file into the archive dir. Obviously the archive dir won't have any policy configured. By default we keep the WAL files under the archive dir for some more minutes and then delete them. If the WAL gets deleted soon, it is fine even if the blocks of the WAL file continue to sit on the low-latency device. But there are features and scenarios under which the deletion of a WAL from the archive can get delayed. A few examples:
- Cross-cluster replication is in place and the peer cluster is slow or down. HBase does inter-cluster replication by reading the WAL, so until the WAL cells are read and shipped to the other cluster, we can not delete it.
- The backup feature is in use and the backup refers to WAL files (the snapshot feature as well).
- Incremental backup is enabled. Unless an incremental backup is taken, WALs in that time range can not be deleted.
The same holds for HFiles. After compaction, the compacted-away files are archived, and if they are referred to by some active snapshots we may not be able to delete them immediately.
So it makes good sense to use this feature to move the file blocks out of the low-latency devices and free up space there (a sketch of the pattern follows below).
Once this feature is GA in a release, we can open up a jira to make use of it.
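
For illustration only, a minimal sketch of the pattern described above against the plain FileSystem API, assuming a Hadoop version where FileSystem#setStoragePolicy is available; the paths and WAL file name are made up, and HBase of course wires this up through its own configuration rather than direct calls like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WalPolicySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical layout: active WALs under /hbase/WALs, archived WALs
    // under /hbase/oldWALs (names here are only for illustration).
    Path walDir = new Path("/hbase/WALs");
    Path archiveDir = new Path("/hbase/oldWALs");

    // Keep active WAL blocks on SSD by setting an HSM policy on the WAL dir.
    fs.setStoragePolicy(walDir, "ALL_SSD");

    // Archiving a finished WAL is just a rename into the archive dir.
    // The rename is only a Namenode metadata change: the blocks of the
    // renamed file stay on SSD even though the archive dir carries no
    // SSD policy, which is exactly the gap this feature would close.
    Path finishedWal = new Path(walDir, "example-wal.00001");
    fs.rename(finishedWal, new Path(archiveDir, finishedWal.getName()));
  }
}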

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, HDFS-10285-consolidated-merge-patch-01.patch, HDFS-SPS-TestReport-20170708.pdf, Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These policies can be set on a directory/file to specify the user preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks can take advantage of the policy preference and the physical blocks are stored accordingly.
> If the user sets the storage policy after the file has been written and completed, the blocks would already have been written with the default storage policy (nothing but DISK). The user then has to run the ‘Mover tool’ explicitly, specifying all such file names as a list. In some distributed system scenarios (ex: HBase) it would be difficult to collect all the files and run the tool, as different nodes can write files separately and the files can have different paths.
> Another scenario is when the user renames a file from a directory with one effective storage policy (inherited from the parent directory) into a directory with a different effective storage policy. The rename will not carry over the inherited storage policy from the source, so the file takes its effective policy from the destination parent directory. This rename operation is just a metadata change in the Namenode; the physical blocks still remain laid out per the source storage policy.
> So tracking all such business-logic-based file names across distributed nodes (ex: region servers) and running the Mover tool could be difficult for admins.
> Here the proposal is to provide an API in the Namenode itself to trigger storage policy satisfaction. A daemon thread inside the Namenode should track such calls and turn them into movement commands for the DNs.
> Will post the detailed design thoughts document soon.
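
To make the intended usage concrete, a hedged sketch of how a client such as HBase might invoke the proposed API, assuming it ends up exposed on DistributedFileSystem roughly as a satisfyStoragePolicy(Path) call (the exact method name and placement are whatever the final patch settles on; this is only an illustration of the intended flow, not the committed API):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SatisfyPolicySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes the default filesystem is HDFS.
    DistributedFileSystem dfs =
        (DistributedFileSystem) new Path("/").getFileSystem(conf);

    // Hypothetical dir whose blocks no longer match its effective policy,
    // e.g. an archive dir that received files renamed out of an ALL_SSD dir.
    Path archiveDir = new Path("/hbase/oldWALs");

    // Ask the Namenode to satisfy the storage policy for this path; the
    // SPS daemon would then schedule the block movements and send the
    // movement commands to the Datanodes asynchronously.
    dfs.satisfyStoragePolicy(archiveDir);
  }
}

The manual alternative today is to run the Mover against each affected path (hdfs mover -p <paths>); an admin-facing CLI equivalent of the call above would presumably be part of the same work.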



