hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10285) Storage Policy Satisfier in Namenode
Date Tue, 01 Aug 2017 00:36:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16108210#comment-16108210 ]

Uma Maheswara Rao G commented on HDFS-10285:
--------------------------------------------

Hi [~eddyxu], Thank you for the review.
Here are my replies.

{quote}
Non-recursively set xattr. Please kindly re-consider to use recursive async call. If the use
cases are mostly targeted to the downstream projects like HBase and etc., the chance of these
projects mistakenly call satisfyStoragePolicy on wrong directory (i.e., "/") is rare, but
it will make the projects to manage large / deep namespace difficult, i.e., hbase needs to
iterate the namespace itself and calls the same amount of "setXattr" anyway (because the #
of files to move is the same). Similar to "rm -rf /", while it is bad that "rm" allows to
do it, but IMO it should not prevent users / applications to use "rm -rf" in a sensible way.
{quote}
Thank you for the feedback and for exposing pain points from the user's standpoint. At
this moment, based on the feedback from Andrew and you, the recursive call seems like the
more helpful thing to consider. We will work on this item.
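
To illustrate the trade-off raised in the quote above, here is a minimal sketch that compares how many client calls each approach needs on a toy namespace. Node and SatisfyWalk are hypothetical names made up for this example and are not Hadoop classes; the sketch only counts calls and moves no data.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy namespace node standing in for an HDFS file or directory (hypothetical). */
final class Node {
    final String name;
    final boolean isDir;
    final List<Node> children = new ArrayList<>();

    Node(String name, boolean isDir) {
        this.name = name;
        this.isDir = isDir;
    }

    /** Adds a child and returns this node so that trees can be built fluently. */
    Node add(Node child) {
        children.add(child);
        return this;
    }
}

final class SatisfyWalk {
    /** Counts the files under root; a plain file counts as one. */
    static int countFiles(Node root) {
        if (!root.isDir) {
            return 1;
        }
        int files = 0;
        for (Node child : root.children) {
            files += countFiles(child);
        }
        return files;
    }

    /** Non-recursive API: the client walks the tree and issues one call per file. */
    static int clientCallsNonRecursive(Node root) {
        return countFiles(root);
    }

    /** Recursive API: the client issues a single call and the server expands it. */
    static int clientCallsRecursive(Node root) {
        return 1;
    }
}
```

In both cases the number of files whose blocks may need to move is the same; only the party that iterates the namespace changes.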

{quote}
The newly added public void removeXattr(long id, String xattrName). While its name seems very
generic, it seems only allow taking sps xattr as legit parameter. Should we demote it from
public API in Namesystem?
{quote}
This was intentional. Since Namesystem is a generic interface between BM and FSNamesystem, the
API name can be more generic in case that is useful for other purposes. That means you can
pass any xattr to this API to remove it. It may not be good to add more specific APIs to it.
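
A minimal sketch of what a generic removal method of this shape could look like, using an in-memory map in place of the real inode store. XattrRemover and InMemoryNamesystem are hypothetical stand-ins written for this example, not the Hadoop source, and the xattr names used with them are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the generic interface discussed above. */
interface XattrRemover {
    /** Removes the named xattr from the inode with the given id, whatever the xattr is. */
    void removeXattr(long id, String xattrName);
}

/** In-memory illustration only; a real Namesystem would update the inode and the edit log. */
final class InMemoryNamesystem implements XattrRemover {
    private final Map<Long, Map<String, byte[]>> xattrsByInode = new HashMap<>();

    void setXattr(long id, String name, byte[] value) {
        xattrsByInode.computeIfAbsent(id, k -> new HashMap<>()).put(name, value);
    }

    boolean hasXattr(long id, String name) {
        Map<String, byte[]> xattrs = xattrsByInode.get(id);
        return xattrs != null && xattrs.containsKey(name);
    }

    @Override
    public void removeXattr(long id, String xattrName) {
        Map<String, byte[]> xattrs = xattrsByInode.get(id);
        if (xattrs != null) {
            // No check on the name: any xattr can be removed through this method.
            xattrs.remove(xattrName);
        }
    }
}
```

Because the method does not inspect the name, a caller other than SPS could reuse it, which is the argument for keeping the name generic.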

{quote}
Would it make sense to have an admin command to unset SPS on a path? For an user to undo his
own mistake.
{quote}
It makes sense to consider it. Would you mind filing a JIRA under HDFS-12226?

{quote}
FSNamesystem#satisfyStoragePolicy. Is this only setting xattr? Can we do the setting xattr
part without SPS running? I was thinking the scenarios that: some downstream projects (i.e.,
hbase) start to routinely use this API, while for some reason (i.e., mover is running or cluster
misconfiguration), SPS is not running, should we still allow these projects to successfully
call the satisfyStoragePolicy(), and allow SPS to catch up later on?
{quote}
Interesting point. Is it worth filing a JIRA for more discussion on this? There could be some
risk: who will clean up that xattr if the admin never enables SPS? Maybe we should introduce
self-expiry or something like that. We have created a follow-up JIRA, which is intended to
improve the feature even after merging into trunk. If you feel things can be done after the
merge, please file them under HDFS-12226.
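
One way to picture the self-expiry idea is a marker that records when it was set and is dropped once it is older than a time-to-live. The sketch below is an assumption about how that could work, not a design from this JIRA; PendingSatisfyMarkers and the TTL parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Hypothetical tracker for "satisfy" markers that expire if nobody processes them. */
final class PendingSatisfyMarkers {
    private final Map<Long, Long> markedAtMillis = new HashMap<>();
    private final long ttlMillis;

    PendingSatisfyMarkers(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Records that the inode was marked at the given time. */
    void mark(long inodeId, long nowMillis) {
        markedAtMillis.put(inodeId, nowMillis);
    }

    boolean isPending(long inodeId) {
        return markedAtMillis.containsKey(inodeId);
    }

    /** Drops markers at least ttlMillis old and returns how many were dropped. */
    int expire(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = markedAtMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() >= ttlMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

With something like this, a marker set while SPS is disabled would not stay on the inode forever if the admin never enables SPS.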

{quote}
And since this call essentially triggers a large async background task, should we put some
logs here? Similarly, it'd be nice to have related JMX stats and some indications in web UI,
to be easier to integrate with other systems.
{quote}
Good suggestions. I will add this comment under the metrics JIRA, HDFS-12228, to track it.

Thank you for helping with the reviews.

> Storage Policy Satisfier in Namenode
> ------------------------------------
>
>                 Key: HDFS-10285
>                 URL: https://issues.apache.org/jira/browse/HDFS-10285
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: HDFS-10285
>            Reporter: Uma Maheswara Rao G
>            Assignee: Uma Maheswara Rao G
>         Attachments: HDFS-10285-consolidated-merge-patch-00.patch, HDFS-10285-consolidated-merge-patch-01.patch,
HDFS-SPS-TestReport-20170708.pdf, Storage-Policy-Satisfier-in-HDFS-June-20-2017.pdf, Storage-Policy-Satisfier-in-HDFS-May10.pdf
>
>
> Heterogeneous storage in HDFS introduced the concept of storage policies. These policies
can be set on a directory or file to specify the user's preference for where to store the
physical blocks. When the user sets the storage policy before writing data, the blocks can
take advantage of the storage policy preferences and are stored accordingly.
> If the user sets the storage policy after writing and completing the file, the blocks
would already have been written with the default storage policy (that is, DISK). The user
has to run the ‘Mover tool’ explicitly, specifying all such file names as a list. In some
distributed system scenarios (e.g., HBase) it would be difficult to collect all the files and
run the tool, as different nodes can write files separately and the files can have different paths.
> Another scenario is when the user renames a file from a directory with one effective
storage policy (inherited from the parent directory) to a directory with another storage
policy: the inherited storage policy is not copied from the source, so the policy of the
destination file/dir parent takes effect. This rename operation is just a metadata change in
the Namenode. The physical blocks still remain placed according to the source storage policy.
> So, tracking all such file names, which depend on business logic, across distributed nodes
(e.g., region servers) and running the Mover tool could be difficult for admins.
> The proposal here is to provide an API in the Namenode itself to trigger storage policy
satisfaction. A daemon thread inside the Namenode should track such calls and send them to
the DNs as movement commands.
> Will post the detailed design thoughts document soon.
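
The proposal in the description (an API call that is tracked by a daemon thread and turned into movement commands) can be sketched roughly as follows. This is a toy model under stated assumptions, not the SPS implementation: SatisfierSketch, the command string format, and the use of inode ids as requests are all hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a background satisfier; not Hadoop code. */
final class SatisfierSketch {
    private final BlockingQueue<Long> pendingInodes = new LinkedBlockingQueue<>();
    private final BlockingQueue<String> movementCommands = new LinkedBlockingQueue<>();
    private final Thread worker;

    SatisfierSketch() {
        worker = new Thread(this::processLoop, "sps-sketch");
        worker.setDaemon(true); // must not keep the JVM alive on shutdown
    }

    void start() {
        worker.start();
    }

    void stop() {
        worker.interrupt();
    }

    /** API entry point: records the request and returns immediately. */
    void satisfyStoragePolicy(long inodeId) {
        pendingInodes.add(inodeId);
    }

    /** Waits up to timeoutMillis for the next command; returns null on timeout or interrupt. */
    String awaitCommand(long timeoutMillis) {
        try {
            return movementCommands.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    /** Background loop: turns each tracked request into a placeholder movement command. */
    private void processLoop() {
        try {
            while (true) {
                long inodeId = pendingInodes.take();
                // A real satisfier would compute block moves and hand them to datanodes.
                movementCommands.add("MOVE_BLOCKS inode=" + inodeId);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly when stopped
        }
    }
}
```

The caller is never blocked on data movement: the request is queued and the background thread produces the commands asynchronously.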



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


