hadoop-hdfs-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7081) Add new DistributedFileSystem API for getting all the existing storage policies
Date Wed, 24 Sep 2014 01:20:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145738#comment-14145738 ]

Andrew Wang commented on HDFS-7081:
-----------------------------------

bq. If we can set storage policy directly on a directory, why do we still need to do it recursively?
But to provide a tool for easier administration (not just for setting storage policy) is always
good.

This is related to my question about renames. I could see an admin wanting to know that everything
in a subtree uses some storage policy. However, if a file already has a policy set and is
renamed underneath this subtree, the subtree's policy won't apply. A recursive tool could
be used to satisfy this use case.
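
To make the rename case concrete, here is a minimal in-memory sketch. It is not HDFS code: the PolicyNode class, its methods, and the paths are hypothetical, and the policy names are used only as illustrative labels. It assumes an unset policy (null) resolves to the nearest ancestor's policy at read time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy in-memory namespace node; hypothetical, not an HDFS class. */
class PolicyNode {
    final String name;
    PolicyNode parent;
    String policy; // null models UNSPECIFIED: resolve from the nearest ancestor
    final Map<String, PolicyNode> children = new LinkedHashMap<>();

    PolicyNode(String name, String policy) {
        this.name = name;
        this.policy = policy;
    }

    /** Attach a child under this node and return the child. */
    PolicyNode add(PolicyNode child) {
        child.parent = this;
        children.put(child.name, child);
        return child;
    }

    /** Own explicit policy, else the nearest ancestor's, else "DEFAULT". */
    String effectivePolicy() {
        for (PolicyNode n = this; n != null; n = n.parent) {
            if (n.policy != null) {
                return n.policy;
            }
        }
        return "DEFAULT";
    }

    /** Move this node under a new parent; an explicit policy travels with it. */
    void renameTo(PolicyNode newParent) {
        if (parent != null) {
            parent.children.remove(name);
        }
        newParent.add(this);
    }

    /** Sketch of a recursive tool: set one policy on every node in a subtree. */
    static int setPolicyRecursively(PolicyNode root, String policy) {
        int changed = 0;
        if (!policy.equals(root.policy)) {
            root.policy = policy;
            changed++;
        }
        for (PolicyNode child : root.children.values()) {
            changed += setPolicyRecursively(child, policy);
        }
        return changed;
    }
}
```

In this model, a file created with an explicit HOT policy under a temp dir keeps HOT after being renamed under a COLD directory, so the admin's "everything here is COLD" assumption breaks until the recursive pass runs.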

As one data point, I know Hive uses a temp dir during query processing and renames things
in and out.

I'm still hoping we can avoid this rename ambiguity though, since it'd make management simpler.
If we need per-file granularity, then I think my idea from above would work. Basically, do
not set UNSPECIFIED on files. At create time, a file sets its storage policy either to an
inherited parent policy, or the default policy. Then rename will never change a file's policy.
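
A rough sketch of that idea, again as a toy model rather than namenode code. EagerPolicyNamespace and its methods are hypothetical, and treating "HOT" as the default policy is an assumption made only for the example. The policy is resolved once at create time and stored on the file, so rename only moves the stored value.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model: files resolve their storage policy once, at create time. */
class EagerPolicyNamespace {
    static final String DEFAULT_POLICY = "HOT"; // assumed default, for illustration

    private final Map<String, String> dirPolicy = new HashMap<>();
    private final Map<String, String> filePolicy = new HashMap<>();

    void setDirPolicy(String dir, String policy) {
        dirPolicy.put(dir, policy);
    }

    /** Policy of the nearest ancestor directory that has one, else the default. */
    String inherited(String path) {
        String p = path;
        while (true) {
            int slash = p.lastIndexOf('/');
            if (slash < 0) {
                break;
            }
            p = p.substring(0, slash);          // step up to the parent directory
            String key = p.isEmpty() ? "/" : p; // an empty prefix means the root
            String policy = dirPolicy.get(key);
            if (policy != null) {
                return policy;
            }
            if (p.isEmpty()) {
                break;
            }
        }
        return DEFAULT_POLICY;
    }

    /** Files never stay UNSPECIFIED: the policy is fixed when the file is created. */
    void create(String path) {
        filePolicy.put(path, inherited(path));
    }

    /** Rename moves the stored policy unchanged. */
    void rename(String src, String dst) {
        filePolicy.put(dst, filePolicy.remove(src));
    }

    String policyOf(String path) {
        return filePolicy.get(path);
    }
}
```

The tradeoff in this model is that a later change to a directory's policy does not reach files that already exist under it; that would still need an explicit per-file or recursive update.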

bq. For this one I have a question. According to the current document "TRUSTED namespace attributes
are only visible and accessible to privileged users." Currently the storage policy is actually
set by superuser and in HDFS we do not have root user. So does that mean we should use trusted
here?

TRUSTED and USER are meant to be used by end user applications. The idea is that apps can
stash whatever app data they want in those xattr namespaces and not worry about name collisions
(except from other apps). For HDFS developers who want to leverage xattr storage for a feature,
an internal namespace like SYSTEM is more appropriate so as not to pollute the user namespaces.
As we're doing in this JIRA, the additional data can be exposed to users via some new API,
rather than through getXAttrs.
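
As a rough illustration of the namespace split, here is a toy filter. It is not the HDFS implementation: the XAttrVisibility class, the visibility rule as coded, and the attribute names are assumptions for the sketch. It models USER as visible to any caller, TRUSTED as visible to privileged callers only, and SYSTEM as never returned by a user-facing listing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of xattr namespace visibility; hypothetical, not HDFS code. */
class XAttrVisibility {
    enum Namespace { USER, TRUSTED, SYSTEM }

    /** Parse the namespace from a prefixed name such as "user.app.tag". */
    static Namespace namespaceOf(String name) {
        int dot = name.indexOf('.');
        if (dot <= 0) {
            throw new IllegalArgumentException("missing namespace prefix: " + name);
        }
        return Namespace.valueOf(name.substring(0, dot).toUpperCase(Locale.ROOT));
    }

    /** Names a caller would see through a user-facing listing call. */
    static List<String> visibleTo(List<String> names, boolean privileged) {
        List<String> out = new ArrayList<>();
        for (String n : names) {
            switch (namespaceOf(n)) {
                case USER:
                    out.add(n);
                    break;
                case TRUSTED:
                    if (privileged) {
                        out.add(n);
                    }
                    break;
                case SYSTEM:
                    // internal: exposed only through a dedicated API, if at all
                    break;
            }
        }
        return out;
    }
}
```

Under this model, a feature that stores its state in SYSTEM never shows up next to application data, which is why a separate API is needed to surface it.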

As to the rest, I'll just trust you and Nic. I'm not sure I'll have time to review more this
week, so we can just do follow-ons. Thanks guys.

> Add new DistributedFileSystem API for getting all the existing storage policies
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-7081
>                 URL: https://issues.apache.org/jira/browse/HDFS-7081
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: balancer, namenode
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-7081.000.patch, HDFS-7081.001.patch, HDFS-7081.002.patch, HDFS-7081.003.patch
>
>
> Instead of loading all the policies from a client side configuration file, it may be
> better to provide Mover with a new RPC call for getting all the storage policies from the
> namenode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
