hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12357) Let NameNode to bypass external attribute provider for special user
Date Sat, 02 Sep 2017 08:35:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151448#comment-16151448
] 

Yongjun Zhang commented on HDFS-12357:
--------------------------------------

Hi [~manojg],

{quote}
Having UserFilterINodeAttributeProvider seems like a cleaner approach. Is it possible to examine
the bypassUser config and skip the wrapper UserFilterINodeAttributeProvider if the user list
is empty. Most of the times, the bypass user list is going to empty and we can totally skip
the wrapper if so.
{quote}
Thanks for the good point; sorry, with so many updates today I missed the one above again.

If we move the code that loads the config and checks isBypassUser into the {{FSDirectory}} class (as done in v001), we could skip the wrapper when the bypassUser list is empty. However, even when bypassUser is non-empty it typically lists only one or two users, yet the wrapper is still created and invoked for the many other users who are not on the list. Any further thoughts?
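To illustrate the idea of skipping the wrapper entirely when the configured bypass list is empty, here is a minimal, self-contained sketch in plain Java. The interface and class names ({{AttributeProvider}}, {{UserFilterProvider}}, {{WrapperSetup}}) are stand-ins for illustration only, not the actual Hadoop API:

```java
import java.util.Set;

// Stand-in for the external attribute provider interface (illustrative only).
interface AttributeProvider {
    String getAttributes(String path);
}

class ExternalProvider implements AttributeProvider {
    public String getAttributes(String path) { return "external:" + path; }
}

// Wrapper that bypasses the external provider for configured users.
class UserFilterProvider implements AttributeProvider {
    private final AttributeProvider delegate;
    private final Set<String> bypassUsers;
    private final String currentUser;

    UserFilterProvider(AttributeProvider delegate, Set<String> bypassUsers, String user) {
        this.delegate = delegate;
        this.bypassUsers = bypassUsers;
        this.currentUser = user;
    }

    public String getAttributes(String path) {
        if (bypassUsers.contains(currentUser)) {
            return "hdfs:" + path;   // fall back to native INode attributes
        }
        return delegate.getAttributes(path);
    }
}

public class WrapperSetup {
    // Skip the wrapper entirely when the bypass list is empty (the common case).
    static AttributeProvider install(AttributeProvider ext, Set<String> bypassUsers, String user) {
        if (bypassUsers.isEmpty()) {
            return ext;
        }
        return new UserFilterProvider(ext, bypassUsers, user);
    }

    public static void main(String[] args) {
        AttributeProvider ext = new ExternalProvider();
        AttributeProvider p1 = WrapperSetup.install(ext, Set.of(), "alice");
        System.out.println(p1 == ext);   // prints "true": wrapper skipped
        AttributeProvider p2 = WrapperSetup.install(ext, Set.of("distcp"), "distcp");
        System.out.println(p2.getAttributes("/a"));   // prints "hdfs:/a"
    }
}
```

With an empty list the external provider is returned unchanged, so non-bypass deployments pay no extra indirection; only when the list is non-empty does every call go through the wrapper, which is the cost discussed above.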

Hi [~chris.douglas],

Looking at the change I made in rev5 again: it saved the extra cost of {{components = Arrays.copyOfRange(components,
1, components.length);}}, but it introduced another extra cost: {{isBypassUser()}} is called
twice. One call is at
{code}
    if (attributeProvider != null &&
        !attributeProvider.isBypassUser()) {
{code}
The other is in the wrapper implementation:
{code}
nodeAttrs = attributeProvider.getAttributes(components, nodeAttrs);
{code} 

After the first check finds a non-bypass user, the second check runs again, and unfortunately this extra call happens for most users. It seems hard to avoid both extra costs with the wrapper approach.
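One way to avoid paying the check twice (a sketch with invented names, not the rev5 code): evaluate the bypass decision exactly once per operation at the call site, and only call the provider when that single decision allows it, so the provider body never has to re-check:

```java
// Stand-in provider interface (illustrative only, not the Hadoop API).
interface Provider {
    String getAttributes(String path);
}

public class SingleCheck {
    static String attrs(Provider external, java.util.Set<String> bypassUsers,
                        String user, String path) {
        // The bypass check happens exactly once here ...
        boolean bypass = bypassUsers.contains(user);
        if (external == null || bypass) {
            return "hdfs:" + path;          // native INode attributes
        }
        // ... and the provider call below does not repeat it.
        return external.getAttributes(path);
    }

    public static void main(String[] args) {
        Provider ext = p -> "external:" + p;
        System.out.println(attrs(ext, java.util.Set.of("distcp"), "distcp", "/x")); // prints "hdfs:/x"
        System.out.println(attrs(ext, java.util.Set.of("distcp"), "alice", "/x"));  // prints "external:/x"
    }
}
```

The trade-off is that the bypass logic then lives at the call site rather than inside a wrapper class, which is roughly the v001 shape versus the cleaner wrapper abstraction discussed below.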

The v001 implementation doesn't have either of these extra costs, but the wrapper class is certainly a better abstraction. I can go with either approach if we agree on one, and we can certainly keep improving the solution.

Thanks a lot.






> Let NameNode to bypass external attribute provider for special user
> -------------------------------------------------------------------
>
>                 Key: HDFS-12357
>                 URL: https://issues.apache.org/jira/browse/HDFS-12357
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-12357.001.patch, HDFS-12357.002.patch, HDFS-12357.003.patch,
HDFS-12357.004.patch, HDFS-12357.005.patch
>
>
> This is a third proposal to solve the problem described in HDFS-12202.
> The problem is: when we do distcp from one cluster to another (or within the same cluster), in addition to copying file data, we copy the metadata from source to target. If an external attribute provider is enabled, the metadata may be read from the provider, so provider data read from the source may be saved to the target HDFS.
> We want to avoid saving metadata from the external provider to HDFS, so we want to bypass the external provider when doing the distcp (or hadoop fs -cp) operation.
> Two alternative approaches were proposed earlier, one in HDFS-12202, the other in HDFS-12294.
The proposal here is the third one.
> The idea is to introduce a new config that specifies a special user (or a list of users), and let the NN bypass the external provider when the current user is one of these special users.
> If we run applications that need data from the external attribute provider as the special user, they won't work. So the constraint of this approach is that the special users should not run applications that need data from the external provider.
> Thanks [~asuresh] for proposing this idea and [~chris.douglas], [~daryn], [~manojg] for
the discussions in the other jiras. 
> I'm creating this one to discuss further.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


