hadoop-yarn-issues mailing list archives

From "Vrushali C (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3895) Support ACLs in ATSv2
Date Fri, 26 Jan 2018 18:53:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16341427#comment-16341427 ]

Vrushali C commented on YARN-3895:

 Hi [~jlowe]  [~jeagles]

I discussed this with [~lohit] once again this morning.  Based on the scale of domain ids, I
wanted to revise the storage design. We now propose a domain table whose row key is the
domain id, with two columns, one for users and another for groups, plus columns for the created
time and the other fields that exist in the TimelineDomain object.

So at read time, just like ATSv1 does, we first get all the entities satisfying the query criteria,
then collect their domain ids. For each domain id in the response, we check the domain table
to see whether the user/group has permissions.

For wildcard of ‘*’, no check is necessary, since it means all users and groups have permissions?

Similarly if the querying user is an admin, no check is done.  Also, all this is not executed
in non-secure mode.
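
To make the read path above concrete, here is a minimal in-memory sketch. It is not ATSv2
code: the names DomainRow, is_allowed and filter_entities are hypothetical, a plain dict stands
in for the hbase domain table, the '*' wildcard is assumed to mean "everyone", and an entity
whose domain id has no row is assumed to be denied.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass(frozen=True)
class DomainRow:
    """One row of the proposed domain table, keyed by domain id (hypothetical layout)."""
    users: frozenset
    groups: frozenset


def is_allowed(row, user, user_groups):
    """True if the user, or any of the user's groups, appears in the domain row."""
    if WILDCARD in row.users or WILDCARD in row.groups:
        return True  # assumption: '*' grants access to all users and groups
    return user in row.users or bool(row.groups & set(user_groups))


def filter_entities(entities, domain_table, user, user_groups, *, secure=True, is_admin=False):
    """Return only the entities whose domain grants access to the querying user."""
    if not secure or is_admin:
        return list(entities)  # no ACL check in non-secure mode or for admins
    decisions = {}  # domain id -> bool, so each domain id is looked up only once
    visible = []
    for entity in entities:
        domain_id = entity["domain_id"]
        if domain_id not in decisions:
            row = domain_table.get(domain_id)  # stands in for one get on the domain table
            # assumption: a missing domain row means access is denied
            decisions[domain_id] = row is not None and is_allowed(row, user, user_groups)
        if decisions[domain_id]:
            visible.append(entity)
    return visible
```

The per-query decisions dict means the number of domain table lookups depends on the number of
distinct domain ids in the response, not on the number of entities.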

This is functionally correct, but it will be somewhat slow, depending on the number of domain
ids found in the entity response set. If there is only one domain id, only one additional get
request to hbase is needed. With each additional domain id, the query response time will
increase slightly. We can batch the gets to the domain table, but even so, it could take a few
seconds, tending to minutes, depending on the number of calls needed, since multiple calls to
hbase translate to multiple hdfs calls.
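
As a rough illustration of the batching idea, the sketch below only plans the lookups: it
de-duplicates the domain ids in a response set and groups them into batches. The function name
plan_domain_gets and the batch_size parameter are hypothetical, and no hbase client is involved.

```python
def plan_domain_gets(entities, batch_size):
    """Group the distinct domain ids of a result set into batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # De-duplicate while keeping first-seen order, so each domain id is fetched once.
    distinct = list(dict.fromkeys(e["domain_id"] for e in entities))
    return [distinct[i:i + batch_size] for i in range(0, len(distinct), batch_size)]
```

Each inner list would correspond to one batched request to the domain table, so the number of
round trips is the number of distinct domain ids divided by the batch size, rounded up.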

I have been scratching my head over this read performance. The only other option I see is that
the collector keeps the domain id and user/group info in memory and writes it out with
each entity. That way we end up with a denormalized dataset, and read queries will be as fast
as they can get with hbase. The domain table will still exist, and the collector can read from
it if the collector goes down and comes back up.
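
A minimal sketch of this second option, under the same caveats: CollectorAclCache and the
acl_users / acl_groups column names are hypothetical, and a dict stands in for the domain
table. It shows the collector stamping each entity with its domain's users and groups, and
falling back to the domain table when its in-memory copy is empty after a restart.

```python
class CollectorAclCache:
    """Keeps domain ACLs in memory and copies them onto every entity that is written."""

    def __init__(self, domain_table):
        self._domain_table = domain_table  # stands in for the durable domain table
        self._cache = {}                   # in-memory copy, lost if the collector restarts

    def put_domain(self, domain_id, users, groups):
        """Record a domain both durably and in memory."""
        acl = {"users": sorted(users), "groups": sorted(groups)}
        self._domain_table[domain_id] = acl
        self._cache[domain_id] = acl

    def _acl_for(self, domain_id):
        if domain_id not in self._cache:
            # Cache miss, e.g. after a restart: reload from the domain table.
            # An unknown domain id raises KeyError in this sketch.
            self._cache[domain_id] = self._domain_table[domain_id]
        return self._cache[domain_id]

    def denormalize(self, entity):
        """Return a copy of the entity with its domain's users and groups attached."""
        acl = self._acl_for(entity["domain_id"])
        return {**entity, "acl_users": list(acl["users"]), "acl_groups": list(acl["groups"])}
```

With the ACL stored on the entity row itself, a reader could decide visibility from the entity
alone, without a second lookup per domain id.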

Which way do you think might end up working better for applications like Tez?

Storage scalability wise, I think either of the two options would be fine with hbase, and
the expiration / TTL can be set in either case as well. To optimize read / write
performance, we can pre-split the domain table and try to balance the row keys so that
they go to different Region Servers and we don’t end up hot-spotting a single RS for
reads and writes of currently running applications.
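
One common way to balance row keys is to prefix each key with a hash-derived bucket and
pre-split the table on the bucket boundaries. The sketch below assumes that approach, which is
not something decided in this discussion; salted_row_key, split_points, the '!' separator and
the two-digit bucket prefix are all hypothetical choices.

```python
import hashlib


def salted_row_key(domain_id, num_buckets):
    """Prefix the domain id with a stable bucket number so keys spread across regions."""
    if not 1 <= num_buckets <= 100:
        raise ValueError("num_buckets must be between 1 and 100 for a two-digit prefix")
    digest = hashlib.md5(domain_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}!{domain_id}"


def split_points(num_buckets):
    """Boundary keys for pre-splitting the table into one region per bucket."""
    return [f"{b:02d}!" for b in range(1, num_buckets)]
```

Because the bucket is computed from the domain id itself, a reader can rebuild the full row key
from the domain id and still issue a direct get.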



> Support ACLs in ATSv2
> ---------------------
>                 Key: YARN-3895
>                 URL: https://issues.apache.org/jira/browse/YARN-3895
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>    Affects Versions: YARN-2928
>            Reporter: Varun Saxena
>            Assignee: Varun Saxena
>            Priority: Major
>              Labels: YARN-5355
> This JIRA is to keep track of authorization support design discussions for both readers
> and collectors.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
