Date: Mon, 29 Jul 2013 15:58:31 -0700
Subject: [PROPOSAL] Sentry for Incubator
From: Shreepadma Venugopalan
To: general@incubator.apache.org

Hi Folks,

We're pleased to bring a proposal to the ASF Incubator for the Sentry project. Sentry is a system for providing fine-grained, role-based access control to data and metadata stored on a Hadoop cluster.

The text of the proposal has been copied to the bottom of this email for convenience. The complete proposal can be found at: https://wiki.apache.org/incubator/SentryProposal

Thanks & Regards,
Shreepadma

= Sentry - A fine-grained Authorization System for the Hadoop ecosystem =

== Abstract ==

Sentry is a highly modular system for providing fine-grained, role-based authorization to both data and metadata stored on an Apache Hadoop cluster. Sentry can be used to enforce access policy rules when data stored on the Hadoop Distributed File System (HDFS) is accessed through Hadoop ecosystem components such as Apache Hive, Apache Pig and others.

== Proposal ==

Traditionally, user access control in Apache Hadoop has been implemented using file-based permissions on HDFS. Following the UNIX permissions model, HDFS offers all-or-nothing semantics: an administrator can configure the system to allow certain users or user groups to read a file, write it, or do both.
This model does not support finer-grained permissions, such as access policies for logical parts of a single file. Furthermore, it can't be used to restrict access to the rich set of objects in the metadata catalog, which are stored outside HDFS.

Sentry will provide true role-based, fine-grained user access control for Apache Hadoop and its ecosystem components such as Hive, Pig or HBase. This includes fine-grained, role-based access to both the data and the metadata, which provides a rich object-based abstraction such as databases, tables or columns.

== Background ==

Sentry was initially developed by Cloudera to give users fine-grained access to data as well as metadata in Apache Hadoop. Sentry has been maintained as an open source project on Cloudera's GitHub. Sentry was previously called "Access". All code in Sentry is open source and has been made publicly available under the Apache 2 license. During this time, Sentry has been formally released twice, as versions 1.0.0 and 1.1.0.

== Rationale ==

Currently, users have no way to enforce fine-grained access control on data stored in HDFS and its associated metadata. While users can use file-based permissions to control access to specific directories and files, this is insufficient because access can't be restricted to parts of a file, i.e., to specific lines or logical columns. In the absence of such support, users have to resort to duplicating data, for example by maintaining a separate copy of a file with the sensitive columns removed. Furthermore, file-based permissions cannot provide any form of access control for the metadata that provides an object abstraction (databases, tables, columns or partitions) over the data stored in HDFS.

Current Sentry developers subscribe to the mission of the ASF and are familiar with the open source development process. Several members are already committers and PMC members of other Apache projects.
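The contrast drawn in the Proposal and Rationale sections, between all-or-nothing file permissions and role-based privileges on catalog objects, can be illustrated with a small Python sketch. It is a hypothetical model, not Sentry's implementation or policy syntax: `FileMode`, `file_allows`, `rbac_allows`, the group-to-role-to-privilege mapping and the sample `sales.orders` data are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical illustration only: these names and this privilege model are
# assumptions for the sketch, not Sentry's or HDFS's actual APIs.


@dataclass(frozen=True)
class FileMode:
    """UNIX-style permission bits on a whole file, as in the HDFS model."""
    owner: str
    group: str
    bits: int  # e.g. 0o640 means owner rw-, group r--, other ---


def file_allows(mode: FileMode, user: str, groups: Set[str], action: str) -> bool:
    """File-level check: a grant applies to the entire file, never to a part of it."""
    flag = {"read": 4, "write": 2}[action]
    if user == mode.owner:
        shift = 6  # owner bits
    elif mode.group in groups:
        shift = 3  # group bits
    else:
        shift = 0  # other bits
    return bool((mode.bits >> shift) & flag)


# A catalog object is a path such as ("sales",), ("sales", "orders") or
# ("sales", "orders", "amount"); a privilege on a path also covers
# every object beneath it.
ObjectPath = Tuple[str, ...]
Privilege = Tuple[ObjectPath, str]


def rbac_allows(
    group_roles: Dict[str, List[str]],
    role_privileges: Dict[str, List[Privilege]],
    groups: Set[str],
    obj: ObjectPath,
    action: str,
) -> bool:
    """Role-based check: user groups -> roles -> privileges on catalog objects."""
    for group in groups:
        for role in group_roles.get(group, []):
            for path, granted in role_privileges.get(role, []):
                if granted == action and obj[: len(path)] == path:
                    return True
    return False


# Hypothetical data: one CSV file holds the columns of table sales.orders,
# including a sensitive "ssn" column.
orders_file = FileMode(owner="etl", group="analysts", bits=0o640)
group_roles = {"analysts": ["orders_reader"], "admins": ["sales_admin"]}
role_privileges = {
    "orders_reader": [
        (("sales", "orders", "amount"), "select"),
        (("sales", "orders", "region"), "select"),
    ],
    "sales_admin": [(("sales",), "select")],
}

if __name__ == "__main__":
    # File level: any analyst can read the whole file, ssn column included.
    print(file_allows(orders_file, "alice", {"analysts"}, "read"))  # True
    # Object level: the same analyst may select amount but not ssn.
    print(rbac_allows(group_roles, role_privileges, {"analysts"},
                      ("sales", "orders", "amount"), "select"))  # True
    print(rbac_allows(group_roles, role_privileges, {"analysts"},
                      ("sales", "orders", "ssn"), "select"))  # False
```

The file check can only answer yes or no for the whole file, so hiding the `ssn` column from analysts would require a second, trimmed copy of the file, which is the data duplication the Rationale describes. The role-based check instead attaches privileges to objects in the catalog hierarchy, so a grant on a table or column does not expose its siblings.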
== Initial Goals ==

Sentry is currently in its first major release, with a considerable number of enhancement requests, tasks and issues recorded for future development. The initial goal of this project will be to continue building the community in the spirit of the "Apache Way" and to address the most requested features and bug fixes for the next dot release.

== Current Status ==

=== Meritocracy ===

The intent of this proposal is to build a diverse community of developers around Sentry. Sentry started as an open source project on GitHub, driven in the spirit of open source, and we would like to continue in this spirit, for example by encouraging contributors from a variety of organizations.

=== Community ===

Sentry stakeholders want to expand the user and developer base of Sentry further. The current developers are committed to building a strong user base and open source community around the project. Development discussions within the current team have taken place on a public [[https://groups.google.com/a/cloudera.org/forum/#!forum/access-dev|mailing list]].

=== Core Developers ===

The core developers of the Sentry project are Brock Noland, Shreepadma Venugopalan, Prasad Mujumdar and Jarek Jarcec Cecho. Other contributors include Arvind Prabhakar and Xuefu Zhang. All of these engineers have deep expertise in Hadoop and various other ecosystem components.

=== Alignment ===

Sentry complements the access control features of some projects in the Apache Hadoop ecosystem, such as HDFS file permissions, by providing finer-grained access control to data and metadata. It supersedes the access control capabilities of other projects such as Apache Hive by providing stronger guarantees against malicious access. Currently, Sentry integrates with Apache Hive; we are planning to add support for other components such as Apache Pig.
While projects such as Apache Knox aim to provide perimeter security, the goal of Sentry is to implement fine-grained, role-based access control policies. Thus, Sentry complements Apache Knox.

== Known Risks ==

=== Orphaned Products ===

Sentry is already deployed in production at a few well-established companies, which are actively sharing feature requests. The risk of it being orphaned is negligible.

=== Inexperience with Open Source ===

All committers of the Sentry project are intimately familiar with the Apache model for open source development and are experienced in working with various Apache open source communities.

=== Homogeneous Developers ===

The initial set of committers includes developers from several organizations: Cloudera, Oracle, Lab41, Nvidia and Wibidata. We expect that, once approved for incubation, the project will attract further new contributors.

=== Reliance on Salaried Developers ===

It is expected that Sentry will be developed on both salaried and volunteer time, although all of the initial developers will work on it mainly on salaried time.

=== Relationships with Other Apache Products ===

Sentry depends on other Apache projects: Apache Hadoop, Apache Log4j, Apache Hive, Apache Shiro and multiple Apache Commons components. The build is orchestrated by Apache Maven. Sentry complements Apache Knox.

=== An Excessive Fascination with the Apache Brand ===

We would like Sentry to become an Apache project to further foster a healthy community of users and developers around it. Since Sentry solves an important problem faced by Apache Hadoop users and interacts with other components of the Apache Hadoop ecosystem, we believe that Apache is the right home for Sentry.
== Documentation ==

* Cloudera provides documentation specific to its distribution of Sentry at: http://www.cloudera.com/content/cloudera-content/cloudera-docs/Sentry/Sentry.pdf
* Sentry JIRA at Cloudera: https://issues.cloudera.org/browse/access

== Initial Source ==

https://github.com/cloudera/access

== Source and Intellectual Property Submission Plan ==

All of Sentry's code is already under the Apache 2 license.

== External Dependencies ==

All dependencies have licenses compatible with the ASL. The dependencies that do not use the ASL directly are:

* JUnit - Eclipse Public License

== Cryptography ==

Sentry currently doesn't directly use any cryptographic libraries. However, Sentry uses Apache Shiro, which provides support for cryptographic features such as hashes and ciphers.

== Required Resources ==

=== Mailing Lists ===

* private@sentry.incubator.apache.org for private PMC discussions (with moderated subscriptions)
* security@sentry.incubator.apache.org for private security-related discussions
* dev@sentry.incubator.apache.org
* commits@sentry.incubator.apache.org
* user@sentry.incubator.apache.org

=== Source code repository ===

Git repository running at http://git-wip-us.apache.org/.

=== Issue Tracking ===

JIRA Sentry (SENTRY)

=== Other Resources ===

The existing code already has unit and integration tests, so we would like a Jenkins CI instance that runs the tests on a reference environment. We would also like to use Jenkins to run tests for every newly submitted patch (a so-called pre-commit hook); however, this can be added after project creation.
== Initial Committers ==

* Ali Rizvi (ali.rizvi at oracle.com)
* Arvind Prabhakar (arvind at apache.org)
* Brock Noland (brock at apache.org)
* Chaoyu Tang (ctang at cloudera.com)
* Daisy Zhou (daisy at wibidata.com)
* David Nalley (ke4qqq at apache.org)
* Erick Tryzelaar (etryzelaar at iqt.org)
* Greg Chanan (gchanan at apache.org)
* Hadi Nahari (hnahari at nvidia.com)
* Jarek Jarcec Cecho (jarcec at apache.org)
* Johnny Zhang (xiaoyuz at cloudera.com)
* Karthik Ramachandran (kramachandran at iqt.org)
* Mark Grover (mgrover at cloudera.com)
* Milo Polte (milo at wibidata.com)
* Lenni Kuff (lskuff at cloudera.com)
* Patrick Daly (daly at cloudera.com)
* Patrick Hunt (phunt at apache.org)
* Prasad Mujumdar (prasadm at apache.org)
* Raghu Mani (raghu.mani at oracle.com)
* Sean Mackrory (sean at cloudera.com)
* Shreepadma Venugopalan (shreepadma at cloudera.com)
* Sravya Tirukkovalur (sravya at cloudera.com)
* Tom White (tomwhite at apache.org)
* Xuefu Zhang (xuefu at apache.org)

== Affiliations ==

* Ali Rizvi (Oracle)
* Arvind Prabhakar (Cloudera)
* Brock Noland (Cloudera)
* Chaoyu Tang (Cloudera)
* Daisy Zhou (Wibidata)
* David Nalley (Citrix)
* Erick Tryzelaar (Lab41)
* Greg Chanan (Cloudera)
* Hadi Nahari (Nvidia)
* Jarek Jarcec Cecho (Cloudera)
* Johnny Zhang (Cloudera)
* Karthik Ramachandran (Lab41)
* Mark Grover (Cloudera)
* Milo Polte (Wibidata)
* Lenni Kuff (Cloudera)
* Patrick Daly (Cloudera)
* Patrick Hunt (Cloudera)
* Prasad Mujumdar (Cloudera)
* Raghu Mani (Oracle)
* Sean Mackrory (Cloudera)
* Shreepadma Venugopalan (Cloudera)
* Sravya Tirukkovalur (Cloudera)
* Tom White (Cloudera)
* Xuefu Zhang (Cloudera)

== Sponsors ==

=== Champion ===

* Arvind Prabhakar (Cloudera)

=== Nominated Mentors ===

* Arvind Prabhakar (Cloudera)
* David Nalley (Citrix)
* Patrick Hunt (Cloudera)
* Tom White (Cloudera)

=== Sponsoring Entity ===

We are requesting the Incubator to sponsor this project.