incubator-cvs mailing list archives

From Apache Wiki <>
Subject [Incubator Wiki] Update of "SentryProposal" by ShreepadmaVenugopalan
Date Mon, 29 Jul 2013 18:57:05 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "SentryProposal" page has been changed by ShreepadmaVenugopalan:

New page:
Sentry - A fine-grained Authorization System for the Hadoop ecosystem

== Abstract ==

Sentry is a highly modular system for providing fine-grained, role-based authorization to both
data and metadata stored on an Apache Hadoop cluster. Sentry can be used to enforce access
policy rules when data stored on the Hadoop Distributed File System is accessed through
Hadoop ecosystem components such as Apache Hive or Apache Pig.

== Proposal ==

Traditionally, user access control in Apache Hadoop has been implemented using file-based
permissions on HDFS. Following the UNIX permissions model, HDFS offers all-or-nothing semantics:
an administrator can configure the system to allow certain users or user groups to read, write,
or perform both operations on files. This system does not enable finer-grained permissions
that allow access policies for logical parts within one file. Furthermore, this model can't
be used to restrict access to the rich set of objects in the metadata catalog that are stored
outside HDFS.

Sentry will provide true role-based, fine-grained user access control for Apache Hadoop and
its ecosystem components such as Hive, Pig or HBase. This includes providing fine-grained,
role-based access to both the data and the metadata, which provides a rich object-based
abstraction such as databases, tables or columns.
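
The contrast between file-level permissions and role-based access to metadata objects can be illustrated with a minimal sketch. This is not Sentry's actual policy format or API: the `Privilege` and `Policy` classes, the group and role names, and the `sales.orders` table are all hypothetical, chosen only to show how a privilege on a database, table or column might be granted to a role and checked for a user's groups.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only; none of these names come from Sentry.


@dataclass(frozen=True)
class Privilege:
    action: str        # e.g. "select" or "insert"
    database: str
    table: str = "*"   # "*" matches every table in the database
    column: str = "*"  # "*" matches every column in the table

    def allows(self, action, database, table, column):
        """Return True if this privilege covers the requested access."""
        return (
            self.action == action
            and self.database == database
            and self.table in ("*", table)
            and self.column in ("*", column)
        )


@dataclass
class Policy:
    group_roles: dict = field(default_factory=dict)      # group -> set of role names
    role_privileges: dict = field(default_factory=dict)  # role -> list of Privilege

    def is_allowed(self, groups, action, database, table, column):
        """Check every role of every group for a privilege that matches."""
        for group in groups:
            for role in self.group_roles.get(group, ()):
                for privilege in self.role_privileges.get(role, ()):
                    if privilege.allows(action, database, table, column):
                        return True
        return False


# Assumed example data: analysts may read one column of one table.
policy = Policy(
    group_roles={"analysts": {"sales_reader"}},
    role_privileges={
        "sales_reader": [Privilege("select", "sales", "orders", "amount")],
    },
)
```

With file permissions alone, a user who can read the file backing `orders` can read every column in it. In this sketch, a member of `analysts` is allowed to select `amount` but is denied any other column of the same table, without the data being duplicated.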

== Background ==
Sentry was initially developed by Cloudera to give users fine-grained access to data as well
as the metadata in Apache Hadoop.

Sentry has been maintained as an open source project on Cloudera’s GitHub. Sentry was previously
called “Access”. All code in Sentry is open source and has been made publicly available
under the Apache 2 license. During this time, Sentry has been formally released two times
as versions 1.0.0 and 1.1.0.

== Rationale ==

Currently, users don't have a way to achieve fine-grained, enforceable user access control
to data stored in HDFS and its associated metadata. While users can use file-based permissions
to control access to specific directories and files, this is insufficient because access can't
be restricted to parts of a file, i.e., to specific lines or logical columns. In the absence of
such support, users have to resort to duplicating data. Furthermore, file-based permissions
are insufficient to provide any form of access control to the metadata that provides an object
abstraction such as databases, tables, columns or partitions over the data stored in HDFS.

It is important to note that projects such as Apache Knox aim to provide perimeter security,
whereas the goal of Sentry is to implement a fine-grained role-based access control policy.
Hence Sentry complements Apache Knox.

Current Sentry developers subscribe to the mission of ASF and are familiar with the open source
development process. Several members are already committers and PMC members of various other
Apache projects.

== Initial Goals ==
Sentry is currently in its first major release with a considerable number of enhancement requests,
tasks, and issues recorded towards its future development. The initial goal of this project
will be to continue to build community in the spirit of the "Apache Way", and to address the
highly requested features and bug-fixes towards the next dot release.

== Current Status ==
=== Meritocracy ===
The intent of this proposal is to build a diverse community of developers around Sentry. Sentry
started as an open source project on GitHub, driven in the spirit of open source, and we would
like to continue in this spirit by, for example, encouraging contributors from a variety of

=== Community ===
Sentry stakeholders desire to expand the user and developer base of Sentry further in the
future. The current set of developers in Sentry is committed to building a strong user base
and open source community around the project. All development discussions within the current
team have been on a public mailing list (

=== Core Developers ===

The core developers for the Sentry project are Brock Noland, Shreepadma Venugopalan, Prasad
Mujumdar and Jarek Jarcec Cecho. Other contributors include Arvind Prabhakar and Xuefu Zhang.
All engineers have deep expertise in Hadoop and various other ecosystem components.

=== Alignment ===

Sentry complements some aspects of other projects in the Apache Hadoop ecosystem, such as
HDFS file permissions, by providing fine-grained access control to data and metadata in Hadoop.
Currently, it integrates with Apache Hive; however, we are planning to provide support for
other components such as Apache Pig.

== Known Risks ==

=== Orphaned Products ===

Sentry is already deployed in production at a few well-established companies, and they are
actively sharing feature requests. The risk of it being orphaned is negligible.

=== Inexperience with Open Source ===

All committers of the Sentry project are intimately familiar with the Apache model for open-source
development and are experienced in working with various Apache open-source communities.

=== Homogeneous Developers ===

The initial set of committers includes developers from several organizations: Cloudera, Oracle,
Lab41, Nvidia and Wibidata. We expect that once approved for incubation, the project will
attract further contributors.

=== Reliance on Salaried Developers ===

It is expected that Sentry will be developed on both salaried and volunteer time, although
all of the initial developers will work on it mainly on salaried time.

=== Relationships with Other Apache Products ===

Sentry depends on other Apache projects: Apache Hadoop, Apache Log4J, Apache Hive, Apache
Shiro, and multiple Apache Commons components. The build is orchestrated by Apache Maven. Sentry
complements Apache Knox.

=== An Excessive Fascination with the Apache Brand ===

We would like Sentry to become an Apache project to further foster a healthy community of
users and developers around it. Since Sentry solves an important problem faced by Apache Hadoop
users and interacts with other components of the Apache Hadoop ecosystem, we believe that
Apache is the right home for Sentry.

== Documentation ==

== Initial Source ==

== Source and Intellectual Property Submission Plan ==
All of Sentry’s code is already under the Apache 2 license.

== External Dependencies ==

All dependencies have licenses compatible with ASL. Dependencies that do not directly use
ASL are:

  * JUnit - Eclipse Public License

== Cryptography ==

Sentry currently doesn’t directly use any cryptographic libraries.

== Required Resources ==

=== Mailing Lists ===

  * for private PMC discussions (with moderated subscriptions)
  * for private security related discussions

=== Source code repository ===

Git repository running at

=== Issue Tracking ===


=== Other Resources ===

The existing code already has unit and integration tests, so we would like a Jenkins CI instance
that would run the tests on a reference environment. We would also like to use Jenkins to run
tests for every newly submitted patch (a so-called pre-commit hook); however, this can be added
after project creation.

== Initial Committers ==

  * Ali Rizvi (
  * Arvind Prabhakar (
  * Brock Noland  (
  * Chaoyu Tang (
  * Daisy Zhao (
  * David Nalley (
  * Erick Tryzelaar (
  * Greg Chanan (
  * Hadi Nahari (
  * Jarek Jarcec Cecho (
  * Johnny Zhang (
  * Karthik Ramachandran (
  * Mark Grover (
  * Milo Polte (
  * Lenni Kuff  (
  * Patrick Daly (
  * Patrick Hunt (
  * Prasad Mujumdar (
  * Raghu Mani ( 
  * Sean Mackrory (
  * Shreepadma Venugopalan (
  * Sravya Tirukkovalur (
  * Tom White (
  * Xuefu Zhang (

== Affiliations ==
  * Ali Rizvi (Oracle)
  * Arvind Prabhakar (Cloudera)
  * Brock Noland  (Cloudera)
  * Chaoyu Tang (Cloudera)
  * Daisy Zhao (Wibidata)
  * David Nalley (Citrix)
  * Erick Tryzelaar (Lab41)
  * Greg Chanan (Cloudera)
  * Hadi Nahari (Nvidia)
  * Jarek Jarcec Cecho (Cloudera)
  * Johnny Zhang (Cloudera)
  * Karthik Ramachandran (Lab41)
  * Mark Grover (Cloudera)
  * Milo Polte (Wibidata)
  * Lenni Kuff  (Cloudera)
  * Patrick Daly (Cloudera)
  * Patrick Hunt (Cloudera)
  * Prasad Mujumdar (Cloudera)
  * Raghu Mani (Oracle)
  * Sean Mackrory (Cloudera)
  * Shreepadma Venugopalan (Cloudera)
  * Sravya Tirukkovalur (Cloudera)
  * Tom White (Cloudera)
  * Xuefu Zhang (Cloudera)

== Sponsors ==

=== Champion ===

  * Arvind Prabhakar (Cloudera)

=== Nominated Mentors ===

  * Arvind Prabhakar (Cloudera)
  * David Nalley (Citrix)
  * Patrick Hunt (Cloudera)
  * Tom White (Cloudera)

=== Sponsoring Entity ===

We are requesting the Incubator to sponsor this project.
