incubator-general mailing list archives

From Eric Sammer <esam...@cloudera.com>
Subject Re: [PROPOSAL] Knox Hadoop Gateway Project
Date Mon, 11 Feb 2013 19:23:07 GMT
Kevin:

Makes complete sense.

I'd like to offer to join the project, if it's accepted for incubation. I'm
a committer on MRUnit and Flume, and on the PMC for both. I've helped both
projects through the incubation phase, and I also know a little bit about
this Hadoop thing. ;)

Thanks!


On Mon, Feb 11, 2013 at 9:28 AM, Kevin Minder
<kevin.minder@hortonworks.com> wrote:

> Hi Eric,
> Let me answer your second question first.
>
> Q: Is it your intention to provide job submissions and data ingestion APIs
> for MR and HDFS, respectively?
> A: Yes, we plan to progress the project to cover all existing ecosystem
> projects.  In addition, the project is based on a modular framework that
> allows for each extension to cover services that are either new or
> proprietary.  Certainly there exist very high volume data ingest use cases
> for which using a gateway may be impractical but in general the idea is to
> support all required client interaction with Hadoop via the gateway.
>
> Now for your first question...
>
> Q: Can you explain a bit more about what the target use case is?
> A: One typical use case will be that the gateway will run in a DMZ.  It
> will, as you say, integrate with various directory services and is
> extensible to cover those not included.  The gateway will then propagate
> the identity into the Hadoop cluster using Hadoop specific mechanisms.  The
> key point is that there will typically be a single port open on the client
> side to the gateway.  The Hadoop cluster is firewalled, only providing
> access to the Hadoop services to the gateway instances.
> A: Another use case is that an organization is already using some SSO
> solution and the gateway would be integrated with that to verify any SSO
> token and then propagate the identity to the Hadoop services.
>
> I will collect this and add it to the proposal wiki once I have privs to
> create the page.
>
> Thanks!
> Kevin.
>
>
> On 2/11/13 12:03 PM, Eric Sammer wrote:
>
>> Kevin:
>>
>> Interesting proposal. Can you explain a bit more about what the target use
>> case is? It sounds like there's SSO-ish functionality (presumably a doAs()
>> machine) with integration with directory services, but the proposal also
>> mentions a single point for "data and jobs." Is it your intention to
>> provide job submissions and data ingestion APIs for MR and HDFS,
>> respectively? Do you plan to target other ecosystem projects such as
>> HBase?
>> Sorry if I missed this in the proposal.
>>
>> Thanks!
>>
>>
>> On Mon, Feb 11, 2013 at 6:55 AM, Kevin Minder
>> <kevin.minder@hortonworks.com> wrote:
>>
>>  Knox Gateway Proposal
>>>
>>> == Abstract ==
>>>
>>> Knox Gateway is a system that provides a single point of secure access
>>> for
>>> Apache Hadoop clusters.
>>>
>>> == Proposal ==
>>>
>>> The Knox Gateway (“Gateway” or “Knox”) is a system that provides a single
>>> point of authentication and access for Apache Hadoop services in a
>>> cluster.
>>> The goal is to simplify Hadoop security for both users (i.e. who access
>>> the
>>> cluster data and execute jobs) and operators (i.e. who control access and
>>> manage the cluster). The Gateway runs as a server (or cluster of servers)
>>> that serves one or more Hadoop clusters.
>>>
>>>  * Provide perimeter security to make Hadoop security setup easier
>>>  * Support authentication and token verification security scenarios
>>>  * Deliver users a single cluster end-point that aggregates capabilities
>>>    for data and jobs
>>>  * Enable integration with enterprise and cloud identity management
>>>    environments
>>>
>>> == Background ==
>>>
>>> An Apache Hadoop cluster is presented to consumers as a loose collection
>>> of independent services. This makes it difficult for users to interact
>>> with
>>> Hadoop since each service maintains its own method of access and
>>> security.
>>> As well, for operators, configuration and administration of a secure
>>> Hadoop
>>> cluster is complex, and many Hadoop clusters are insecure as a result.
>>>
>>> == Rationale ==
>>>
>>> Organizations that are struggling with Hadoop cluster security end up
>>> either a) running Hadoop without security or b) slowing adoption of
>>> Hadoop. The
>>> Gateway aims to provide perimeter security that integrates more easily
>>> into
>>> existing organizations’ security infrastructure. Doing so will simplify
>>> security for these organizations and benefit all Hadoop stakeholders
>>> (i.e.
>>> users and operators). Additionally, making a dedicated perimeter security
>>> project part of the Apache Hadoop ecosystem will prevent fragmentation in
>>> this area and further increase the value of Hadoop as a data platform.
>>>
>>> == Current Status ==
>>>
>>> Prototype available, developed by the list of initial committers.
>>>
>>> === Meritocracy ===
>>>
>>> We desire to build a diverse developer community around Gateway following
>>> the Apache Way. We want to make the project open source and will
>>> encourage
>>> contributors from multiple organizations following the Apache meritocracy
>>> model.
>>>
>>> === Community ===
>>>
>>> We hope to extend the user and developer base in the future and build a
>>> solid open source community around Gateway. Apache Hadoop has a large
>>> ecosystem of open source projects, each with a strong community of
>>> contributors. All project communities in this ecosystem have an
>>> opportunity
>>> to participate in the advancement of the Gateway project because
>>> ultimately, Gateway will enable the security capabilities of their
>>> project
>>> to be more enterprise friendly.
>>>
>>> === Core Developers ===
>>>
>>> Gateway is currently being developed by several engineers from
>>> Hortonworks
>>> - Kevin Minder, Larry McCay, John Speidel, Tom Beerbower and Sumit
>>> Mohanty.
>>> All the engineers have deep expertise in middleware, security & identity
>>> systems and are quite familiar with the Hadoop ecosystem.
>>>
>>> === Alignment ===
>>>
>>> The ASF is a natural host for Gateway given that it is already the home
>>> of
>>> Hadoop, Hive, Pig, HBase, Oozie and other emerging big data software
>>> projects. Gateway is designed to solve the security challenges familiar
>>> to
>>> the Hadoop ecosystem family of projects.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned products ===
>>>
>>> The core developers plan to work full time on the project. We believe
>>> that
>>> this project will be of general interest to many Hadoop users and will
>>> attract a diverse set of contributors. We intend to demonstrate this by
>>> having contributors from several organizations recognized as committers
>>> by
>>> the time Knox graduates from incubation.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> All of the core developers are active users and followers of open source.
>>> As well, Hortonworks has a strong heritage of success with contributions
>>> to
>>> Apache Hadoop Projects.
>>>
>>> === Homogeneous Developers ===
>>>
>>> The current core developers are from Hortonworks; however, we hope to
>>> establish a developer community that includes contributors from several
>>> corporations.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Currently, the developers are paid to do work on Gateway. However, once
>>> the project has a community built around it, we expect to get committers
>>> and developers from outside the current core developers.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> Gateway is going to be used by the users and operators of Hadoop, and the
>>> Hadoop ecosystem in general.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> Our interest in developing Gateway as an Apache project is to follow an
>>> established development model. As well, since many of the Hadoop
>>> ecosystem projects are also part of Apache, Gateway will complement
>>> those projects by following the same development and contribution model.
>>>
>>> == Documentation ==
>>>
>>> There is documentation in Hortonworks’ internal repositories. These can
>>> be
>>> shared upon request and will be transferred into the Apache CM system if
>>> this proposal is accepted.
>>>
>>> == Initial Source ==
>>>
>>> The source is currently in Hortonworks’ internal repositories. The
>>> process
>>> of making this GitHub repository public has been started and the URL will
>>> be provided once available.
>>>
>>> == Source and Intellectual Property Submission Plan ==
>>>
>>> The complete Gateway code is under Apache Software License 2.
>>>
>>> == External Dependencies ==
>>>
>>> The Gateway dependencies are listed below, separated by Category A and
>>> Category B as defined in the Apache Third-Party Licensing Policy. Note:
>>> These are the direct dependencies. Indirect dependencies are not
>>> included.
>>>
>>> === Category A Dependencies ===
>>>
>>> Apache Commons - ASLv2.0
>>> commons-io:commons-io#2.4
>>> commons-cli:commons-cli#1.2
>>> commons-codec:commons-codec#1.7
>>> org.apache.commons:commons-digester3#3.2
>>> org.apache.commons:commons-vfs2#2.0
>>> Apache Hadoop - ASLv2.0
>>> org.apache.hadoop:hadoop-auth#0.23.3
>>> org.apache.hadoop:hadoop-core#1.0.3
>>> Apache Geronimo - ASLv2.0
>>> org.apache.geronimo.components:geronimo-jaspi#2.0.0
>>> org.apache.geronimo.specs:geronimo-osgi-locator#1.1
>>> Apache Shiro - ASLv2.0
>>> org.apache.shiro:shiro-web#1.2.1
>>> ApacheDS - ASLv2.0
>>> org.apache.directory.server:apacheds-all#1.5.5
>>>
>>> Log4J - ASLv2.0
>>> log4j:log4j#1.2.17
>>> SLF4J - MIT
>>> org.slf4j:slf4j-api#1.6.6
>>> org.slf4j:slf4j-log4j12#1.6.6
>>> Guava - ASLv2.0
>>> com.google.guava:guava#14.0-rc1
>>> HttpClient - ASLv2.0
>>> org.apache.httpcomponents:httpclient#4.2.1
>>> Jetty - ASLv2.0
>>> org.eclipse.jetty:jetty-server#8.1.7.v20120910
>>> org.eclipse.jetty:jetty-servlet#8.1.7.v20120910
>>> org.eclipse.jetty:jetty-webapp#8.1.7.v20120910
>>> org.eclipse.jetty:jetty-jaspi#8.1.7.v20120910
>>> org.eclipse.jetty.aggregate:jetty-all#8.1.7.v20120910
>>> org.eclipse.jetty:test-jetty-servlet#8.1.7.v20120910
>>> Spring Security - ASLv2.0
>>> org.springframework:spring-core#3.1.3.RELEASE
>>> org.springframework:spring-context#3.1.3.RELEASE
>>> org.springframework:spring-web#3.1.3.RELEASE
>>> org.springframework.security:spring-security-core#3.1.3.RELEASE
>>> org.springframework.security:spring-security-web#3.1.3.RELEASE
>>> org.springframework.security:spring-security-config#3.1.3.RELEASE
>>> org.springframework.security:spring-security-ldap#3.1.2.RELEASE
>>> org.springframework.ldap:spring-ldap-core#1.3.1.RELEASE
>>> org.springframework.ldap:spring-ldap-core-tiger#1.3.1.RELEASE
>>> org.springframework.ldap:spring-ldap-odm#1.3.1.RELEASE
>>> org.springframework.ldap:spring-ldap-ldif-core#1.3.1.RELEASE
>>> org.springframework.ldap:spring-ldap-ldif-batch#1.3.1.RELEASE
>>> JBoss ShrinkWrap - ASLv2.0
>>> org.jboss.shrinkwrap:shrinkwrap-api#1.0.1
>>> org.jboss.shrinkwrap:shrinkwrap-impl-base#1.0.1
>>> org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-api-javaee#2.0.0-alpha-4
>>> org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-impl-javaee#2.0.0-alpha-4
>>>
>>>
>>> === Category A Dependencies (Test) ===
>>>
>>> EasyMock - ASLv2.0
>>> org.easymock:easymock#3.0
>>> XML Matchers - ASLv2.0
>>> org.xmlmatchers:xml-matchers#0.10
>>>
>>> Hamcrest - BSDv3
>>> org.hamcrest:hamcrest-api#1.0
>>> org.hamcrest:hamcrest-core#1.2.1
>>> org.hamcrest:hamcrest-library#1.2.1
>>> JsonPath - ASLv2.0
>>> com.jayway.jsonpath:json-path#0.8.1
>>> com.jayway.jsonpath:json-path-assert#0.8.1
>>>
>>> XMLTool - ASLv2.0
>>> com.mycila.xmltool:xmltool#3.3
>>> REST-assured - ASLv2.0
>>> com.jayway.restassured:rest-assured#1.6.2
>>>
>>>
>>> === Category B Dependencies ===
>>>
>>> Jersey - CDDLv1.1 or GPL2wCPE
>>> com.sun.jersey:jersey-server#1.14
>>> com.sun.jersey:jersey-servlet#1.14
>>> Jericho - EPLv1.0
>>> net.htmlparser.jericho:jericho-html#3.2
>>>
>>> Servlet - CDDLv1.0 or GPLv2
>>> javax.servlet:javax.servlet-api#3.0.1
>>>
>>> JUnit - CPLv1.0
>>> junit:junit#4.11
>>>
>>> == Cryptography ==
>>>
>>> The Gateway uses cryptographic software indirectly as a result of having
>>> two dependencies: ApacheDS and Apache Shiro. Gateway does not include any
>>> special or custom cryptographic technologies.
>>>
>>> ApacheDS is an ASF project and has been classified Export Commodity
>>> Control Number (ECCN) 5D002.C.1 due to its dependency on Bouncy Castle.
>>> More information on the ApacheDS classification can be found at
>>> http://svn.apache.org/repos/asf/directory/apacheds/trunk/installers/README
>>>
>>>
>>> Apache Shiro is an ASF project and has been classified Export Commodity
>>> Control Number (ECCN) 5D002.C.1. More information on the Apache Shiro
>>> classification can be found at
>>> http://svn.apache.org/repos/asf/shiro/trunk/README
>>>
>>>
>>> == Required Resources ==
>>>
>>> === Mailing lists ===
>>>
>>> knox-dev AT incubator DOT apache DOT org
>>> knox-commits AT incubator DOT apache DOT org
>>> knox-user AT incubator DOT apache DOT org
>>> knox-private AT incubator DOT apache DOT org
>>>
>>> === Subversion Directory ===
>>>
>>> https://svn.apache.org/repos/asf/incubator/knox
>>>
>>>
>>> === Issue Tracking ===
>>>
>>> JIRA Knox (KNOX)
>>>
>>> == Initial Committers ==
>>>
>>> Kevin Minder (kevin DOT minder AT hortonworks DOT com)
>>> Larry McCay (lmccay AT hortonworks DOT com)
>>> John Speidel (jspeidel AT hortonworks DOT com)
>>> Tom Beerbower (tbeerbower AT hortonworks DOT com)
>>> Sumit Mohanty (smohanty AT hortonworks DOT com)
>>>
>>> == Affiliations ==
>>>
>>> Kevin Minder (Hortonworks)
>>> Larry McCay (Hortonworks)
>>> John Speidel (Hortonworks)
>>> Tom Beerbower (Hortonworks)
>>> Sumit Mohanty (Hortonworks)
>>>
>>> == Sponsors ==
>>>
>>> === Champion ===
>>>
>>> Devaraj Das (ddas AT apache DOT org)
>>>
>>> === Nominated Mentors ===
>>>
>>> Owen O’Malley (omalley AT apache DOT org)
>>> Mahadev Konar (mahadev AT apache DOT org)
>>> Alan Gates (gates AT apache DOT org)
>>> Devaraj Das (ddas AT apache DOT org)
>>>
>>> === Sponsoring Entity ===
>>>
>>> Incubator PMC
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>>
>>>
>>>
>>
>


-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com
