Date: Mon, 11 Feb 2013 11:23:07 -0800
Subject: Re: [PROPOSAL] Knox Hadoop Gateway Project
From: Eric Sammer
To: "general@incubator.apache.org"

Kevin:

Makes complete sense. I'd like to offer to join the project, if it's
accepted for incubation. I'm a committer on MRUnit and Flume, and on the
PMC for both. I've helped both projects through the incubation phase, and
I also know a little bit about this Hadoop thing. ;)

Thanks!

On Mon, Feb 11, 2013 at 9:28 AM, Kevin Minder wrote:

> Hi Eric,
> Let me answer your second question first.
>
> Q: Is it your intention to provide job submissions and data ingestion
> APIs for MR and HDFS, respectively?
> A: Yes, we plan to progress the project to cover all existing ecosystem
> projects. In addition, the project is based on a modular framework that
> allows extensions to cover services that are either new or proprietary.
> Certainly there exist very high volume data ingest use cases for which
> using a gateway may be impractical, but in general the idea is to
> support all required client interaction with Hadoop via the gateway.
>
> Now for your first question...
>
> Q: Can you explain a bit more about what the target use case is?
> A: One typical use case will be that the gateway runs in a DMZ. It
> will, as you say, be integrated with various directory services and is
> extensible to cover those not included. The gateway will then propagate
> the identity into the Hadoop cluster using Hadoop-specific mechanisms.
> The key point is that there will typically be a single port open on the
> client side to the gateway. The Hadoop cluster is firewalled, only
> providing access to the Hadoop services to the gateway instances.
> A: Another use case is that an organization is already using some SSO
> solution and the gateway would be integrated with that to verify any SSO
> token and then propagate the identity to the Hadoop services.
>
> I will collect this and add it to the proposal wiki once I have privs to
> create the page.
>
> Thanks!
> Kevin.
>
> On 2/11/13 12:03 PM, Eric Sammer wrote:
>
>> Kevin:
>>
>> Interesting proposal. Can you explain a bit more about what the target
>> use case is? It sounds like there's SSO-ish functionality (presumably a
>> doAs() machine) with integration with directory services, but the
>> proposal also mentions a single point for "data and jobs." Is it your
>> intention to provide job submissions and data ingestion APIs for MR and
>> HDFS, respectively? Do you plan to target other ecosystem projects such
>> as HBase? Sorry if I missed this in the proposal.
>>
>> Thanks!
>>
>> On Mon, Feb 11, 2013 at 6:55 AM, Kevin Minder wrote:
>>
>>> Knox Gateway Proposal
>>>
>>> == Abstract ==
>>>
>>> Knox Gateway is a system that provides a single point of secure access
>>> for Apache Hadoop clusters.
>>>
>>> == Proposal ==
>>>
>>> The Knox Gateway (“Gateway” or “Knox”) is a system that provides a
>>> single point of authentication and access for Apache Hadoop services in
>>> a cluster. The goal is to simplify Hadoop security for both users (i.e.
>>> who access the cluster data and execute jobs) and operators (i.e.
>>> who control access and manage the cluster). The Gateway runs as a
>>> server (or cluster of servers) that serves one or more Hadoop clusters.
>>>
>>> Provide perimeter security to make Hadoop security setup easier
>>> Support authentication and token verification security scenarios
>>> Deliver users a single cluster end-point that aggregates capabilities for data and jobs
>>> Enable integration with enterprise and cloud identity management environments
>>>
>>> == Background ==
>>>
>>> An Apache Hadoop cluster is presented to consumers as a loose
>>> collection of independent services. This makes it difficult for users
>>> to interact with Hadoop since each service maintains its own method of
>>> access and security. As well, for operators, configuration and
>>> administration of a secure Hadoop cluster is complex, and many Hadoop
>>> clusters are insecure as a result.
>>>
>>> == Rationale ==
>>>
>>> Organizations that struggle with Hadoop cluster security end up either
>>> a) running Hadoop without security or b) slowing their adoption of
>>> Hadoop. The Gateway aims to provide perimeter security that integrates
>>> more easily into organizations’ existing security infrastructure. Doing
>>> so will simplify security for these organizations and benefit all
>>> Hadoop stakeholders (i.e. users and operators). Additionally, making a
>>> dedicated perimeter security project part of the Apache Hadoop
>>> ecosystem will prevent fragmentation in this area and further increase
>>> the value of Hadoop as a data platform.
>>>
>>> == Current Status ==
>>>
>>> Prototype available, developed by the list of initial committers.
>>>
>>> === Meritocracy ===
>>>
>>> We desire to build a diverse developer community around Gateway
>>> following the Apache Way.
>>> We want to make the project open source and will encourage
>>> contributors from multiple organizations following the Apache
>>> meritocracy model.
>>>
>>> === Community ===
>>>
>>> We hope to extend the user and developer base in the future and build a
>>> solid open source community around Gateway. Apache Hadoop has a large
>>> ecosystem of open source projects, each with a strong community of
>>> contributors. All project communities in this ecosystem have an
>>> opportunity to participate in the advancement of the Gateway project
>>> because ultimately, Gateway will enable the security capabilities of
>>> their project to be more enterprise friendly.
>>>
>>> === Core Developers ===
>>>
>>> Gateway is currently being developed by several engineers from
>>> Hortonworks - Kevin Minder, Larry McCay, John Speidel, Tom Beerbower
>>> and Sumit Mohanty. All the engineers have deep expertise in middleware,
>>> security & identity systems and are quite familiar with the Hadoop
>>> ecosystem.
>>>
>>> === Alignment ===
>>>
>>> The ASF is a natural host for Gateway given that it is already the home
>>> of Hadoop, Hive, Pig, HBase, Oozie and other emerging big data software
>>> projects. Gateway is designed to solve the security challenges familiar
>>> to the Hadoop ecosystem family of projects.
>>>
>>> == Known Risks ==
>>>
>>> === Orphaned products & Reliance on Salaried Developers ===
>>>
>>> The core developers plan to work full time on the project. We believe
>>> that this project will be of general interest to many Hadoop users and
>>> will attract a diverse set of contributors. We intend to demonstrate
>>> this by having contributors from several organizations recognized as
>>> committers by the time Knox graduates from incubation.
>>>
>>> === Inexperience with Open Source ===
>>>
>>> All of the core developers are active users and followers of open
>>> source.
>>> As well, Hortonworks has a strong heritage of success with
>>> contributions to Apache Hadoop Projects.
>>>
>>> === Homogeneous Developers ===
>>>
>>> The current core developers are from Hortonworks; however, we hope to
>>> establish a developer community that includes contributors from several
>>> corporations.
>>>
>>> === Reliance on Salaried Developers ===
>>>
>>> Currently, the developers are paid to do work on Gateway. However, once
>>> the project has a community built around it, we expect to get
>>> committers and developers from outside the current core developers.
>>>
>>> === Relationships with Other Apache Products ===
>>>
>>> Gateway is going to be used by the users and operators of Hadoop, and
>>> the Hadoop ecosystem in general.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>>
>>> Our interest in developing Gateway as an Apache project is to follow an
>>> established development model. As well, since many of the Hadoop
>>> ecosystem projects are also part of Apache, Gateway will complement
>>> those projects by following the same development and contribution
>>> model.
>>>
>>> == Documentation ==
>>>
>>> There is documentation in Hortonworks’ internal repositories. It can be
>>> shared upon request and will be transferred into the Apache CM system
>>> if this proposal is accepted.
>>>
>>> == Initial Source ==
>>>
>>> The source is currently in Hortonworks’ internal repositories. The
>>> process of making this GitHub repository public has been started and
>>> the URL will be provided once available.
>>>
>>> == Source and Intellectual Property Submission Plan ==
>>>
>>> The complete Gateway code is under Apache Software License 2.
>>>
>>> == External Dependencies ==
>>>
>>> The Gateway dependencies are listed below, separated by Category A and
>>> Category B as defined in the Apache Third-Party Licensing Policy.
>>> Note: These are the direct dependencies. Indirect dependencies are not
>>> included.
>>>
>>> === Category A Dependencies ===
>>>
>>> Apache Commons - ASLv2.0
>>>   commons-io:commons-io#2.4
>>>   commons-cli:commons-cli#1.2
>>>   commons-codec:commons-codec#1.7
>>>   org.apache.commons:commons-digester3#3.2
>>>   org.apache.commons:commons-vfs2#2.0
>>> Apache Hadoop - ASLv2.0
>>>   org.apache.hadoop:hadoop-auth#0.23.3
>>>   org.apache.hadoop:hadoop-core#1.0.3
>>> Apache Geronimo - ASLv2.0
>>>   org.apache.geronimo.components:geronimo-jaspi#2.0.0
>>>   org.apache.geronimo.specs:geronimo-osgi-locator#1.1
>>> Apache Shiro - ASLv2.0
>>>   org.apache.shiro:shiro-web#1.2.1
>>> ApacheDS - ASLv2.0
>>>   org.apache.directory.server:apacheds-all#1.5.5
>>> Log4J - ASLv2.0
>>>   log4j:log4j#1.2.17
>>> SLF4J - MIT
>>>   org.slf4j:slf4j-api#1.6.6
>>>   org.slf4j:slf4j-log4j12#1.6.6
>>> Guava - ASLv2.0
>>>   com.google.guava:guava#14.0-rc1
>>> HttpClient - ASLv2.0
>>>   org.apache.httpcomponents:httpclient#4.2.1
>>> Jetty - ASLv2.0
>>>   org.eclipse.jetty:jetty-server#8.1.7.v20120910
>>>   org.eclipse.jetty:jetty-servlet#8.1.7.v20120910
>>>   org.eclipse.jetty:jetty-webapp#8.1.7.v20120910
>>>   org.eclipse.jetty:jetty-jaspi#8.1.7.v20120910
>>>   org.eclipse.jetty.aggregate:jetty-all#8.1.7.v20120910
>>>   org.eclipse.jetty:test-jetty-servlet#8.1.7.v20120910
>>> Spring Security - ASLv2.0
>>>   org.springframework:spring-core#3.1.3.RELEASE
>>>   org.springframework:spring-context#3.1.3.RELEASE
>>>   org.springframework:spring-web#3.1.3.RELEASE
>>>   org.springframework.security:spring-security-core#3.1.3.RELEASE
>>>   org.springframework.security:spring-security-web#3.1.3.RELEASE
>>>   org.springframework.security:spring-security-config#3.1.3.RELEASE
>>>   org.springframework.security:spring-security-ldap#3.1.2.RELEASE
>>>   org.springframework.ldap:spring-ldap-core#1.3.1.RELEASE
>>>   org.springframework.ldap:spring-ldap-core-tiger#1.3.1.RELEASE
>>>   org.springframework.ldap:spring-ldap-odm#1.3.1.RELEASE
>>>   org.springframework.ldap:spring-ldap-ldif-core#1.3.1.RELEASE
>>>   org.springframework.ldap:spring-ldap-ldif-batch#1.3.1.RELEASE
>>> JBoss ShrinkWrap - ASLv2.0
>>>   org.jboss.shrinkwrap:shrinkwrap-api#1.0.1
>>>   org.jboss.shrinkwrap:shrinkwrap-impl-base#1.0.1
>>>   org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-api-javaee#2.0.0-alpha-4
>>>   org.jboss.shrinkwrap.descriptors:shrinkwrap-descriptors-impl-javaee#2.0.0-alpha-4
>>>
>>> === Category A Dependencies (Test) ===
>>>
>>> EasyMock - ASLv2.0
>>>   org.easymock:easymock#3.0
>>> XML Matchers - ASLv2.0
>>>   org.xmlmatchers:xml-matchers#0.10
>>> Hamcrest - BSDv3
>>>   org.hamcrest:hamcrest-api#1.0
>>>   org.hamcrest:hamcrest-core#1.2.1
>>>   org.hamcrest:hamcrest-library#1.2.1
>>> JsonPath - ASLv2.0
>>>   com.jayway.jsonpath:json-path#0.8.1
>>>   com.jayway.jsonpath:json-path-assert#0.8.1
>>> XMLTool - ASLv2.0
>>>   com.mycila.xmltool:xmltool#3.3
>>> REST-assured - ASLv2.0
>>>   com.jayway.restassured:rest-assured#1.6.2
>>>
>>> === Category B Dependencies ===
>>>
>>> Jersey - CDDLv1.1 or GPL2wCPE
>>>   com.sun.jersey:jersey-server#1.14
>>>   com.sun.jersey:jersey-servlet#1.14
>>> Jericho - EPLv1.0
>>>   net.htmlparser.jericho:jericho-html#3.2
>>> Servlet - CDDLv1.0 or GPLv2
>>>   javax.servlet:javax.servlet-api#3.0.1
>>> JUnit - CPLv1.0
>>>   junit:junit#4.11
>>>
>>> == Cryptography ==
>>>
>>> The Gateway uses cryptographic software indirectly as a result of
>>> having two dependencies: ApacheDS and Apache Shiro. Gateway does not
>>> include any special or custom cryptographic technologies.
>>>
>>> ApacheDS is an ASF project and has been classified Export Commodity
>>> Control Number (ECCN) 5D002.C.1 due to its dependency on Bouncy Castle.
>>> More information on the ApacheDS classification can be found at
>>> http://svn.apache.org/repos/asf/directory/apacheds/trunk/installers/README
>>>
>>> Apache Shiro is an ASF project and has been classified Export Commodity
>>> Control Number (ECCN) 5D002.C.1. More information on the Apache Shiro
>>> classification can be found at
>>> http://svn.apache.org/repos/asf/shiro/trunk/README
>>>
>>> == Required Resources ==
>>>
>>> === Mailing lists ===
>>>
>>> knox-dev AT incubator DOT apache DOT org
>>> knox-commits AT incubator DOT apache DOT org
>>> knox-user AT incubator DOT apache DOT org
>>> knox-private AT incubator DOT apache DOT org
>>>
>>> === Subversion Directory ===
>>>
>>> https://svn.apache.org/repos/asf/incubator/knox
>>>
>>> === Issue Tracking ===
>>>
>>> JIRA Knox (KNOX)
>>>
>>> == Initial Committers ==
>>>
>>> Kevin Minder (kevin DOT minder AT hortonworks DOT com)
>>> Larry McCay (lmccay AT hortonworks DOT com)
>>> John Speidel (jspeidel AT hortonworks DOT com)
>>> Tom Beerbower (tbeerbower AT hortonworks DOT com)
>>> Sumit Mohanty (smohanty AT hortonworks DOT com)
>>>
>>> == Affiliations ==
>>>
>>> Kevin Minder (Hortonworks)
>>> Larry McCay (Hortonworks)
>>> John Speidel (Hortonworks)
>>> Tom Beerbower (Hortonworks)
>>> Sumit Mohanty (Hortonworks)
>>>
>>> == Sponsors ==
>>>
>>> === Champion ===
>>>
>>> Devaraj Das (ddas AT apache DOT org)
>>>
>>> === Nominated Mentors ===
>>>
>>> Owen O’Malley (omalley AT apache DOT org)
>>> Mahadev Konar (mahadev AT apache DOT org)
>>> Alan Gates (gates AT apache DOT org)
>>> Devaraj Das (ddas AT apache DOT org)
>>>
>>> === Sponsoring Entity ===
>>>
>>> Incubator PMC
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail:
>>> general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org

-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com