Date: Fri, 2 Sep 2011 17:39:37 -0400
Subject: Re: [PROPOSAL] Accumulo for the Apache Incubator
From: Benson Margulies
To: general@incubator.apache.org

No votes yet, please, except as an informal expression of (un)enthusiasm.

Owen, you raise two questions. On the subject of grants, please read the IP description in the proposal again. You can't 'grant' rights to something that neither you nor anyone else owns. The proposal offers both a preferred alternative and a backstop.

On the subject of LGPL, I'll leave it to the authors to answer.

On Fri, Sep 2, 2011 at 5:17 PM, Todd Lipcon wrote:
> Non-binding +1. Regarding Owen's concern over licenses, if I recall
> correctly, those concerns would block graduation from the incubator,
> but not acceptance to it.
>
> I am also interested in being added as a committer to this proposal.
> As an HBase committer (but not speaking for the project as a whole) I
> think having cross-pollination between the codebases will be
> beneficial to everyone, so I'd like to be involved.
>
> Thanks
> -Todd
>
> On Fri, Sep 2, 2011 at 8:45 AM, Billie J Rinaldi
> wrote:
>> Greetings,
>>
>> I would like to propose Accumulo to be an Apache Incubator project. Accumulo is a distributed key/value store that provides expressive cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. It is based on Google's BigTable design and runs over Apache Hadoop and Zookeeper.
>>
>> Here is a link to the proposal in the Incubator wiki:
>> http://wiki.apache.org/incubator/AccumuloProposal
>>
>> I've also pasted the initial contents below.
>>
>> Thanks,
>> Billie Rinaldi
>>
>>
>> = Accumulo Proposal =
>>
>> == Abstract ==
>> Accumulo is a distributed key/value store that provides expressive, cell-level access labels.
>>
>> == Proposal ==
>> Accumulo is a sorted, distributed key/value store based on Google's BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.
>>
>> == Background ==
>> Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008.
>>
>> == Rationale ==
>> There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>> We intend to strongly encourage the community to help with and contribute to the code. We will actively seek potential committers and help them become familiar with the codebase.
>>
>> === Community ===
>> A strong government community has developed around Accumulo and training classes have been ongoing for about a year. Hundreds of developers use Accumulo.
>>
>> === Core Developers ===
>> The developers are mainly employed by the National Security Agency, but we anticipate interest developing among other companies.
>>
>> === Alignment ===
>> Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds with Maven. Due to the strong relationship with these Apache projects, the incubator is a good match for Accumulo.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> There is only a small risk of being orphaned. The community is committed to improving the codebase of the project due to its fulfilling needs not addressed by any other software.
>>
>> === Inexperience with Open Source ===
>> The codebase has been treated internally as an open source project since its beginning, and the initial Apache committers have been involved with the code for multiple years. While our experience with public open source is limited, we do not anticipate difficulty in operating under Apache's development process.
>>
>> === Homogeneous Developers ===
>> The committers have multiple employers and it is expected that committers from different companies will be recruited.
>>
>> === Reliance on Salaried Developers ===
>> The initial committers are all paid by their employers to work on Accumulo and we expect such employment to continue. Some of the initial committers would continue as volunteers even if no longer employed to do so.
>>
>> === Relationships with Other Apache Products ===
>> Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang, -net, -io, -jci, -collections, -configuration, -logging, and -codec.
>>
>> === Relationship to HBase ===
>> Accumulo and HBase are both based on the design of Google's BigTable, so there is a danger that potential users will have difficulty distinguishing the two or that they will not see an incentive in adopting Accumulo. There are a few key areas in which Accumulo differs from HBase. Some of the desired features of Accumulo could be incorporated into HBase, however the most important of these may be unlikely to be adopted (see cell-level access labels and iterators below). It is a possibility that the codebases will ultimately converge, but the number of differences at the current time warrants a separate project for Accumulo.
>>
>> ==== Access Labels ====
>> Accumulo has an additional portion of its key that sorts after the column qualifier and before the timestamp. It is called column visibility and enables expressive cell-level access control. Authorizations are passed with each query to control what data is returned to the user. The column visibilities are boolean AND and OR combinations of arbitrary strings (such as "(A&B)|C") and authorizations are sets of strings (such as {C,D}).
>>
>> ==== Iterators ====
>> Accumulo has a novel server-side programming mechanism that can modify the data written to disk or returned to the user. This mechanism can be configured for any of the scopes where data is read from or written to disk. It can be used to perform joins on data within a single tablet.
>>
>> ==== Flexibility ====
>> HBase requires the user to specify the set of column families to be used up front. Accumulo places no restrictions on the column families. Also, each column family in HBase is stored separately on disk. Accumulo allows column families to be grouped together on disk, as does BigTable. This enables users to configure how their data is stored, potentially providing improvements in compression and lookup speeds. It gives Accumulo a row/column hybrid nature, while HBase is currently column-oriented.
>>
>> ==== Testing ====
>> Accumulo has testing frameworks that have resulted in its achieving a high level of correctness and performance. We have observed that under some configurations and conditions Accumulo will outperform HBase and provide greater data integrity.
>>
>> ==== Logging ====
>> HBase uses a write-ahead log on the Hadoop Distributed File System. Accumulo has its own logging service that does not depend on communication with the HDFS NameNode.
>>
>> ==== Storage ====
>> Accumulo has a relative key file format that improves compression.
>>
>> ==== Areas in which HBase features improvements over Accumulo ====
>> in memory tables, upserts, coprocessors, connections to other projects such as Cascading and Pig
>>
>> === Expectations ===
>> There is a risk that Accumulo will be criticized for not providing adequate security. The access labels in Accumulo do not in themselves provide a complete security solution, but are a mechanism for labeling each piece of data with the authorizations that are necessary to see it.
>>
>> === Apache Brand ===
>> Our interest in releasing this code as an Apache incubator project is due to its strong relationship with other Apache projects, i.e. Hadoop, Zookeeper, and HBase.
>>
>> == Documentation ==
>> There is not currently documentation about Accumulo on the web, but a fair amount of documentation and training materials exists and will be provided on the Accumulo wiki at apache.org. Also, a paper discussing YCSB results for Accumulo will be presented at the 2011 Symposium on Cloud Computing.
>>
>> == Initial Source ==
>> Accumulo has been in development since spring 2008. There are hundreds of developers using it and tens of developers have contributed to it. The core codebase consists of 200,000 lines of code (mainly Java) and 100s of pages of documentation. There are also a few projects built on top of Accumulo that may be added to its contrib in the future. These include support for Hive, Matlab, YCSB, and graph processing.
>>
>> == Source and Intellectual Property Submission Plan ==
>> Accumulo core code, examples, documentation, and training materials will be submitted by the National Security Agency.
>>
>> We will also be soliciting contributions of further plugins from MIT Lincoln Labs, Carnegie Mellon University, and others.
>>
>> Accumulo has been developed by a mix of government employees and private companies under government contract. Material developed by government employees is in the public domain and no U.S. copyright exists in works of the federal government. For the contractor developed material in the initial submission, the U.S. Government has sufficient authority per the ICLA from the copyright owner to contribute the Accumulo code to the incubator.
>>
>> There has been some discussion regarding accepting contributions from US Government sources on [https://issues.apache.org/jira/browse/LEGAL-93 LEGAL-93]. We propose that the NSA will sign an ICLA/CCLA if that document could be slightly modified to explicitly address copyright in works of government employees. Specifically, we propose that the definition of “You” be modified to include “the copyright owner, the owner of a Contribution not subject to copyright, or legal entity authorized by the copyright owner that is making this Agreement.” In addition, section 2, the copyright license grant be modified after “You hereby grant” that either states “to the extent authorized by law” or “to the extent copyright exists in the Contribution.” These changes will permit US Government employee developed work to be included.
>>
>> One proposed solution is to form a Collaborative Research and Development Agreement (CRADA) between the Apache Software Foundation and the US Government, but this will not solve the underlying problem that U.S. law does not grant copyright to works of government employees. At this time a CRADA is not necessary but should it be determined that a CRADA is necessary, we would like to work through that process during the incubation phase of Accumulo rather than before acceptance as this may take time to enter into an agreement.
>>
>> == External Dependencies ==
>> jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon (LGPL), slf4j (MIT), junit (CPL)
>>
>> == Cryptography ==
>> none
>>
>> == Required Resources ==
>>  * Mailing Lists
>>   * accumulo-private
>>   * accumulo-dev
>>   * accumulo-commits
>>   * accumulo-user
>>
>>  * Subversion Directory
>>   * https://svn.apache.org/repos/asf/incubator/accumulo
>>
>>  * Issue Tracking
>>   * JIRA Accumulo (ACCUMULO)
>>
>>  * Continuous Integration
>>   * Jenkins builds on https://builds.apache.org/
>>
>>  * Web
>>   * http://incubator.apache.org/accumulo/
>>   * wiki at http://wiki.apache.org or http://cwiki.apache.org
>>
>> == Initial Committers ==
>>  * Aaron Cordova (aaron at cordovas dot org)
>>  * Adam Fuchs (adam.p.fuchs at ugov dot gov)
>>  * Eric Newton (ecn at swcomplete dot com)
>>  * Billie Rinaldi (billie.j.rinaldi at ugov dot gov)
>>  * Keith Turner (keith.turner at ptech-llc dot com)
>>  * John Vines (john.w.vines at ugov dot gov)
>>  * Chris Waring (christopher.a.waring at ugov dot gov)
>>
>> == Affiliations ==
>>  * Aaron Cordova, The Interllective
>>  * Adam Fuchs, National Security Agency
>>  * Eric Newton, SW Complete Incorporated
>>  * Billie Rinaldi, National Security Agency
>>  * Keith Turner, Peterson Technology LLC
>>  * John Vines, National Security Agency
>>  * Chris Waring, National Security Agency
>>
>> == Sponsors ==
>>  * Champion: Doug Cutting
>>  * Nominated Mentors: Benson Margulies, ?, ?
>>  * Sponsoring Entity: Apache Incubator
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera