References: <4E6A3D5B.10508@apache.org>
Date: Fri, 9 Sep 2011 17:41:25 +0100
Subject: Re: [VOTE] Accumulo to join the Incubator
From: Mohammad Nour El-Din
To: general@incubator.apache.org

+1 (binding)

On Fri, Sep 9, 2011 at 5:33 PM, wrote:
> +1 !
>
> - milind
>
> On 9/9/11 9:22 AM, "Doug Cutting" wrote:
>
>>It's been a week since the Accumulo proposal was submitted for
>>discussion. A few questions were asked, and the proposal was clarified
>>in response. Sufficient mentors have volunteered. I thus feel we are
>>now ready for a vote.
>>
>>The latest proposal can be found at the end of this email and at:
>>
>>  http://wiki.apache.org/incubator/AccumuloProposal
>>
>>The discussion regarding the proposal can be found at:
>>
>>  http://s.apache.org/oi
>>
>>Please cast your votes:
>>
>>[  ] +1 Accept Accumulo for incubation
>>[  ] +0 Indifferent to Accumulo incubation
>>[  ] -1 Reject Accumulo for incubation
>>
>>This vote will close 72 hours from now.
>>
>>Thanks,
>>
>>Doug
>>
>>-----------------------
>>
>>= Accumulo Proposal =
>>
>>== Abstract ==
>>Accumulo is a distributed key/value store that provides expressive,
>>cell-level access labels.
>>
>>== Proposal ==
>>Accumulo is a sorted, distributed key/value store based on Google's
>>BigTable design. It is built on top of Apache Hadoop, Zookeeper, and
>>Thrift. It features a few novel improvements on the BigTable design in
>>the form of cell-level access labels and a server-side programming
>>mechanism that can modify key/value pairs at various points in the data
>>management process.
>>
>>== Background ==
>>Google published the design of BigTable in 2006. Several other open
>>source projects have implemented aspects of this design including HBase,
>>CloudStore, and Cassandra. Accumulo began its development in 2008.
>>
>>== Rationale ==
>>There is a need for a flexible, high performance distributed key/value
>>store that provides expressive, fine-grained access labels. The
>>communities we expect to be most interested in such a project are
>>government, health care, and other industries where privacy is a
>>concern. We have made much progress in developing this project over the
>>past 3 years and believe both the project and the interested communities
>>would benefit from this work being openly available and having open
>>development.
>>
>>== Current Status ==
>>
>>=== Meritocracy ===
>>We intend to strongly encourage the community to help with and
>>contribute to the code. We will actively seek potential committers and
>>help them become familiar with the codebase.
>>
>>=== Community ===
>>A strong government community has developed around Accumulo and training
>>classes have been ongoing for about a year. Hundreds of developers use
>>Accumulo.
>>
>>=== Core Developers ===
>>The developers are mainly employed by the National Security Agency, but
>>we anticipate interest developing among other companies.
>>
>>=== Alignment ===
>>Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds
>>with Maven. Due to the strong relationship with these Apache projects,
>>the incubator is a good match for Accumulo.
>>
>>== Known Risks ==
>>=== Orphaned Products ===
>>There is only a small risk of being orphaned. The community is
>>committed to improving the codebase of the project due to its fulfilling
>>needs not addressed by any other software.
>>
>>=== Inexperience with Open Source ===
>>The codebase has been treated internally as an open source project since
>>its beginning, and the initial Apache committers have been involved with
>>the code for multiple years. While our experience with public open
>>source is limited, we do not anticipate difficulty in operating under
>>Apache's development process.
>>
>>=== Homogeneous Developers ===
>>The committers have multiple employers and it is expected that
>>committers from different companies will be recruited.
>>
>>=== Reliance on Salaried Developers ===
>>The initial committers are all paid by their employers to work on
>>Accumulo and we expect such employment to continue. Some of the initial
>>committers would continue as volunteers even if no longer employed to do
>>so.
>>
>>=== Relationships with Other Apache Products ===
>>Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang,
>>-net, -io, -jci, -collections, -configuration, -logging, and -codec.
>>
>>=== Relationship to HBase ===
>>Accumulo and HBase are both based on the design of Google's BigTable, so
>>there is a danger that potential users will have difficulty
>>distinguishing the two. Some of the key areas in which Accumulo differs
>>from HBase are discussed below. It may be possible to incorporate the
>>desired features of Accumulo into HBase. However, the amount of work
>>required would slow development of HBase and Accumulo considerably. We
>>believe this warrants a podling for Accumulo at the current time. We
>>expect active cross-pollination will occur between HBase and podling
>>Accumulo and it is possible that the codebases and projects will
>>ultimately converge.
>>
>>==== Access Labels ====
>>Accumulo has an additional portion of its key that sorts after the
>>column qualifier and before the timestamp. It is called column
>>visibility and enables expressive cell-level access control.
>>Authorizations are passed with each query to control what data is
>>returned to the user. The column visibilities are boolean AND and OR
>>combinations of arbitrary strings (such as "(A&B)|C") and authorizations
>>are sets of strings (such as {C,D}).
>>
>>==== Iterators ====
>>Accumulo has a novel server-side programming mechanism that can modify
>>the data written to disk or returned to the user. This mechanism can be
>>configured for any of the scopes where data is read from or written to
>>disk. It can be used to perform joins on data within a single tablet.
>>
>>==== Flexibility ====
>>HBase requires the user to specify the set of column families to be used
>>up front. Accumulo places no restrictions on the column families.
>>Also, each column family in HBase is stored separately on disk.
>>Accumulo allows column families to be grouped together on disk, as does
>>BigTable. This enables users to configure how their data is stored,
>>potentially providing improvements in compression and lookup speeds. It
>>gives Accumulo a row/column hybrid nature, while HBase is currently
>>column-oriented.
>>
>>==== Testing ====
>>Accumulo has testing frameworks that have resulted in its achieving a
>>high level of correctness and performance. We have observed that under
>>some configurations and conditions Accumulo will outperform HBase and
>>provide greater data integrity.
>>
>>==== Logging ====
>>HBase uses a write-ahead log on the Hadoop Distributed File System.
>>Accumulo has its own logging service that does not depend on
>>communication with the HDFS NameNode.
>>
>>==== Storage ====
>>Accumulo has a relative key file format that improves compression.
>>
>>==== Areas in which HBase features improvements over Accumulo ====
>>in memory tables, upserts, coprocessors, connections to other projects
>>such as Cascading and Pig
>>
>>=== Expectations ===
>>There is a risk that Accumulo will be criticized for not providing
>>adequate security. The access labels in Accumulo do not in themselves
>>provide a complete security solution, but are a mechanism for labeling
>>each piece of data with the authorizations that are necessary to see it.
>>
>>=== Apache Brand ===
>>Our interest in releasing this code as an Apache incubator project is
>>due to its strong relationship with other Apache projects, i.e. Accumulo
>>has dependencies on Hadoop, Zookeeper, and Thrift and has complementary
>>goals to HBase.
>>
>>== Documentation ==
>>There is not currently documentation about Accumulo on the web, but a
>>fair amount of documentation and training materials exists and will be
>>provided on the Accumulo wiki at apache.org. Also, a paper discussing
>>YCSB results for Accumulo will be presented at the 2011 Symposium on
>>Cloud Computing.
>>
>>== Initial Source ==
>>Accumulo has been in development since spring 2008. There are hundreds
>>of developers using it and tens of developers have contributed to it.
>>The core codebase consists of 200,000 lines of code (mainly Java) and
>>100s of pages of documentation. There are also a few projects built on
>>top of Accumulo that may be added to its contrib in the future. These
>>include support for Hive, Matlab, YCSB, and graph processing.
>>
>>== Source and Intellectual Property Submission Plan ==
>>Accumulo core code, examples, documentation, and training materials will
>>be submitted by the National Security Agency.
>>
>>We will also be soliciting contributions of further plugins from MIT
>>Lincoln Labs, Carnegie Mellon University, and others.
>>
>>Accumulo has been developed by a mix of government employees and private
>>companies under government contract. Material developed by government
>>employees is in the public domain and no U.S. copyright exists in works
>>of the federal government. For the contractor developed material in the
>>initial submission, the U.S. Government has sufficient authority per the
>>ICLA from the copyright owner to contribute the Accumulo code to the
>>incubator.
>>
>>There has been some discussion regarding accepting contributions from US
>>Government sources on https://issues.apache.org/jira/browse/LEGAL-93. We
>>propose that the NSA will sign an ICLA/CCLA if that document could be
>>slightly modified to explicitly address copyright in works of government
>>employees. Specifically, we propose that the definition of "You" be
>>modified to include "the copyright owner, the owner of a Contribution
>>not subject to copyright, or legal entity authorized by the copyright
>>owner that is making this Agreement." In addition, section 2, the
>>copyright license grant be modified after "You hereby grant" that either
>>states "to the extent authorized by law" or "to the extent copyright
>>exists in the Contribution." These changes will permit US Government
>>employee developed work to be included.
>>
>>One proposed solution is to form a Collaborative Research and
>>Development Agreement (CRADA) between the Apache Software Foundation and
>>the US Government, but this will not solve the underlying problem that
>>U.S. law does not grant copyright to works of government employees. At
>>this time a CRADA is not necessary but should it be determined that a
>>CRADA is necessary, we would like to work through that process during
>>the incubation phase of Accumulo rather than before acceptance as this
>>may take time to enter into an agreement.
>>
>>== External Dependencies ==
>>jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon (LGPL),
>>slf4j (MIT), junit (CPL)
>>
>>== Cryptography ==
>>none
>>
>>== Required Resources ==
>> * Mailing Lists
>>   * accumulo-private
>>   * accumulo-dev
>>   * accumulo-commits
>>   * accumulo-user
>>
>> * Subversion Directory
>>   * https://svn.apache.org/repos/asf/incubator/accumulo
>>
>> * Issue Tracking
>>   * JIRA Accumulo (ACCUMULO)
>>
>> * Continuous Integration
>>   * Jenkins builds on https://builds.apache.org/
>>
>> * Web
>>   * http://incubator.apache.org/accumulo/
>>   * wiki at http://wiki.apache.org or http://cwiki.apache.org
>>
>>== Initial Committers ==
>> * Aaron Cordova (aaron at cordovas dot org)
>> * Adam Fuchs (adam.p.fuchs at ugov dot gov)
>> * Eric Newton (ecn at swcomplete dot com)
>> * Billie Rinaldi (billie.j.rinaldi at ugov dot gov)
>> * Keith Turner (keith.turner at ptech-llc dot com)
>> * John Vines (john.w.vines at ugov dot gov)
>> * Chris Waring (christopher.a.waring at ugov dot gov)
>>
>>== Affiliations ==
>> * Aaron Cordova, The Interllective
>> * Adam Fuchs, National Security Agency
>> * Eric Newton, SW Complete Incorporated
>> * Billie Rinaldi, National Security Agency
>> * Keith Turner, Peterson Technology LLC
>> * John Vines, National Security Agency
>> * Chris Waring, National Security Agency
>>
>>== Sponsors ==
>> * Champion: Doug Cutting
>>
>>== Nominated Mentors ==
>> * Benson Margulies
>> * Alan Cabrera
>> * Bernd Fondermann
>> * Owen O'Malley
>>
>>== Sponsoring Entity ==
>> * Apache Incubator
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>
>

--
Thanks
- Mohammad Nour
----
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein
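
For readers of the archive: the "Access Labels" section of the quoted proposal says that column visibilities are boolean AND/OR combinations of strings such as "(A&B)|C", checked against a set of authorizations such as {C,D}. A minimal Python sketch of that check follows. It is not Accumulo's implementation or API. The names tokenize and is_visible, the token grammar, the choice to let & bind tighter than |, and the treatment of an empty label as visible to everyone are all assumptions made for illustration.

```python
import re

# Hypothetical grammar (an assumption, not Accumulo's): a label is a run of
# letters, digits, or underscores; operators are &, |, and parentheses.
_TOKEN = re.compile(r"\s*([A-Za-z0-9_]+|[&|()])")


def tokenize(expression):
    """Split a visibility expression such as "(A&B)|C" into tokens."""
    expression = expression.strip()
    tokens, pos = [], 0
    while pos < len(expression):
        match = _TOKEN.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """Return True if the set of authorizations satisfies the expression."""
    tokens = tokenize(expression)
    if not tokens:
        return True  # assumption: an unlabeled cell is visible to everyone
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse the right side to consume it
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        if token in ("&", "|", ")"):
            raise ValueError(f"unexpected token {token!r}")
        # A bare label is satisfied when the reader holds that authorization.
        return token in authorizations

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in expression")
    return result


# The proposal's own example: label "(A&B)|C" read with authorizations {C,D}.
print(is_visible("(A&B)|C", {"C", "D"}))  # True: C alone satisfies the OR
print(is_visible("(A&B)|C", {"A", "D"}))  # False: neither A&B nor C holds
```

With the proposal's example, a reader holding {C,D} sees the cell because C satisfies the right-hand side of the OR, while a reader holding only {A,D} does not.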