From: Andrew Purtell
Date: Fri, 2 May 2014 11:18:52 -0700
Subject: Re: [PROPOSAL] Optiq
To: "general@incubator.apache.org"

All that I suggest is that candidate Apache projects articulate how they differ from related projects, and that we consider
the strength of this argument when evaluating the long term viability of
the effort and community. It would be good if proposals had a "related
work" section done with the same diligence and detail as the typical
academic publication; I haven't seen that, at least recently.

Differences in project direction leading to new projects (effectively,
sanctioned forks) are fine, although regrettable, since that would
represent an acknowledged failure of the Apache community process.
"Creative competition" between differing abstractions is fine. Etc. But if
I come to Apache to set up Apache Foo, with presumably the focus and care
on community development a motivating factor for that (otherwise why
shouldn't I just go to GitHub?), then if later the Incubator admits Apache
FooBar (incubating) and Apache FooBaz (incubating) that significantly
overlap and duplicate my efforts - overriding my concerns or objections -
then I'd be inclined to not view Apache as a particularly good steward of
my community development. The devil is in the details, which takes me back
to the point made in the above paragraph.

On Fri, May 2, 2014 at 10:52 AM, Chris Douglas wrote:

> On Thu, May 1, 2014 at 2:46 PM, Andrew Purtell wrote:
> > If not part of the initial proposal, then at least making a good case
> > as a criterion for graduation, and writing up related work and how the
> > new project differentiates could be an initial task done on JIRA after
> > acceptance along the lines of the trademark search.
>
> I see this differently. Project overlap (particularly in the
> incubator) is neither surprising nor regrettable. Recently we've seen
> several SQL, streaming, and security projects. While these are all
> mature domains, the "best practices" are still being explored. Each
> branch in architecture may accommodate a new project, and each path
> through those tradeoffs will define those communities.
> They'll also define each other; by way of illustration, a project
> that's a subset of another becomes the "lightweight" implementation.
> If the enthusiasm for a project wanes, that's not a tragedy the
> incubator can prevent by forcing alignment based on the goal of the
> project. Rejecting a community will not cause them to join an existing
> one; they'll just leave Apache.
>
> More than losing an opportunity to foster a community, a policy
> favoring consolidation would actively harm innovation and
> experimentation. A requirement for uniqueness would reward first
> movers and leave no outlet for legitimate differences in project
> direction. Granting existing projects authority over prospective
> communities _because_ they compete is not an optimization. As we saw
> with HCatalog, sometimes revolutions don't become distinct communities
> and the effort is reabsorbed. The incubator should continue to support
> that natural process.
>
> Finally, it's not surprising that the incubator will see projects with
> similar goals in waves. The need for new abstractions is experienced
> jointly and solutions are explored concurrently. That's a feature of
> the incubator, not a bug.
>
> Articulating the project's "related work" is a useful exercise, which
> is why it's a section in the proposal. -C
>
> > On Thu, May 1, 2014 at 2:22 PM, Henry Saputra wrote:
> >
> >> Unfortunately, similar projects entering Apache incubator are common
> >> things =(
> >>
> >> Even though each project's original proposers can argue about
> >> differences in one way or another, it will eventually be decided by
> >> adoption and community growth, and at the end the quality of the
> >> project itself.
> >>
> >> Some other incoming projects have faced similar questions/concerns
> >> regarding "competing" with existing ASF projects, e.g.: Twill vs
> >> Slider, Samza vs Storm vs S4, and several others.
> >>
> >>
> >> - Henry
> >>
> >> On Thu, May 1, 2014 at 12:14 AM, Ted Dunning wrote:
> >> > I think that there is a huge difference between Metamodel and Optiq.
> >> >
> >> > In particular:
> >> >
> >> > - Optiq provides real SQL including nested queries, correlated
> >> >   sub-queries and so on
> >> >
> >> > - Metamodel uses a fluent Java API ... SQL parsing and transformation
> >> >   doesn't appear to be a goal
> >> >
> >> > - Optiq provides highly advanced query transformations including
> >> >   decorrelations based on estimated execution costs.
> >> >
> >> > - Metamodel appears to provide no significant query transformations
> >> >
> >> > - Optiq only provides query execution as a by-product for testing
> >> >
> >> > - Metamodel has query execution as a central goal
> >> >
> >> > - Optiq provides a form of type inferencing for SQL queries. This is
> >> >   unique to Optiq as far as I know.
> >> >
> >> >
> >> >
> >> > On Thu, May 1, 2014 at 8:57 AM, Kasper Sørensen
> >> > <kasper.sorensen@humaninference.com> wrote:
> >> >
> >> >> I see a lot of conceptual similarity between Optiq and the Apache
> >> >> MetaModel (incubator) project [1]. Maybe something can be done to
> >> >> align the two projects, so that we avoid having two incubating
> >> >> projects that do basically the same thing?
> >> >>
> >> >> Or maybe there's some glaring difference that I am missing? At least
> >> >> it seems to me both to be projects that try to provide uniform
> >> >> querying capabilities to a wide array of data backends. Both projects
> >> >> also favor a type-safe Java querying API instead of a String/SQL
> >> >> oriented query API.
> >> >>
> >> >> Regards,
> >> >> Kasper Sørensen
> >> >>
> >> >> [1] http://metamodel.incubator.apache.org/
> >> >>
> >> >> ________________________________________
> >> >> From: Ashutosh Chauhan [hashutosh@apache.org]
> >> >> Sent: 01 May 2014 00:21
> >> >> To: general@incubator.apache.org
> >> >> Subject: [PROPOSAL] Optiq
> >> >>
> >> >> I would like to propose Optiq as an Apache Incubator project. I have
> >> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal
> >> >> and posted the text of the proposal below.
> >> >>
> >> >> Ashutosh.
> >> >>
> >> >> = Optiq =
> >> >> == Abstract ==
> >> >>
> >> >> Optiq is a framework that allows efficient translation of queries
> >> >> involving heterogeneous and federated data.
> >> >>
> >> >> == Proposal ==
> >> >>
> >> >> Optiq is a highly customizable engine for parsing and planning
> >> >> queries on data in a wide variety of formats. It allows database-like
> >> >> access, and in particular a SQL interface and advanced query
> >> >> optimization, for data not residing in a traditional database.
> >> >>
> >> >> == Background ==
> >> >>
> >> >> Databases were traditionally engineered in a monolithic stack,
> >> >> providing a data storage format, data processing algorithms, query
> >> >> parser, query planner, built-in functions, metadata repository and
> >> >> connectivity layer. They innovate in some areas but rarely in all.
> >> >>
> >> >> Modern data management systems are decomposing that stack into
> >> >> separate components, separating data, processing engine, metadata,
> >> >> and query language support. They are highly heterogeneous, with data
> >> >> in multiple locations and formats, caching and redundant data,
> >> >> different workloads, and processing occurring in different engines.
> >> >>
> >> >> Query planning (sometimes called query optimization) has always been
> >> >> a key function of a DBMS, because it allows the implementors to
> >> >> introduce new query-processing algorithms, and allows data
> >> >> administrators to re-organize the data without affecting applications
> >> >> built on that data. In a componentized system, the query planner
> >> >> integrates the components (data formats, engines, algorithms) without
> >> >> introducing unnecessary coupling or performance tradeoffs.
> >> >>
> >> >> But building a query planner is hard; many systems muddle along
> >> >> without a planner, and indeed a SQL interface, until the demand from
> >> >> their customers is overwhelming.
> >> >>
> >> >> There is an opportunity to make this process more efficient by
> >> >> creating a re-usable framework.
> >> >>
> >> >> == Rationale ==
> >> >>
> >> >> Optiq allows database-like access, and in particular a SQL interface
> >> >> and advanced query optimization, for data not residing in a
> >> >> traditional database. It is complementary to many current Hadoop and
> >> >> NoSQL systems, which have innovative and performant storage and
> >> >> runtime systems but lack a SQL interface and intelligent query
> >> >> translation.
> >> >>
> >> >> Optiq is already in use by several projects, including Apache Drill,
> >> >> Apache Hive and Cascading Lingual, and commercial products.
> >> >>
> >> >> Optiq's architecture consists of:
> >> >>
> >> >> An extensible relational algebra.
> >> >> SPIs (service-provider interfaces) for metadata (schemas and tables),
> >> >> planner rules, statistics, cost-estimates, user-defined functions.
> >> >> Built-in sets of rules for logical transformations and common
> >> >> data-sources.
> >> >> Two query planning engines driven by rules, statistics, etc. One
> >> >> engine is cost-based, the other rule-based.
> >> >> Optional SQL parser, validator and translator to relational algebra.
> >> >> Optional JDBC driver.
> >> >> == Initial Goals ==
> >> >>
> >> >> The initial goals are to move the existing codebase to Apache and
> >> >> integrate with the Apache development process. Once this is
> >> >> accomplished, we plan for incremental development and releases that
> >> >> follow the Apache guidelines.
> >> >>
> >> >> As we move the code into the org.apache namespace, we will
> >> >> restructure components as necessary to allow clients to use just the
> >> >> components of Optiq that they need.
> >> >>
> >> >> A version 1.0 release, including pre-built binaries, will foster
> >> >> wider adoption.
> >> >>
> >> >> == Current Status ==
> >> >>
> >> >> Optiq has had over a dozen minor releases over the last 18 months.
> >> >> Its core SQL parser and validator, and its planning engine and core
> >> >> rules, are mature and robust and are the basis for several production
> >> >> systems; but other components and SPIs are still undergoing rapid
> >> >> evolution.
> >> >>
> >> >> === Meritocracy ===
> >> >>
> >> >> We plan to invest in supporting a meritocracy. We will discuss the
> >> >> requirements in an open forum. We encourage the companies and
> >> >> projects using Optiq to discuss their requirements in an open forum
> >> >> and to participate in development. We will encourage and monitor
> >> >> community participation so that privileges can be extended to those
> >> >> that contribute.
> >> >>
> >> >> Optiq's pluggable architecture encourages developers to contribute
> >> >> extensions such as adapters for data sources, new planning rules, and
> >> >> better statistics and cost-estimation functions. We look forward to
> >> >> fostering a rich ecosystem of extensions.
> >> >>
> >> >> === Community ===
> >> >>
> >> >> Building a data management system requires a high degree of technical
> >> >> skill, and correspondingly, the community of developers directly
> >> >> using Optiq is potentially fairly small, albeit highly technical and
> >> >> engaged. But we also expect engagement from members of the
> >> >> communities of projects that use Optiq, such as Drill and Hive. And
> >> >> we intend to structure Optiq so that it can be used for lighter
> >> >> weight applications, such as providing a SQL and JDBC interface to a
> >> >> NoSQL system.
> >> >>
> >> >> === Core Developers ===
> >> >>
> >> >> The developers on the initial committers list are all experienced
> >> >> open source developers, and are actively using Optiq in their
> >> >> projects.
> >> >>
> >> >> * Julian Hyde is lead developer of Mondrian, an open source OLAP
> >> >>   engine, and an Apache Drill committer.
> >> >> * Chris Wensel is lead developer of Cascading, and of Lingual, the
> >> >>   SQL interface to Cascading built using Optiq.
> >> >> * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
> >> >>
> >> >> In addition, there are several regular contributors whom we hope will
> >> >> graduate to committers during the incubation process.
> >> >>
> >> >> We realize that additional employer diversity is needed, and we will
> >> >> work aggressively to recruit developers from additional companies.
> >> >>
> >> >> === Alignment ===
> >> >>
> >> >> Apache, and in particular the ecosystem surrounding Hadoop, contains
> >> >> several projects for building data management systems that leverage
> >> >> each other's capabilities. Optiq is a natural fit for that ecosystem,
> >> >> and will help foster projects meeting new challenges.
> >> >>
> >> >> Optiq is already used by Apache Hive and Apache Drill; Optiq embeds
> >> >> Apache Spark as an optional engine; we are in discussion with Apache
> >> >> Phoenix about integrating JDBC and query planning.
> >> >>
> >> >> == Known Risks ==
> >> >>
> >> >> === Orphaned Products ===
> >> >>
> >> >> Optiq is already a key component in three independent projects, each
> >> >> backed by a different company, so the risk of being orphaned is
> >> >> relatively low. We plan to mitigate this risk by recruiting
> >> >> additional committers, and promoting Optiq's adoption as a framework
> >> >> by other projects.
> >> >>
> >> >> === Inexperience with Open Source ===
> >> >>
> >> >> The initial committers are all Apache members, some of whom have
> >> >> several years in the Apache Hadoop community. The founder of the
> >> >> project, Julian Hyde, has been a founder and key developer in open
> >> >> source projects for over ten years.
> >> >>
> >> >> === Homogenous Developers ===
> >> >>
> >> >> The initial committers are employed by a number of companies,
> >> >> including Concurrent, Hortonworks, MapR Technologies and
> >> >> Salesforce.com. We are committed to recruiting additional committers
> >> >> from outside these companies.
> >> >>
> >> >> === Reliance on Salaried Developers ===
> >> >>
> >> >> Like most open source projects, Optiq receives substantial support
> >> >> from salaried developers. This is to be expected given that it is a
> >> >> highly technical framework. However, they are all passionate about
> >> >> the project, and we are confident that the project will continue even
> >> >> if no salaried developers contribute to the project. As a framework,
> >> >> the project encourages the involvement of members of other projects,
> >> >> and of academic researchers.
> >> >> We are committed to recruiting additional committers including
> >> >> non-salaried developers.
> >> >>
> >> >> === Relationships with Other Apache Products ===
> >> >>
> >> >> As mentioned in the Alignment section, Optiq is being used by Apache
> >> >> Hive and Apache Drill, and has adapters for Apache Phoenix and Apache
> >> >> Spark. Optiq often operates on data in a Hadoop environment, so
> >> >> collaboration with other Hadoop projects is desirable and highly
> >> >> likely.
> >> >>
> >> >> === An Excessive Fascination with the Apache Brand ===
> >> >>
> >> >> Optiq solves a real problem, as evidenced by its take-up by other
> >> >> projects. This proposal is not for the purpose of generating
> >> >> publicity. Rather, the primary benefits to joining Apache are those
> >> >> outlined in the Rationale section.
> >> >>
> >> >> == Documentation ==
> >> >>
> >> >> Additional documentation for Optiq may be found on its github site:
> >> >>
> >> >> * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
> >> >> * [[https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
> >> >> * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
> >> >> * [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Reference guide]]
> >> >>
> >> >> === Presentation: ===
> >> >>
> >> >> * [[https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|SQL on Big Data using Optiq]]
> >> >> == Initial Source ==
> >> >>
> >> >> The initial codebase resides in three projects, all hosted on github:
> >> >>
> >> >> * https://github.com/julianhyde/optiq
> >> >> * https://github.com/julianhyde/optiq-csv
> >> >> * https://github.com/julianhyde/linq4j
> >> >>
> >> >> === Source and Intellectual Property Submission Plan ===
> >> >>
> >> >> The initial codebase is already distributed under the Apache 2.0
> >> >> License. The owners of the IP have indicated willingness to sign the
> >> >> SGA.
> >> >>
> >> >> === External Dependencies ===
> >> >>
> >> >> Optiq and Linq4j have the following external dependencies.
> >> >>
> >> >> * Java 1.6, 1.7 or 1.8
> >> >> * Apache Maven, Commons
> >> >> * JavaCC (BSD license)
> >> >> * Sqlline 1.1.6 (BSD license)
> >> >> * Junit 4.11 (EPL)
> >> >> * Janino (BSD license)
> >> >> * Guava (Apache 2.0 license)
> >> >> * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
> >> >>   license)
> >> >>
> >> >> Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
> >> >> optiq-splunk) are currently developed alongside core Optiq, and have
> >> >> the following additional dependencies:
> >> >>
> >> >> * Open CSV 2.3 (Apache 2.0 license)
> >> >> * Apache Incubator Spark
> >> >> * Mongo Java driver (Apache 2.0 license)
> >> >>
> >> >> Upon acceptance to the incubator, we would begin a thorough analysis
> >> >> of all transitive dependencies to verify this information and
> >> >> introduce license checking into the build and release process by
> >> >> integrating with Apache Rat.
> >> >>
> >> >> === Cryptography ===
> >> >>
> >> >> Optiq will eventually support encryption on the wire. This is not one
> >> >> of the initial goals, and we do not expect Optiq to be a controlled
> >> >> export item due to the use of encryption.
> >> >>
> >> >> == Required Resources ==
> >> >>
> >> >> === Mailing Lists ===
> >> >>
> >> >> * private@optiq.incubator.apache.org
> >> >> * dev@optiq.incubator.apache.org (will be migrated from
> >> >>   optiq-dev@googlegroups.com)
> >> >> * commits@optiq.incubator.apache.org
> >> >>
> >> >> === Source control ===
> >> >>
> >> >> The Optiq team would like to use git for source control, due to our
> >> >> current use of git/github.
> >> >> We request a writeable git repo
> >> >> git://git.apache.org/incubator-optiq, and mirroring to be set up to
> >> >> github through INFRA.
> >> >>
> >> >> === Issue Tracking ===
> >> >>
> >> >> Optiq currently uses the github issue tracking system associated with
> >> >> its github repo: https://github.com/julianhyde/optiq/issues. We will
> >> >> migrate to the Apache JIRA:
> >> >> http://issues.apache.org/jira/browse/OPTIQ.
> >> >>
> >> >> == Initial Committers ==
> >> >>
> >> >> * Julian Hyde (jhyde at apache dot org)
> >> >> * Jacques Nadeau (jacques at apache dot org)
> >> >> * James R. Taylor (jamestaylor at apache dot org)
> >> >> * Chris Wensel (cwensel at apache dot org)
> >> >>
> >> >> === Affiliations ===
> >> >>
> >> >> The initial committers are employees of Concurrent, Hortonworks, MapR
> >> >> and Salesforce.com.
> >> >>
> >> >> * Julian Hyde (Hortonworks)
> >> >> * Jacques Nadeau (MapR Technologies)
> >> >> * James R. Taylor (Salesforce.com)
> >> >> * Chris Wensel (Concurrent)
> >> >>
> >> >> == Sponsors ==
> >> >>
> >> >> === Champion ===
> >> >>
> >> >> * Ashutosh Chauhan (hashutosh at apache dot org)
> >> >>
> >> >> === Nominated Mentors ===
> >> >>
> >> >> * Ted Dunning (tdunning at apache dot org) - Chief Application
> >> >>   Architect at MapR Technologies; committer for Lucene, Mahout and
> >> >>   ZooKeeper.
> >> >> * Alan Gates (gates at apache dot org) - Architect at Hortonworks;
> >> >>   committer for Pig, Hive and others.
> >> >> * Steven Noels (stevenn at apache dot org) - Chief Technical Officer
> >> >>   at NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
> >> >>
> >> >> === Sponsoring Entity ===
> >> >>
> >> >> The Apache Incubator.
> >> >>
> >> >> ---------------------------------------------------------------------
> >> >> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> >> For additional commands, e-mail: general-help@incubator.apache.org
> >> >>
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)