Return-Path: X-Original-To: apmail-incubator-general-archive@www.apache.org Delivered-To: apmail-incubator-general-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 387671109F for ; Mon, 12 May 2014 19:25:37 +0000 (UTC) Received: (qmail 22245 invoked by uid 500); 12 May 2014 18:58:56 -0000 Delivered-To: apmail-incubator-general-archive@incubator.apache.org Received: (qmail 22057 invoked by uid 500); 12 May 2014 18:58:56 -0000 Mailing-List: contact general-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: general@incubator.apache.org Delivered-To: mailing list general@incubator.apache.org Received: (qmail 22031 invoked by uid 99); 12 May 2014 18:58:55 -0000 Received: from Unknown (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 12 May 2014 18:58:55 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of fhueske@gmail.com designates 209.85.213.48 as permitted sender) Received: from [209.85.213.48] (HELO mail-yh0-f48.google.com) (209.85.213.48) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 12 May 2014 18:58:52 +0000 Received: by mail-yh0-f48.google.com with SMTP id a41so4421995yho.21 for ; Mon, 12 May 2014 11:58:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=3UoFDP13w4TF7FrfJj/yVsFm2zoUyeSVlBioBByw+QQ=; b=0NUxF2yRFLkhJ2agU/BNHZ7AGYLL0gUOrvgNxx+0efx+4sQ7LRs2ZTO5xXxZSvuxbr F5rxUJhS33NgkyiyPq1pjHXCEMYYEkSw+H/hUsLsqbn28LaTXIGm1191nfZdBLcNs3AN e7V3ttKfwbtK9WWHAShoWUW7ZHHhIVFA6vgZtoLodOUFOVGDX8SPKCc77neN3EObRniu 2vb5BkN+Mvd9L21FBQvJEO0w7jeux08w00hNbpPhMVAcyzMdNxMFv5V+4/OB9ar2rSNO l4yE8s7VOwZHcnGEw5CC4OuhzlMXeXEuByvxWdTDdN9GDin8mDZaCeD5VBfwazYFQdTL g8mA== MIME-Version: 1.0 X-Received: by 10.236.44.132 with SMTP id n4mr15520421yhb.161.1399921108520; Mon, 12 May 2014 11:58:28 -0700 (PDT) Received: by 10.170.52.195 with HTTP; Mon, 12 May 2014 11:58:28 -0700 (PDT) In-Reply-To: <3C9C2669-A155-4296-A13B-32188E69A97E@hortonworks.com> References: <3C9C2669-A155-4296-A13B-32188E69A97E@hortonworks.com> Date: Mon, 12 May 2014 20:58:28 +0200 Message-ID: Subject: Re: [VOTE] Accept Optiq into the incubator From: Fabian Hueske To: general@incubator.apache.org Content-Type: multipart/alternative; boundary=bcaec554004063147104f9388852 X-Virus-Checked: Checked by ClamAV on apache.org --bcaec554004063147104f9388852 Content-Type: text/plain; charset=UTF-8 +1 (non-binding) 2014-05-12 17:11 GMT+02:00 Alan Gates : > +1. > > Alan. > > On May 9, 2014, at 11:03 AM, Ashutosh Chauhan > wrote: > > > Based on the results of the discussion thread ( > > > http://mail-archives.apache.org/mod_mbox/incubator-general/201404.mbox/%3CCA%2BFBdFQA4TghLRdh9GgDKaMtKLQHxE_QZV%3DoZ7HfiDSA_jyqwg%40mail.gmail.com%3E > > ), I would like to call a vote on accepting Optiq into the incubator. > > > > [ ] +1 Accept Optiq into the Incubator > > [ ] +0 Indifferent to the acceptance of Stratosphere > > [ ] -1 Do not accept Optiq because ... > > > > The vote will be open until Tuesday May 13 18:00 UTC. > > > > https://wiki.apache.org/incubator/OptiqProposal > > > > = Optiq = > > == Abstract == > > > > Optiq is a framework that allows efficient translation of queries > involving > > heterogeneous and federated data. > > > > == Proposal == > > > > Optiq is a highly customizable engine for parsing and planning queries on > > data in a wide variety of formats. It allows database-like access, and in > > particular a SQL interface and advanced query optimization, for data not > > residing in a traditional database. > > > > == Background == > > > > Databases were traditionally engineered in a monolithic stack, providing > a > > data storage format, data processing algorithms, query parser, query > > planner, built-in functions, metadata repository and connectivity layer. > > They innovate in some areas but rarely in all. > > > > Modern data management systems are decomposing that stack into separate > > components, separating data, processing engine, metadata, and query > > language support. They are highly heterogeneous, with data in multiple > > locations and formats, caching and redundant data, different workloads, > and > > processing occurring in different engines. > > > > Query planning (sometimes called query optimization) has always been a > key > > function of a DBMS, because it allows the implementors to introduce new > > query-processing algorithms, and allows data administrators to > re-organize > > the data without affecting applications built on that data. In a > > componentized system, the query planner integrates the components (data > > formats, engines, algorithms) without introducing unncessary coupling or > > performance tradeoffs. > > > > But building a query planner is hard; many systems muddle along without a > > planner, and indeed a SQL interface, until the demand from their > customers > > is overwhelming. > > > > There is an opportunity to make this process more efficient by creating a > > re-usable framework. > > > > == Rationale == > > > > Optiq allows database-like access, and in particular a SQL interface and > > advanced query optimization, for data not residing in a traditional > > database. It is complementary to many current Hadoop and NoSQL systems, > > which have innovative and performant storage and runtime systems but > lack a > > SQL interface and intelligent query translation. > > > > Optiq is already in use by several projects, including Apache Drill, > Apache > > Hive and Cascading Lingual, and commercial products. > > > > Optiq's architecture consists of: > > > > An extensible relational algebra. > > * SPIs (service-provider interfaces) for metadata (schemas and tables), > > planner rules, statistics, cost-estimates, user-defined functions. > > * Built-in sets of rules for logical transformations and common > > data-sources. > > * Two query planning engines driven by rules, statistics, etc. One engine > > is cost-based, the other rule-based. > > * Optional SQL parser, validator and translator to relational algebra. > > * Optional JDBC driver. > > > > == Initial Goals == > > > > The initial goals are be to move the existing codebase to Apache and > > integrate with the Apache development process. Once this is accomplished, > > we plan for incremental development and releases that follow the Apache > > guidelines. > > > > As we move the code into the org.apache namespace, we will restructure > > components as necessary to allow clients to use just the components of > > Optiq that they need. > > > > A version 1.0 release, including pre-built binaries, will foster wider > > adoption. > > > > == Current Status == > > > > Optiq has had over a dozen minor releases over the last 18 months. Its > core > > SQL parser and validator, and its planning engine and core rules, are > > mature and robust and are the basis for several production systems; but > > other components and SPIs are still undergoing rapid evolution. > > > > === Meritocracy === > > > > We plan to invest in supporting a meritocracy. We will discuss the > > requirements in an open forum. We encourage the companies and projects > > using Optiq to discuss their requirements in an open forum and to > > participate in development. We will encourage and monitor community > > participation so that privileges can be extended to those that > contribute. > > > > Optiq's pluggable architecture encourages developers to contribute > > extensions such as adapters for data sources, new planning rules, and > > better statistics and cost-estimation functions. We look forward to > > fostering a rich ecosystem of extensions. > > > > === Community === > > > > Building a data management system requires a high degree of technical > > skill, and correspondingly, the community of developers directly using > > Optiq is potentially fairly small, albeit highly technical and engaged. > But > > we also expect engagement from members of the communities of projects > that > > use Optiq, such as Drill and Hive. And we intend to structure Optiq so > that > > it can be used for lighter weight applications, such as providing a SQL > and > > JDBC interface to a NoSQL system. > > > > === Core Developers === > > > > The developers on the initial committers list are all experienced open > > source developers, and are actively using Optiq in their projects. > > > > * Julian Hyde is lead developer of Mondrian, an open source OLAP engine, > > and an Apache Drill committer. > > * Chris Wensel is lead developer of Cascading, and of Lingual, the SQL > > interface to Cascading built using Optiq. > > * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq. > > > > In addition, there are several regular contributors whom we hope will > > graduate to committers during the incubation process. > > > > We realize that additional employer diversity is needed, and we will work > > aggressively to recruit developers from additional companies. > > > > === Alignment === > > > > Apache, and in particular the ecosystem surrounding Hadoop, contains > > several projects for building data management systems that leverage each > > other's capabilities. Optiq is a natural fit for that ecosystem, and will > > help foster projects meeting new challenges. > > > > Optiq is already used by Apache Hive and Apache Drill; Optiq embeds > Apache > > Spark as an optional engine; we are in discussion with Apache Phoenix > about > > integrating JDBC and query planning. > > > > == Known Risks == > > > > === Orphaned Products === > > > > Optiq is already a key component in three independent projects, each > backed > > by a different company, so the risk of being orphaned is relatively low. > We > > plan to mitigate this risk by recruiting additional committers, and > > promoting Optiq's adoption as a framework by other projects. > > > > === Inexperience with Open Source === > > > > The initial committers are all Apache members, some of whom have several > > years in the Apache Hadoop community. The founder of the project, Julian > > Hyde, has been a founder and key developer in open source projects for > over > > ten years. > > > > === Homogenous Developers === > > > > The initial committers are employed by a number of companies, including > > Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are > > committed to recruiting additional committers from outside these > companies. > > > > === Reliance on Salaried Developers === > > > > Like most open source projects, Optiq receives substantial support from > > salaried developers. This is to be expected given that it is a highly > > technical framework. However, they are all passionate about the project, > > and we are confident that the project will continue even if no salaried > > developers contribute to the project. As a framework, the project > > encourages the involvement of members of other projects, and of academic > > researchers. We are committed to recruiting additional committers > including > > non-salaried developers. > > > > === Relationships with Other Apache Products === > > > > As mentioned in the Alignment section, Optiq is being used by Apache Hive > > and Apache Drill, and has adapters for Apache Phoenix and Apache Spark. > > Optiq often operates on data in a Hadoop environment, so collaboration > with > > other Hadoop projects is desirable and highly likely. > > > > Unsurprisingly there is some overlap in capabilities between Optiq and > > other Apache projects. Several projects that are databases or > database-like > > have query-planning capabilities. These include Hive, Drill, Phoenix, > > Spark, Apache Derby, Apache Pig, Apache Jena and Apache Tajo. Optiq's > query > > planner is extensible at run time, and does not have a preferred runtime > > engine on which to execute compiled queries. These capabilities, and the > > large corpus of pre-built rules, are what allow Optiq to be embedded in > > other projects. > > > > Several other Apache projects access third-party data sources, including > > Hive, Pig, Drill, Spark and Apache MetaModel. Optiq allows users to > > optimize access to third-party data sources by writing rules to push > > processing down to the data source, and provide a cost model to choose > the > > optimal location for processing. That said, maintaining a library of > > adapters is burdensome, and so it would make sense to collaborate with > > other projects on adapter libraries, and re-use libraries where possible. > > > > Optiq supports several front ends for submitting queries. The most > popular > > is SQL, with driver connectivity via JDBC (and ODBC planned). Other > Apache > > projects with a SQL parser include Hive, Spark, Phoenix, Derby, Tajo. > Drill > > uses Optiq's parser and JDBC stack; both Phoenix and Drill have expressed > > interest in collaborating on JDBC and ODBC. Optiq's Linq4j API is similar > > to the fluent query-builder APIs in Spark and MetaModel. Use of a front > end > > is not required; for instance, Hive integrates with Optiq by directly > > building a graph of RelNode objects. > > > > === An Excessive Fascination with the Apache Brand === > > > > Optiq solves a real problem, as evidenced by its take-up by other > projects. > > This proposal is not for the purpose of generating publicity. Rather, the > > primary benefits to joining Apache are those outlined in the Rationale > > section. > > > > == Documentation == > > > > Additional documentation for Optiq may be found on its github site: > > > > * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]] > > * [[ > > https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial > ]] > > * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]] > > * [[ > https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Referenceguide > ]] > > > > ==== Presentation ==== > > > > *[[ > > > https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true| > > SQL on Big Data using Optiq]] > > == Initial Source == > > > > The initial code codebase resides in three projects, all hosted on > github: > > > > * https://github.com/julianhyde/optiq > > * https://github.com/julianhyde/optiq-csv > > * https://github.com/julianhyde/linq4j > > > > === Source and Intellectual Property Submission Plan === > > > > The initial codebase is already distributed under the Apache 2.0 License. > > The owners of the IP have indicated willingness to sign the SGA. > > > > === External Dependencies === > > > > Optiq and Linq4j have the following external dependencies. > > > > * Java 1.6, 1.7 or 1.8 > > * Apache Maven, Commons > > * JavaCC (BSD license) > > * Sqlline 1.1.6 (BSD license) > > * Junit 4.11 (EPL) > > * Janino (BSD license) > > * Guava (Apache 2.0 license) > > * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0 > > license) > > > > Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark, > > optiq-splunk) are currently developed alongside core Optiq, and have the > > following additional dependencies: > > > > * Open CSV 2.3 (Apache 2.0 license) > > * Apache Incubator Spark > > * Mongo Java driver (Apache 2.0 license) > > Upon acceptance to the incubator, we would begin a thorough analysis of > all > > transitive dependencies to verify this information and introduce license > > checking into the build and release process by integrating with Apache > Rat. > > > > === Cryptography === > > > > Optiq will eventually support encryption on the wire. This is not one of > > the initial goals, and we do not expect Optiq to be a controlled export > > item due to the use of encryption. > > > > == Required Resources == > > > > === Mailing Lists === > > > > * private@optiq.incubator.apache.org > > * dev@optiq.incubator.apache.org (will be migrated from > > optiq-dev@googlegroups.com) > > * commits@optiq.incubator.apache.org > > > > === Source control === > > > > The Optiq team would like to use git for source control, due to our > current > > use of git/github. We request a writeable git repo git:// > > git.apache.org/incubator-optiq, and mirroring to be set up to github > > through INFRA. > > > > === Issue Tracking === > > > > Optiq currently uses the github issue tracking system associated with its > > github repo: https://github.com/julianhyde/optiq/issues. We will > migrate to > > the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ. > > > > == Initial Committers == > > > > * Julian Hyde (jhyde at apache dot org) > > * Jacques Nadeau (jacques at apache dot org) > > * James R. Taylor (jamestaylor at apache dot org) > > * Chris Wensel (cwensel at apache dot org) > > > > === Affiliations === > > > > The initial committers are employees of Concurrent, Hortonworks, MapR and > > Salesforce.com. > > > > * Julian Hyde (Hortonworks) > > * Jacques Nadeau (MapR Technologies) > > * James R. Taylor (Salesforce.com) > > * Chris Wensel (Concurrent) > > > > == Sponsors == > > > > === Champion === > > > > * Ashutosh Chauhan (hashutosh at apache dot org) > > > > === Nominated Mentors === > > > > * Ted Dunning (tdunning at apache dot org) - Chief Application Architect > > at MapR Technologies; committer for Lucene, Mahout and ZooKeeper. > > * Alan Gates (gates at apache dot org) - Architect at Hortonworks; > > committer for Pig, Hive and others. > > * Steven Noels (stevenn at apache dot org) - Chief Technical Officer at > > NGDATA; committer for Cocoon and Forrest, mentor for Phoenix. > > > > === Sponsoring Entity === > > > > The Apache Incubator. > > > -- > CONFIDENTIALITY NOTICE > NOTICE: This message is intended for the use of the individual or entity to > which it is addressed and may contain information that is confidential, > privileged and exempt from disclosure under applicable law. If the reader > of this message is not the intended recipient, you are hereby notified that > any printing, copying, dissemination, distribution, disclosure or > forwarding of this communication is strictly prohibited. If you have > received this communication in error, please contact the sender immediately > and delete it from your system. Thank You. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org > For additional commands, e-mail: general-help@incubator.apache.org > > --bcaec554004063147104f9388852--