incubator-general mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: [PROPOSAL] Optiq
Date Fri, 09 May 2014 17:58:46 GMT
Now that discussion is settling down, I will start a vote thread shortly.


On Mon, May 5, 2014 at 3:22 PM, Ashutosh Chauhan <hashutosh@apache.org> wrote:

> Thanks everyone for the great feedback. With Julian's help I have updated the
> section "Relationships with Other Apache Projects" so that folks can get a
> sense of where Optiq stands w.r.t. other projects going on at the ASF.
>
> Thanks,
> Ashutosh
>
>
> On Fri, May 2, 2014 at 11:23 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to
>> update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra <henry.saputra@gmail.com>
>> wrote:
>> > Hi Ashutosh,
>> >
>> > Since there was a question/comment about the relationship with Apache
>> > MetaModel, I am asking to update the proposal to include this
>> > discussion in either "Relationships with Other Apache Products" or
>> > "Alignment" section before going for a VOTE.
>> >
>> > Apache Slider did the same thing with relation to Apache Twill and
>> > Apache Helix projects.
>> >
>> > Thanks,
>> >
>> > - Henry
>> >
>> > On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan <hashutosh@apache.org>
>> > wrote:
>> >> I would like to propose Optiq as an Apache Incubator project. I have
>> >> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>> >> posted the text of the proposal below.
>> >>
>> >> Ashutosh.
>> >>
>> >> = Optiq =
>> >> == Abstract ==
>> >>
>> >> Optiq is a framework that allows efficient translation of queries
>> >> involving heterogeneous and federated data.
>> >>
>> >> == Proposal ==
>> >>
>> >> Optiq is a highly customizable engine for parsing and planning queries on
>> >> data in a wide variety of formats. It allows database-like access, and in
>> >> particular a SQL interface and advanced query optimization, for data not
>> >> residing in a traditional database.
>> >>
>> >> == Background ==
>> >>
>> >> Databases were traditionally engineered in a monolithic stack, providing a
>> >> data storage format, data processing algorithms, query parser, query
>> >> planner, built-in functions, metadata repository and connectivity layer.
>> >> They innovate in some areas but rarely in all.
>> >>
>> >> Modern data management systems are decomposing that stack into separate
>> >> components, separating data, processing engine, metadata, and query
>> >> language support. They are highly heterogeneous, with data in multiple
>> >> locations and formats, caching and redundant data, different workloads, and
>> >> processing occurring in different engines.
>> >>
>> >> Query planning (sometimes called query optimization) has always been a key
>> >> function of a DBMS, because it allows the implementors to introduce new
>> >> query-processing algorithms, and allows data administrators to re-organize
>> >> the data without affecting applications built on that data. In a
>> >> componentized system, the query planner integrates the components (data
>> >> formats, engines, algorithms) without introducing unnecessary coupling or
>> >> performance tradeoffs.
>> >>
>> >> But building a query planner is hard; many systems muddle along without a
>> >> planner, and indeed without a SQL interface, until the demand from their
>> >> customers is overwhelming.
>> >>
>> >> There is an opportunity to make this process more efficient by creating a
>> >> re-usable framework.
>> >>
>> >> == Rationale ==
>> >>
>> >> Optiq allows database-like access, and in particular a SQL interface and
>> >> advanced query optimization, for data not residing in a traditional
>> >> database. It is complementary to many current Hadoop and NoSQL systems,
>> >> which have innovative and performant storage and runtime systems but lack a
>> >> SQL interface and intelligent query translation.
>> >>
>> >> Optiq is already in use by several projects, including Apache Drill, Apache
>> >> Hive and Cascading Lingual, and commercial products.
>> >>
>> >> Optiq's architecture consists of:
>> >>
>> >>  * An extensible relational algebra.
>> >>  * SPIs (service-provider interfaces) for metadata (schemas and tables),
>> >> planner rules, statistics, cost-estimates, user-defined functions.
>> >>  * Built-in sets of rules for logical transformations and common
>> >> data-sources.
>> >>  * Two query planning engines driven by rules, statistics, etc. One engine is
>> >> cost-based, the other rule-based.
>> >>  * Optional SQL parser, validator and translator to relational algebra.
>> >>  * Optional JDBC driver.
>> >>
>> >> == Initial Goals ==
>> >>
>> >> The initial goals are to move the existing codebase to Apache and
>> >> integrate with the Apache development process. Once this is accomplished,
>> >> we plan for incremental development and releases that follow the Apache
>> >> guidelines.
>> >>
>> >> As we move the code into the org.apache namespace, we will restructure
>> >> components as necessary to allow clients to use just the components of
>> >> Optiq that they need.
>> >>
>> >> A version 1.0 release, including pre-built binaries, will foster wider
>> >> adoption.
>> >>
>> >> == Current Status ==
>> >>
>> >> Optiq has had over a dozen minor releases over the last 18 months. Its core
>> >> SQL parser and validator, and its planning engine and core rules, are
>> >> mature and robust and are the basis for several production systems; but
>> >> other components and SPIs are still undergoing rapid evolution.
>> >>
>> >> === Meritocracy ===
>> >>
>> >> We plan to invest in supporting a meritocracy. We will discuss the
>> >> requirements in an open forum. We encourage the companies and projects
>> >> using Optiq to discuss their requirements in an open forum and to
>> >> participate in development. We will encourage and monitor community
>> >> participation so that privileges can be extended to those that contribute.
>> >>
>> >> Optiq's pluggable architecture encourages developers to contribute
>> >> extensions such as adapters for data sources, new planning rules, and
>> >> better statistics and cost-estimation functions. We look forward to
>> >> fostering a rich ecosystem of extensions.
>> >>
>> >> === Community ===
>> >>
>> >> Building a data management system requires a high degree of technical
>> >> skill, and correspondingly, the community of developers directly using
>> >> Optiq is potentially fairly small, albeit highly technical and engaged. But
>> >> we also expect engagement from members of the communities of projects that
>> >> use Optiq, such as Drill and Hive. And we intend to structure Optiq so that
>> >> it can be used for lighter weight applications, such as providing a SQL and
>> >> JDBC interface to a NoSQL system.
>> >>
>> >> === Core Developers ===
>> >>
>> >> The developers on the initial committers list are all experienced open
>> >> source developers, and are actively using Optiq in their projects.
>> >>
>> >>  * Julian Hyde is lead developer of Mondrian, an open source OLAP engine,
>> >> and an Apache Drill committer.
>> >>  * Chris Wensel is lead developer of Cascading, and of Lingual, the SQL
>> >> interface to Cascading built using Optiq.
>> >>  * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
>> >>
>> >> In addition, there are several regular contributors whom we hope will
>> >> graduate to committers during the incubation process.
>> >>
>> >> We realize that additional employer diversity is needed, and we will work
>> >> aggressively to recruit developers from additional companies.
>> >>
>> >> === Alignment ===
>> >>
>> >> Apache, and in particular the ecosystem surrounding Hadoop, contains
>> >> several projects for building data management systems that leverage each
>> >> other's capabilities. Optiq is a natural fit for that ecosystem, and will
>> >> help foster projects meeting new challenges.
>> >>
>> >> Optiq is already used by Apache Hive and Apache Drill; Optiq embeds Apache
>> >> Spark as an optional engine; we are in discussion with Apache Phoenix about
>> >> integrating JDBC and query planning.
>> >>
>> >> == Known Risks ==
>> >>
>> >> === Orphaned Products ===
>> >>
>> >> Optiq is already a key component in three independent projects, each backed
>> >> by a different company, so the risk of being orphaned is relatively low. We
>> >> plan to mitigate this risk by recruiting additional committers, and
>> >> promoting Optiq's adoption as a framework by other projects.
>> >>
>> >> === Inexperience with Open Source ===
>> >>
>> >> The initial committers are all Apache members, some of whom have several
>> >> years in the Apache Hadoop community. The founder of the project, Julian
>> >> Hyde, has been a founder and key developer in open source projects for over
>> >> ten years.
>> >>
>> >> === Homogenous Developers ===
>> >>
>> >> The initial committers are employed by a number of companies, including
>> >> Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are
>> >> committed to recruiting additional committers from outside these companies.
>> >>
>> >> === Reliance on Salaried Developers ===
>> >>
>> >> Like most open source projects, Optiq receives substantial support from
>> >> salaried developers. This is to be expected given that it is a highly
>> >> technical framework. However, they are all passionate about the project,
>> >> and we are confident that the project will continue even if no salaried
>> >> developers contribute to the project. As a framework, the project
>> >> encourages the involvement of members of other projects, and of academic
>> >> researchers. We are committed to recruiting additional committers including
>> >> non-salaried developers.
>> >>
>> >> === Relationships with Other Apache Products ===
>> >>
>> >> As mentioned in the Alignment section, Optiq is being used by Apache Hive
>> >> and Apache Drill, and has adapters for Apache Phoenix and Apache Spark.
>> >> Optiq often operates on data in a Hadoop environment, so collaboration with
>> >> other Hadoop projects is desirable and highly likely.
>> >>
>> >> === An Excessive Fascination with the Apache Brand ===
>> >>
>> >> Optiq solves a real problem, as evidenced by its take-up by other projects.
>> >> This proposal is not for the purpose of generating publicity. Rather, the
>> >> primary benefits to joining Apache are those outlined in the Rationale
>> >> section.
>> >>
>> >> == Documentation ==
>> >>
>> >> Additional documentation for Optiq may be found on its github site:
>> >>
>> >>  * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
>> >>  * [[https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
>> >>  * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
>> >>  * [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Reference guide]]
>> >>
>> >> === Presentation: ===
>> >>
>> >>  * [[https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|SQL on Big Data using Optiq]]
>> >>
>> >> == Initial Source ==
>> >>
>> >> The initial codebase resides in three projects, all hosted on github:
>> >>
>> >>  * https://github.com/julianhyde/optiq
>> >>  * https://github.com/julianhyde/optiq-csv
>> >>  * https://github.com/julianhyde/linq4j
>> >>
>> >> === Source and Intellectual Property Submission Plan ===
>> >>
>> >> The initial codebase is already distributed under the Apache 2.0 License.
>> >> The owners of the IP have indicated willingness to sign the SGA.
>> >>
>> >> === External Dependencies ===
>> >>
>> >> Optiq and Linq4j have the following external dependencies.
>> >>
>> >>  * Java 1.6, 1.7 or 1.8
>> >>  * Apache Maven, Commons
>> >>  * JavaCC (BSD license)
>> >>  * Sqlline 1.1.6 (BSD license)
>> >>  * Junit 4.11 (EPL)
>> >>  * Janino (BSD license)
>> >>  * Guava (Apache 2.0 license)
>> >>  * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
>> >> license)
>> >>
>> >> Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
>> >> optiq-splunk) are currently developed alongside core Optiq, and have the
>> >> following additional dependencies:
>> >>
>> >>  * Open CSV 2.3 (Apache 2.0 license)
>> >>  * Apache Incubator Spark
>> >>  * Mongo Java driver (Apache 2.0 license)
>> >>
>> >> Upon acceptance to the incubator, we would begin a thorough analysis of all
>> >> transitive dependencies to verify this information and introduce license
>> >> checking into the build and release process by integrating with Apache Rat.
>> >>
>> >> === Cryptography ===
>> >>
>> >> Optiq will eventually support encryption on the wire. This is not one of
>> >> the initial goals, and we do not expect Optiq to be a controlled export
>> >> item due to the use of encryption.
>> >>
>> >> == Required Resources ==
>> >>
>> >> === Mailing Lists ===
>> >>
>> >>  * private@optiq.incubator.apache.org
>> >>  * dev@optiq.incubator.apache.org (will be migrated from
>> >> optiq-dev@googlegroups.com)
>> >>  * commits@optiq.incubator.apache.org
>> >>
>> >> === Source control ===
>> >>
>> >> The Optiq team would like to use git for source control, due to our current
>> >> use of git/github. We request a writeable git repo
>> >> git://git.apache.org/incubator-optiq, and mirroring to be set up to github
>> >> through INFRA.
>> >>
>> >> === Issue Tracking ===
>> >>
>> >> Optiq currently uses the github issue tracking system associated with its
>> >> github repo: https://github.com/julianhyde/optiq/issues. We will migrate to
>> >> the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ.
>> >>
>> >> == Initial Committers ==
>> >>
>> >>  * Julian Hyde (jhyde at apache dot org)
>> >>  * Jacques Nadeau (jacques at apache dot org)
>> >>  * James R. Taylor (jamestaylor at apache dot org)
>> >>  * Chris Wensel (cwensel at apache dot org)
>> >>
>> >> === Affiliations ===
>> >>
>> >> The initial committers are employees of Concurrent, Hortonworks, MapR and
>> >> Salesforce.com.
>> >>
>> >>  * Julian Hyde (Hortonworks)
>> >>  * Jacques Nadeau (MapR Technologies)
>> >>  * James R. Taylor (Salesforce.com)
>> >>  * Chris Wensel (Concurrent)
>> >>
>> >> == Sponsors ==
>> >>
>> >> === Champion ===
>> >>
>> >>  * Ashutosh Chauhan (hashutosh at apache dot org)
>> >>
>> >> === Nominated Mentors ===
>> >>
>> >>  * Ted Dunning (tdunning at apache dot org) - Chief Application Architect
>> >> at MapR Technologies; committer for Lucene, Mahout and ZooKeeper.
>> >>  * Alan Gates (gates at apache dot org) - Architect at Hortonworks;
>> >> committer for Pig, Hive and others.
>> >>  * Steven Noels (stevenn at apache dot org) - Chief Technical Officer at
>> >> NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
>> >>
>> >> === Sponsoring Entity ===
>> >>
>> >> The Apache Incubator.
>>
>
