incubator-general mailing list archives

From Kasper Sørensen <kasper.soren...@HumanInference.com>
Subject RE: [PROPOSAL] Optiq
Date Thu, 15 May 2014 19:45:30 GMT
Good section. I agree with what it says, and I hope we can eventually help each other
out, e.g. with a library of adaptors.

-----Original Message-----
From: Julian Hyde [mailto:julianhyde@gmail.com] 
Sent: 8 May 2014 20:03
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Optiq

The "Relationships with Other Apache Products" section has been updated to cover Optiq's functional
overlaps with existing Apache projects.

https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products

Julian

On May 2, 2014, at 11:23 AM, Henry Saputra <henry.saputra@gmail.com> wrote:

> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
> 
> Thanks,
> 
> - Henry
> 
> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
>> Hi Ashutosh,
>> 
>> Since there was a question/comment about the relationship with Apache
>> MetaModel, I am asking to update the proposal to include this
>> discussion in either the "Relationships with Other Apache Products" or
>> "Alignment" section before going for a VOTE.
>> 
>> Apache Slider did the same thing with relation to Apache Twill and 
>> Apache Helix projects.
>> 
>> Thanks,
>> 
>> - Henry
>> 
>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan <hashutosh@apache.org> wrote:
>>> I would like to propose Optiq as an Apache Incubator project.  I 
>>> have posted the proposal to
>>> https://wiki.apache.org/incubator/OptiqProposal and posted the text of the proposal
>>> below.
>>> 
>>> Ashutosh.
>>> 
>>> = Optiq =
>>> == Abstract ==
>>> 
>>> Optiq is a framework that allows efficient translation of queries 
>>> involving heterogeneous and federated data.
>>> 
>>> == Proposal ==
>>> 
>>> Optiq is a highly customizable engine for parsing and planning 
>>> queries on data in a wide variety of formats. It allows 
>>> database-like access, and in particular a SQL interface and advanced 
>>> query optimization, for data not residing in a traditional database.
>>> 
>>> == Background ==
>>> 
>>> Databases were traditionally engineered in a monolithic stack, 
>>> providing a data storage format, data processing algorithms, query 
>>> parser, query planner, built-in functions, metadata repository and connectivity
>>> layer.
>>> They innovate in some areas but rarely in all.
>>> 
>>> Modern data management systems are decomposing that stack into 
>>> separate components, separating data, processing engine, metadata, 
>>> and query language support. They are highly heterogeneous, with data 
>>> in multiple locations and formats, caching and redundant data, 
>>> different workloads, and processing occurring in different engines.
>>> 
>>> Query planning (sometimes called query optimization) has always been 
>>> a key function of a DBMS, because it allows the implementors to 
>>> introduce new query-processing algorithms, and allows data 
>>> administrators to re-organize the data without affecting 
>>> applications built on that data. In a componentized system, the 
>>> query planner integrates the components (data formats, engines, 
>>> algorithms) without introducing unnecessary coupling or performance tradeoffs.
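To make the point about re-organizing data concrete, here is a minimal Java sketch, assuming a planner that compares estimated costs of interchangeable physical copies of the same logical data. `CostBasedChoice`, `Access`, the table names and the cost figures are all hypothetical and are not part of Optiq's API; the sketch only shows that the application's query stays the same while the planner's choice of copy changes.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Optiq's API: a cost-based planner picks the
// cheapest physical copy of a logical table for a query whose text never
// names a specific copy.
public class CostBasedChoice {

    /** One physical way to produce the rows a query needs (illustrative). */
    public static final class Access {
        public final String name;
        public final long rowsRead;     // estimated rows read (a statistic)
        public final double costPerRow; // assumed per-row processing cost

        public Access(String name, long rowsRead, double costPerRow) {
            this.name = name;
            this.rowsRead = rowsRead;
            this.costPerRow = costPerRow;
        }

        public double cost() {
            return rowsRead * costPerRow;
        }
    }

    /** Returns the alternative with the lowest estimated cost. */
    public static Access choose(List<Access> alternatives) {
        return alternatives.stream()
                .min(Comparator.comparingDouble(Access::cost))
                .orElseThrow(() -> new IllegalArgumentException("no alternatives"));
    }

    public static void main(String[] args) {
        // Initially, monthly sales totals can only be computed from the raw file.
        List<Access> before = List.of(
                new Access("sales.csv full scan", 1_000_000, 1.0));
        // An administrator adds a pre-aggregated copy; the application's query is unchanged.
        List<Access> after = List.of(
                new Access("sales.csv full scan", 1_000_000, 1.0),
                new Access("sales_by_month summary", 120, 1.5));
        System.out.println(choose(before).name); // sales.csv full scan
        System.out.println(choose(after).name);  // sales_by_month summary
    }
}
```

A real planner would derive `rowsRead` from statistics supplied by each data source rather than from hard-coded numbers.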
>>> 
>>> But building a query planner is hard; many systems muddle along
>>> without a planner, and indeed without a SQL interface, until demand from
>>> their customers becomes overwhelming.
>>> 
>>> There is an opportunity to make this process more efficient by 
>>> creating a re-usable framework.
>>> 
>>> == Rationale ==
>>> 
>>> Optiq allows database-like access, and in particular a SQL interface 
>>> and advanced query optimization, for data not residing in a 
>>> traditional database. It is complementary to many current Hadoop and 
>>> NoSQL systems, which have innovative and performant storage and 
>>> runtime systems but lack a SQL interface and intelligent query translation.
>>> 
>>> Optiq is already in use by several projects, including Apache Drill,
>>> Apache Hive and Cascading Lingual, as well as by commercial products.
>>> 
>>> Optiq's architecture consists of:
>>> 
>>> * An extensible relational algebra.
>>> * SPIs (service-provider interfaces) for metadata (schemas and
>>> tables), planner rules, statistics, cost estimates and user-defined functions.
>>> * Built-in sets of rules for logical transformations and common data sources.
>>> * Two query planning engines driven by rules, statistics, etc. One
>>> engine is cost-based, the other rule-based.
>>> * An optional SQL parser, validator and translator to relational algebra.
>>> * An optional JDBC driver.
>>> 
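The combination of an extensible relational algebra, planner rules and a rule-based engine can be illustrated with a minimal Java sketch. Every class and rule name here (`RulePlanner`, `Rel`, `Scan`, `Project`, `Filter`, `PUSH_FILTER_PAST_PROJECT`) is hypothetical and not Optiq's API; the sketch only assumes that operators form a tree and that a rule either rewrites a node or declines to match.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch, not Optiq's API: a tiny relational algebra plus a
// rule-based planner that applies rewrite rules until none fires.
public class RulePlanner {

    /** Base type of all relational operators; new operators just implement it. */
    public interface Rel { }

    /** Reads every row of a named table. */
    public static final class Scan implements Rel {
        public final String table;
        public Scan(String table) { this.table = table; }
        @Override public String toString() { return "Scan(" + table + ")"; }
    }

    /** Keeps only the listed columns (a pure column selection). */
    public static final class Project implements Rel {
        public final List<String> columns;
        public final Rel input;
        public Project(List<String> columns, Rel input) {
            this.columns = columns;
            this.input = input;
        }
        @Override public String toString() { return "Project(" + columns + ", " + input + ")"; }
    }

    /** Keeps rows whose column equals a constant. */
    public static final class Filter implements Rel {
        public final String column;
        public final Object value;
        public final Rel input;
        public Filter(String column, Object value, Rel input) {
            this.column = column;
            this.value = value;
            this.input = input;
        }
        @Override public String toString() {
            return "Filter(" + column + " = " + value + ", " + input + ")";
        }
    }

    /** A rule returns a rewritten node, or null if it does not match. */
    public interface Rule extends UnaryOperator<Rel> { }

    /** Filter(Project(x)) becomes Project(Filter(x)), so rows are dropped earlier. */
    public static final Rule PUSH_FILTER_PAST_PROJECT = rel -> {
        if (rel instanceof Filter && ((Filter) rel).input instanceof Project) {
            Filter f = (Filter) rel;
            Project p = (Project) f.input;
            if (p.columns.contains(f.column)) {
                return new Project(p.columns, new Filter(f.column, f.value, p.input));
            }
        }
        return null;
    };

    /** Optimizes children first, then fires the first matching rule and repeats. */
    public static Rel optimize(Rel rel, List<Rule> rules) {
        if (rel instanceof Filter) {
            Filter f = (Filter) rel;
            rel = new Filter(f.column, f.value, optimize(f.input, rules));
        } else if (rel instanceof Project) {
            Project p = (Project) rel;
            rel = new Project(p.columns, optimize(p.input, rules));
        }
        for (Rule rule : rules) {
            Rel rewritten = rule.apply(rel);
            if (rewritten != null) {
                return optimize(rewritten, rules);
            }
        }
        return rel; // fixed point: no rule matches this node
    }

    public static void main(String[] args) {
        Rel plan = new Filter("region", "EU",
                new Project(List.of("region", "amount"), new Scan("sales")));
        System.out.println(optimize(plan, List.of(PUSH_FILTER_PAST_PROJECT)));
        // Project([region, amount], Filter(region = EU, Scan(sales)))
    }
}
```

This engine fires rules unconditionally; a cost-based engine would instead keep both the original and the rewritten plan and compare their estimated costs.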
>>> == Initial Goals ==
>>> 
>>> The initial goals are to move the existing codebase to Apache and
>>> integrate with the Apache development process. Once this is 
>>> accomplished, we plan for incremental development and releases that 
>>> follow the Apache guidelines.
>>> 
>>> As we move the code into the org.apache namespace, we will 
>>> restructure components as necessary to allow clients to use just the 
>>> components of Optiq that they need.
>>> 
>>> A version 1.0 release, including pre-built binaries, will foster 
>>> wider adoption.
>>> 
>>> == Current Status ==
>>> 
>>> Optiq has had over a dozen minor releases over the last 18 months. 
>>> Its core SQL parser and validator, and its planning engine and core 
>>> rules, are mature and robust and are the basis for several 
>>> production systems; but other components and SPIs are still undergoing rapid
>>> evolution.
>>> 
>>> === Meritocracy ===
>>> 
>>> We plan to invest in supporting a meritocracy. We will discuss the 
>>> requirements in an open forum. We encourage the companies and 
>>> projects using Optiq to discuss their requirements in an open forum 
>>> and to participate in development. We will encourage and monitor 
>>> community participation so that privileges can be extended to those that contribute.
>>> 
>>> Optiq's pluggable architecture encourages developers to contribute 
>>> extensions such as adapters for data sources, new planning rules, 
>>> and better statistics and cost-estimation functions. We look forward 
>>> to fostering a rich ecosystem of extensions.
>>> 
>>> === Community ===
>>> 
>>> Building a data management system requires a high degree of 
>>> technical skill, and correspondingly, the community of developers 
>>> directly using Optiq is potentially fairly small, albeit highly 
>>> technical and engaged. But we also expect engagement from members of 
>>> the communities of projects that use Optiq, such as Drill and Hive. 
>>> And we intend to structure Optiq so that it can be used for
>>> lighter-weight applications, such as providing a SQL and JDBC interface to a NoSQL system.
>>> 
>>> === Core Developers ===
>>> 
>>> The developers on the initial committers list are all experienced 
>>> open source developers, and are actively using Optiq in their projects.
>>> 
>>> * Julian Hyde is lead developer of Mondrian, an open source OLAP 
>>> engine, and an Apache Drill committer.
>>> * Chris Wensel is lead developer of Cascading, and of Lingual, the 
>>> SQL interface to Cascading built using Optiq.
>>> * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
>>> 
>>> In addition, there are several regular contributors who we hope
>>> will graduate to committers during the incubation process.
>>> 
>>> We realize that additional employer diversity is needed, and we will 
>>> work aggressively to recruit developers from additional companies.
>>> 
>>> === Alignment ===
>>> 
>>> Apache, and in particular the ecosystem surrounding Hadoop, contains 
>>> several projects for building data management systems that leverage 
>>> each other's capabilities. Optiq is a natural fit for that 
>>> ecosystem, and will help foster projects meeting new challenges.
>>> 
>>> Optiq is already used by Apache Hive and Apache Drill; Optiq embeds 
>>> Apache Spark as an optional engine; we are in discussion with Apache 
>>> Phoenix about integrating JDBC and query planning.
>>> 
>>> == Known Risks ==
>>> 
>>> === Orphaned Products ===
>>> 
>>> Optiq is already a key component in three independent projects, each 
>>> backed by a different company, so the risk of being orphaned is 
>>> relatively low. We plan to mitigate this risk by recruiting 
>>> additional committers, and promoting Optiq's adoption as a framework by other
>>> projects.
>>> 
>>> === Inexperience with Open Source ===
>>> 
>>> The initial committers are all Apache members, some of whom have 
>>> several years of experience in the Apache Hadoop community. The founder of the
>>> project, Julian Hyde, has been a founder and key developer in open 
>>> source projects for over ten years.
>>> 
>>> === Homogeneous Developers ===
>>> 
>>> The initial committers are employed by a number of companies, 
>>> including Concurrent, Hortonworks, MapR Technologies and 
>>> Salesforce.com. We are committed to recruiting additional committers from outside
>>> these companies.
>>> 
>>> === Reliance on Salaried Developers ===
>>> 
>>> Like most open source projects, Optiq receives substantial support 
>>> from salaried developers. This is to be expected given that it is a 
>>> highly technical framework. However, they are all passionate about 
>>> the project, and we are confident that the project will continue 
>>> even if no salaried developers contribute to the project. As a 
>>> framework, the project encourages the involvement of members of 
>>> other projects, and of academic researchers. We are committed to 
>>> recruiting additional committers including non-salaried developers.
>>> 
>>> === Relationships with Other Apache Products ===
>>> 
>>> As mentioned in the Alignment section, Optiq is being used by Apache 
>>> Hive and Apache Drill, and has adapters for Apache Phoenix and Apache Spark.
>>> Optiq often operates on data in a Hadoop environment, so 
>>> collaboration with other Hadoop projects is desirable and highly likely.
>>> 
>>> === An Excessive Fascination with the Apache Brand ===
>>> 
>>> Optiq solves a real problem, as evidenced by its take-up by other projects.
>>> This proposal is not for the purpose of generating publicity. 
>>> Rather, the primary benefits to joining Apache are those outlined in 
>>> the Rationale section.
>>> 
>>> == Documentation ==
>>> 
>>> Additional documentation for Optiq may be found on its github site:
>>> 
>>> * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
>>> * [[https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
>>> * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
>>> * [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Reference guide]]
>>> 
>>> === Presentation ===
>>> 
>>> * [[https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|SQL on Big Data using Optiq]]
>>> 
>>> == Initial Source ==
>>> 
>>> The initial codebase resides in three projects, all hosted on GitHub:
>>> 
>>> * https://github.com/julianhyde/optiq
>>> * https://github.com/julianhyde/optiq-csv
>>> * https://github.com/julianhyde/linq4j
>>> 
>>> === Source and Intellectual Property Submission Plan ===
>>> 
>>> The initial codebase is already distributed under the Apache 2.0 License.
>>> The owners of the IP have indicated willingness to sign the SGA.
>>> 
>>> === External Dependencies ===
>>> 
>>> Optiq and Linq4j have the following external dependencies.
>>> 
>>> * Java 1.6, 1.7 or 1.8
>>> * Apache Maven, Commons
>>> * JavaCC (BSD license)
>>> * Sqlline 1.1.6 (BSD license)
>>> * JUnit 4.11 (EPL)
>>> * Janino (BSD license)
>>> * Guava (Apache 2.0 license)
>>> * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0 license)
>>> 
>>> Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
>>> optiq-splunk) are currently developed alongside core Optiq, and have 
>>> the following additional dependencies:
>>> 
>>> * Open CSV 2.3 (Apache 2.0 license)
>>> * Apache Incubator Spark
>>> * Mongo Java driver (Apache 2.0 license)
>>> 
>>> Upon acceptance to the incubator, we would begin a thorough analysis of all transitive
>>> dependencies to verify this information and introduce license
>>> checking into the build and release process by integrating with Apache Rat.
>>> 
>>> === Cryptography ===
>>> 
>>> Optiq will eventually support encryption on the wire. This is not 
>>> one of the initial goals, and we do not expect Optiq to be a 
>>> controlled export item due to the use of encryption.
>>> 
>>> == Required Resources ==
>>> 
>>> === Mailing Lists ===
>>> 
>>> * private@optiq.incubator.apache.org
>>> * dev@optiq.incubator.apache.org (will be migrated from
>>> optiq-dev@googlegroups.com)
>>> * commits@optiq.incubator.apache.org
>>> 
>>> === Source control ===
>>> 
>>> The Optiq team would like to use git for source control, due to our 
>>> current use of git/GitHub. We request a writable git repo,
>>> git://git.apache.org/incubator-optiq, with mirroring to GitHub set up
>>> through INFRA.
>>> 
>>> === Issue Tracking ===
>>> 
>>> Optiq currently uses the github issue tracking system associated 
>>> with its github repo: https://github.com/julianhyde/optiq/issues. We 
>>> will migrate to the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ.
>>> 
>>> == Initial Committers ==
>>> 
>>> * Julian Hyde (jhyde at apache dot org)
>>> * Jacques Nadeau (jacques at apache dot org)
>>> * James R. Taylor (jamestaylor at apache dot org)
>>> * Chris Wensel (cwensel at apache dot org)
>>> 
>>> === Affiliations ===
>>> 
>>> The initial committers are employees of Concurrent, Hortonworks, 
>>> MapR and Salesforce.com.
>>> 
>>> * Julian Hyde (Hortonworks)
>>> * Jacques Nadeau (MapR Technologies)
>>> * James R. Taylor (Salesforce.com)
>>> * Chris Wensel (Concurrent)
>>> 
>>> == Sponsors ==
>>> 
>>> === Champion ===
>>> 
>>> * Ashutosh Chauhan (hashutosh at apache dot org)
>>> 
>>> === Nominated Mentors ===
>>> 
>>> * Ted Dunning (tdunning at apache dot org) - Chief Application 
>>> Architect at MapR Technologies; committer for Lucene, Mahout and ZooKeeper.
>>> * Alan Gates (gates at apache dot org) - Architect at Hortonworks; 
>>> committer for Pig, Hive and others.
>>> * Steven Noels (stevenn at apache dot org) - Chief Technical Officer 
>>> at NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
>>> 
>>> === Sponsoring Entity ===
>>> 
>>> The Apache Incubator.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
> 

