incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: [PROPOSAL] Optiq
Date Thu, 15 May 2014 16:50:17 GMT
Yes, thanks updating the proposal. Really appreciate it.

- Henry

On Thu, May 8, 2014 at 11:03 AM, Julian Hyde <julianhyde@gmail.com> wrote:
> The “Relationships with Other Apache Products” section has been updated to cover
Optiq’s functional overlaps with existing Apache projects.
>
> https://wiki.apache.org/incubator/OptiqProposal#Relationships_with_Other_Apache_Products
>
> Julian
>
> On May 2, 2014, at 11:23 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
>
>> Ah sorry, I did not mean "asking to update", I meant "proposing to update".
>>
>> Thanks,
>>
>> - Henry
>>
>> On Fri, May 2, 2014 at 11:20 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
>>> HI Ashutosh,
>>>
>>> Since there was a question/ comment about relationship with Apache
>>> MetaModel, I am asking to update the proposal to include this
>>> discussion in either "Relationships with Other Apache Products" or
>>> "Alignment" section before going for a VOTE.
>>>
>>> Apache Slider did the same thing with relation to Apache Twill and
>>> Apache Helix projects.
>>>
>>> Thanks,
>>>
>>> - Henry
>>>
>>> On Wed, Apr 30, 2014 at 3:21 PM, Ashutosh Chauhan <hashutosh@apache.org>
wrote:
>>>> I would like to propose Optiq as an Apache Incubator project.  I have
>>>> posted the proposal to https://wiki.apache.org/incubator/OptiqProposal and
>>>> posted the text of the proposal below.
>>>>
>>>> Ashutosh.
>>>>
>>>> = Optiq =
>>>> == Abstract ==
>>>>
>>>> Optiq is a framework that allows efficient translation of queries involving
>>>> heterogeneous and federated data.
>>>>
>>>> == Proposal ==
>>>>
>>>> Optiq is a highly customizable engine for parsing and planning queries on
>>>> data in a wide variety of formats. It allows database-like access, and in
>>>> particular a SQL interface and advanced query optimization, for data not
>>>> residing in a traditional database.
>>>>
>>>> == Background ==
>>>>
>>>> Databases were traditionally engineered in a monolithic stack, providing
a
>>>> data storage format, data processing algorithms, query parser, query
>>>> planner, built-in functions, metadata repository and connectivity layer.
>>>> They innovate in some areas but rarely in all.
>>>>
>>>> Modern data management systems are decomposing that stack into separate
>>>> components, separating data, processing engine, metadata, and query
>>>> language support. They are highly heterogeneous, with data in multiple
>>>> locations and formats, caching and redundant data, different workloads, and
>>>> processing occurring in different engines.
>>>>
>>>> Query planning (sometimes called query optimization) has always been a key
>>>> function of a DBMS, because it allows the implementors to introduce new
>>>> query-processing algorithms, and allows data administrators to re-organize
>>>> the data without affecting applications built on that data. In a
>>>> componentized system, the query planner integrates the components (data
>>>> formats, engines, algorithms) without introducing unncessary coupling or
>>>> performance tradeoffs.
>>>>
>>>> But building a query planner is hard; many systems muddle along without a
>>>> planner, and indeed a SQL interface, until the demand from their customers
>>>> is overwhelming.
>>>>
>>>> There is an opportunity to make this process more efficient by creating a
>>>> re-usable framework.
>>>>
>>>> == Rationale ==
>>>>
>>>> Optiq allows database-like access, and in particular a SQL interface and
>>>> advanced query optimization, for data not residing in a traditional
>>>> database. It is complementary to many current Hadoop and NoSQL systems,
>>>> which have innovative and performant storage and runtime systems but lack
a
>>>> SQL interface and intelligent query translation.
>>>>
>>>> Optiq is already in use by several projects, including Apache Drill, Apache
>>>> Hive and Cascading Lingual, and commercial products.
>>>>
>>>> Optiq's architecture consists of:
>>>>
>>>> An extensible relational algebra.
>>>> SPIs (service-provider interfaces) for metadata (schemas and tables),
>>>> planner rules, statistics, cost-estimates, user-defined functions.
>>>> Built-in sets of rules for logical transformations and common data-sources.
>>>> Two query planning engines driven by rules, statistics, etc. One engine is
>>>> cost-based, the other rule-based.
>>>> Optional SQL parser, validator and translator to relational algebra.
>>>> Optional JDBC driver.
>>>> == Initial Goals ==
>>>>
>>>> The initial goals are be to move the existing codebase to Apache and
>>>> integrate with the Apache development process. Once this is accomplished,
>>>> we plan for incremental development and releases that follow the Apache
>>>> guidelines.
>>>>
>>>> As we move the code into the org.apache namespace, we will restructure
>>>> components as necessary to allow clients to use just the components of
>>>> Optiq that they need.
>>>>
>>>> A version 1.0 release, including pre-built binaries, will foster wider
>>>> adoption.
>>>>
>>>> == Current Status ==
>>>>
>>>> Optiq has had over a dozen minor releases over the last 18 months. Its core
>>>> SQL parser and validator, and its planning engine and core rules, are
>>>> mature and robust and are the basis for several production systems; but
>>>> other components and SPIs are still undergoing rapid evolution.
>>>>
>>>> === Meritocracy ===
>>>>
>>>> We plan to invest in supporting a meritocracy. We will discuss the
>>>> requirements in an open forum. We encourage the companies and projects
>>>> using Optiq to discuss their requirements in an open forum and to
>>>> participate in development. We will encourage and monitor community
>>>> participation so that privileges can be extended to those that contribute.
>>>>
>>>> Optiq's pluggable architecture encourages developers to contribute
>>>> extensions such as adapters for data sources, new planning rules, and
>>>> better statistics and cost-estimation functions. We look forward to
>>>> fostering a rich ecosystem of extensions.
>>>>
>>>> === Community ===
>>>>
>>>> Building a data management system requires a high degree of technical
>>>> skill, and correspondingly, the community of developers directly using
>>>> Optiq is potentially fairly small, albeit highly technical and engaged. But
>>>> we also expect engagement from members of the communities of projects that
>>>> use Optiq, such as Drill and Hive. And we intend to structure Optiq so that
>>>> it can be used for lighter weight applications, such as providing a SQL and
>>>> JDBC interface to a NoSQL system.
>>>>
>>>> === Core Developers ===
>>>>
>>>> The developers on the initial committers list are all experienced open
>>>> source developers, and are actively using Optiq in their projects.
>>>>
>>>> * Julian Hyde is lead developer of Mondrian, an open source OLAP engine,
>>>> and an Apache Drill committer.
>>>> * Chris Wensel is lead developer of Cascading, and of Lingual, the SQL
>>>> interface to Cascading built using Optiq.
>>>> * Jacques Nadeau is lead developer of Apache Drill, which uses Optiq.
>>>>
>>>> In addition, there are several regular contributors whom we hope will
>>>> graduate to committers during the incubation process.
>>>>
>>>> We realize that additional employer diversity is needed, and we will work
>>>> aggressively to recruit developers from additional companies.
>>>>
>>>> === Alignment ===
>>>>
>>>> Apache, and in particular the ecosystem surrounding Hadoop, contains
>>>> several projects for building data management systems that leverage each
>>>> other's capabilities. Optiq is a natural fit for that ecosystem, and will
>>>> help foster projects meeting new challenges.
>>>>
>>>> Optiq is already used by Apache Hive and Apache Drill; Optiq embeds Apache
>>>> Spark as an optional engine; we are in discussion with Apache Phoenix about
>>>> integrating JDBC and query planning.
>>>>
>>>> == Known Risks ==
>>>>
>>>> === Orphaned Products ===
>>>>
>>>> Optiq is already a key component in three independent projects, each backed
>>>> by a different company, so the risk of being orphaned is relatively low.
We
>>>> plan to mitigate this risk by recruiting additional committers, and
>>>> promoting Optiq's adoption as a framework by other projects.
>>>>
>>>> === Inexperience with Open Source ===
>>>>
>>>> The initial committers are all Apache members, some of whom have several
>>>> years in the Apache Hadoop community. The founder of the project, Julian
>>>> Hyde, has been a founder and key developer in open source projects for over
>>>> ten years.
>>>>
>>>> === Homogenous Developers ===
>>>>
>>>> The initial committers are employed by a number of companies, including
>>>> Concurrent, Hortonworks, MapR Technologies and Salesforce.com. We are
>>>> committed to recruiting additional committers from outside these companies.
>>>>
>>>> === Reliance on Salaried Developers ===
>>>>
>>>> Like most open source projects, Optiq receives substantial support from
>>>> salaried developers. This is to be expected given that it is a highly
>>>> technical framework. However, they are all passionate about the project,
>>>> and we are confident that the project will continue even if no salaried
>>>> developers contribute to the project. As a framework, the project
>>>> encourages the involvement of members of other projects, and of academic
>>>> researchers. We are committed to recruiting additional committers including
>>>> non-salaried developers.
>>>>
>>>> === Relationships with Other Apache Products ===
>>>>
>>>> As mentioned in the Alignment section, Optiq is being used by Apache Hive
>>>> and Apache Drill, and has adapters for Apache Phoenix and Apache Spark.
>>>> Optiq often operates on data in a Hadoop environment, so collaboration with
>>>> other Hadoop projects is desirable and highly likely.
>>>>
>>>> === An Excessive Fascination with the Apache Brand ===
>>>>
>>>> Optiq solves a real problem, as evidenced by its take-up by other projects.
>>>> This proposal is not for the purpose of generating publicity. Rather, the
>>>> primary benefits to joining Apache are those outlined in the Rationale
>>>> section.
>>>>
>>>> == Documentation ==
>>>>
>>>> Additional documentation for Optiq may be found on its github site:
>>>>
>>>> * [[https://github.com/julianhyde/optiq/blob/master/README.md|Overview]]
>>>> * [[
>>>> https://github.com/julianhyde/optiq-csv/blob/master/TUTORIAL.md|Tutorial]]
>>>> * [[https://github.com/julianhyde/optiq/blob/master/HOWTO.md|HOWTO]]
>>>> * [[https://github.com/julianhyde/optiq/blob/master/REFERENCE.md|Referenceguide]]
>>>>
>>>> === Presentation: ===
>>>>
>>>> *[[
>>>> https://github.com/julianhyde/share/blob/master/slides/optiq-richrelevance-2013.pdf?raw=true|
>>>> SQL on Big Data using Optiq]]
>>>> == Initial Source ==
>>>>
>>>> The initial code codebase resides in three projects, all hosted on github:
>>>>
>>>> * https://github.com/julianhyde/optiq
>>>> * https://github.com/julianhyde/optiq-csv
>>>> * https://github.com/julianhyde/linq4j
>>>>
>>>> === Source and Intellectual Property Submission Plan ===
>>>>
>>>> The initial codebase is already distributed under the Apache 2.0 License.
>>>> The owners of the IP have indicated willingness to sign the SGA.
>>>>
>>>> === External Dependencies ===
>>>>
>>>> Optiq and Linq4j have the following external dependencies.
>>>>
>>>> * Java 1.6, 1.7 or 1.8
>>>> * Apache Maven, Commons
>>>> * JavaCC (BSD license)
>>>> * Sqlline 1.1.6 (BSD license)
>>>> * Junit 4.11 (EPL)
>>>> * Janino (BSD license)
>>>> * Guava (Apache 2.0 license)
>>>> * Eigenbase-resgen, eigenbase-xom, eigenbase-properties (Apache 2.0
>>>> license)
>>>>
>>>> Some of Optiq's adapters (optiq-csv, optiq-mongodb, optiq-spark,
>>>> optiq-splunk) are currently developed alongside core Optiq, and have the
>>>> following additional dependencies:
>>>>
>>>> * Open CSV 2.3 (Apache 2.0 license)
>>>> * Apache Incubator Spark
>>>> * Mongo Java driver (Apache 2.0 license)
>>>> Upon acceptance to the incubator, we would begin a thorough analysis of all
>>>> transitive dependencies to verify this information and introduce license
>>>> checking into the build and release process by integrating with Apache Rat.
>>>>
>>>> === Cryptography ===
>>>>
>>>> Optiq will eventually support encryption on the wire. This is not one of
>>>> the initial goals, and we do not expect Optiq to be a controlled export
>>>> item due to the use of encryption.
>>>>
>>>> == Required Resources ==
>>>>
>>>> === Mailing Lists ===
>>>>
>>>> * private@optiq.incubator.apache.org
>>>> * dev@optiq.incubator.apache.org (will be migrated from
>>>> optiq-dev@googlegroups.com)
>>>> * commits@optiq.incubator.apache.org
>>>>
>>>> === Source control ===
>>>>
>>>> The Optiq team would like to use git for source control, due to our current
>>>> use of git/github. We request a writeable git repo git://
>>>> git.apache.org/incubator-optiq, and mirroring to be set up to github
>>>> through INFRA.
>>>>
>>>> === Issue Tracking ===
>>>>
>>>> Optiq currently uses the github issue tracking system associated with its
>>>> github repo: https://github.com/julianhyde/optiq/issues. We will migrate
to
>>>> the Apache JIRA: http://issues.apache.org/jira/browse/OPTIQ.
>>>>
>>>> == Initial Committers ==
>>>>
>>>> * Julian Hyde (jhyde at apache dot org)
>>>> * Jacques Nadeau (jacques at apache dot org)
>>>> * James R. Taylor (jamestaylor at apache dot org)
>>>> * Chris Wensel (cwensel at apache dot org)
>>>>
>>>> === Affiliations ===
>>>>
>>>> The initial committers are employees of Concurrent, Hortonworks, MapR and
>>>> Salesforce.com.
>>>>
>>>> * Julian Hyde (Hortonworks)
>>>> * Jacques Nadeau (MapR Technologies)
>>>> * James R. Taylor (Salesforce.com)
>>>> * Chris Wensel (Concurrent)
>>>>
>>>> == Sponsors ==
>>>>
>>>> === Champion ===
>>>>
>>>> * Ashutosh Chauhan (hashutosh at apache dot org)
>>>>
>>>> === Nominated Mentors ===
>>>>
>>>> * Ted Dunning (tdunning at apache dot org) - Chief Application Architect
>>>> at MapR Technologies; committer for Lucene, Mahout and ZooKeeper.
>>>> * Alan Gates (gates at apache dot org) - Architect at Hortonworks;
>>>> committer for Pig, Hive and others.
>>>> * Steven Noels (stevenn at apache dot org) - Chief Technical Officer at
>>>> NGDATA; committer for Cocoon and Forrest, mentor for Phoenix.
>>>>
>>>> === Sponsoring Entity ===
>>>>
>>>> The Apache Incubator.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Mime
View raw message