incubator-general mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: [VOTE] Phoenix for incubator project
Date Mon, 09 Dec 2013 21:07:18 GMT

+1 for Phoenix. 

On 12/5/13 4:43 PM, "Stack" <stack@duboce.net> wrote:

>Discussion of the Phoenix proposal has settled since its original
>posting on November 7th.  Feedback has been incorporated.
>
>Let us now move to a vote.
>
>Should Phoenix become an Apache incubator project?
>
>[] +1 Accept Phoenix into the Incubator
>[] +0 Don't care whether or which
>[] -1 Do not accept Phoenix into the Incubator because...
>
>The latest version of the proposal can be found here [1].  It is
>also posted below for your convenience.
>
>Let the vote run 72 hours.
>
>Thank you,
>St.Ack
>
>1. https://wiki.apache.org/incubator/PhoenixProposal
>
>
>
>
>Abstract
>
>Phoenix is an open source SQL query engine for Apache HBase, a NoSQL data
>store. It is accessed as a JDBC driver and enables querying and managing
>HBase tables using SQL.
>
>Proposal
>
>Phoenix is an open source SQL skin over HBase delivered as a
>client-embedded JDBC driver targeting low latency queries over HBase data.
>Phoenix takes your SQL query, compiles it into a series of HBase scans, and
>orchestrates the running of those scans to produce regular JDBC result
>sets. The table metadata is stored in an HBase table and versioned, such
>that snapshot queries over prior versions will automatically use the
>correct schema. Direct use of the HBase API, along with coprocessors and
>custom filters, results in performance on the order of milliseconds for
>small queries, or seconds for tens of millions of rows. Phoenix interfaces
>with both Pig and MapReduce for the input and output of data.
>
>Background
>
>Phoenix initially started as an internal project at Salesforce.com to
>efficiently analyze big data stored in HBase. It was open sourced on GitHub
>about a year ago in Jan 2013. Over time Phoenix, together with HBase as the
>storage tier, has begun to evolve into a general SQL database with support
>for metadata management, secondary indexes, joins, query optimization, and
>multi-tenancy. This is expected to continue as Phoenix implements a
>cost-based query optimizer and potentially transaction support, and
>surfaces new HBase security features such as encryption and cell-level
>security. Phoenix's developer community has also grown to include
>additional companies such as Intel, who have contributed join support to
>Phoenix, as well as Hortonworks, who are in the process of porting Phoenix
>to the 0.96 release of HBase.
>
>Rationale
>
>As usage of Phoenix and the number of contributors have grown, we have
>sought a long-term home for the project, and we believe the Apache
>foundation would be a great fit. Joining Apache would ensure that
>tried-and-true processes and procedures are in place for the growing number
>of organizations interested in contributing to Phoenix. Phoenix is also a
>good fit for the Apache foundation: Phoenix already interoperates with
>several existing Apache projects (HBase, Hadoop, Pig, Bigtop). The Phoenix
>team is familiar with the Apache process and believes in the Apache
>mission; the team already includes multiple Apache committers.
>
>Initial Goals
>
>The initial goals will be to move the existing codebase to Apache and
>integrate with the Apache development process. Once this is accomplished,
>we plan for incremental development and releases that follow the Apache
>guidelines.
>
>Current Status
>
>Phoenix has undergone two major and three minor releases (1.0, 1.1, 1.2,
>2.0, and 2.1) as well as many patch releases. Phoenix is being used in
>production by Salesforce.com as well as at other organizations. The Phoenix
>codebase, currently hosted at github.com, will form the basis of the Apache
>git repository.
>
>Meritocracy
>
>The Phoenix project already operates on meritocratic principles. Phoenix
>has several developers from various organizations outside of Salesforce.com
>who have contributed major new features. While this process has remained
>mostly informal, as we do not have an official committer list, an implicit
>organization exists in which individuals who contribute major components
>act as maintainers for those modules. If accepted, the Phoenix project
>would include several of these participants as initial committers. We will
>work to identify all committers and PPMC members for the project and to
>operate under the ASF meritocratic principles.
>
>Community
>
>Acceptance into the Apache foundation would bolster the already strong user
>and developer community around Phoenix. That community includes many
>contributors from various other companies, and an active mailing list
>composed of hundreds of users.
>
>Core Developers
>
>The core developers of our project are listed in our contributors and
>initial PPMC below. Though many are employed at Salesforce.com, there is a
>representative cross sampling of other organizations including Intel,
>Hortonworks, and Cloudera.
>
>Alignment
>
>Our proposed Phoenix effort aligns closely with Apache HBase. The HBase
>project perimeter is denoted by simple byte-array-based Create, Read,
>Update, Delete, and Scan APIs, with no current plans to extend beyond these
>bounds. Phoenix complements this with a higher-level API in SQL, with which
>many are already familiar. At first glance, it may seem that Phoenix should
>just be folded into HBase as a new module. However, the focus of the two
>projects will be quite different, especially as Phoenix matures. With
>secondary indexing and joins just having been introduced into Phoenix, the
>next big frontier will be to implement a cost-based query optimizer. This
>is the heart and soul of most relational databases and can take a
>lifetime to get right.
>
>HBase is focused on being a scalable data store agnostic to types and
>schema. Phoenix would layer typing and relational facilities on top of
>this scalable store. By keeping Apache HBase and Phoenix separate, both may
>evolve independently and at different rates. Though the focus of the two
>projects is different, the relationship between them is very positive and
>mutually beneficial. New features in HBase will be leveraged in Phoenix as
>it makes sense to surface these in a SQL paradigm. In addition, Phoenix may
>drive new features in HBase, as evidenced by the new type system recently
>introduced into HBase. This will enable better interoperability between
>Apache Hive, standalone HBase use cases, and Phoenix by defining a standard
>serialization format.
>
>Phoenix can be divided into a front end and a back end. The front end is
>delivered as a JDBC driver and contains, among other things, the SQL parser
>and query planner. The front end is currently written for the HBase client
>API but could be extended to support other data stores in the Apache family.
>
>The back end currently consists of HBase-specific components for pushing as
>much work to the server as possible. However, if there were sufficient
>interest to build them, contributions to Phoenix of new back ends for other
>data stores in the Apache family would be feasible.
>
>Other projects exist that perform SQL over HBase data (such as Apache
>Hive); however, these products do not provide the same low-latency query
>capabilities as Phoenix. Instead, they are more oriented around maximizing
>throughput for batched operations. Phoenix opens the door to a completely
>new set of use cases for Apache HBase that demand a more interactive user
>experience.
>
>There are also a number of related Apache projects and dependencies that
>are mentioned in the Relationship with Other Apache Products section.
>
>Known Risks
>
>Orphaned Products
>
>Given the current level of investment in Phoenix, the risk of the project
>being abandoned is minimal. All current and planned HBase use cases at
>Salesforce.com go through Phoenix. In addition, both Intel and Hortonworks
>plan to include Phoenix in their distributions. Other companies have
>made significant internal infrastructure investments in Phoenix.
>
>Inexperience with Open Source
>
>Phoenix has existed as a healthy open source project for almost a year.
>During that time, James, Mujtaba, and others have successfully fostered an
>open-source community, attracting users and developers from a diverse group
>of companies including Intel, Intuit, Bloomberg, Tagged, and Hortonworks.
>Although neither is a committer on other Apache projects, both James and
>Mujtaba have experience working with and contributing to other Apache
>projects.
>
>Homogeneous Developers
>
>The initial list of committers includes developers from several
>institutions, including Salesforce, Intel, and Hortonworks.
>
>Reliance on Salaried Developers
>
>Like most open source projects, Phoenix receives substantial support from
>salaried developers. A large fraction of Phoenix development is supported
>by Salesforce.com. In addition, those working from within corporations and
>universities often devote "after hours" or spare time to the project. We
>will continue our efforts to ensure that stewardship of the project is
>independent of salaried developers.
>
>Relationship with Other Apache Products
>
>Although Phoenix provides a higher level abstraction than Apache HBase by
>hiding its client APIs, Phoenix relies on Apache HBase for both storing and
>retrieving data. It also interoperates with Apache HBase by allowing
>existing data, not created by Phoenix, to be queried. In addition, both
>Apache Pig and Hadoop are supported for data input and output. Finally,
>Phoenix is included in and installable through Apache Bigtop, and the build
>and test suite are run through Apache Maven.
>
>Phoenix offers an alternative query engine to Apache Hadoop (MapReduce).
>Unlike MapReduce, Phoenix is designed for lower-latency, OLTP, and
>interactive workloads. This makes the projects complementary, as users may
>run MapReduce and Phoenix side-by-side.
>
>We plan to increase the interoperability between Phoenix, Apache Hive, and
>standalone Apache HBase usage by standardizing on a new type system that
>has been introduced in the current major release of HBase. By all these
>products adopting this new serialization format, interoperability between
>them will take a big step forward.
>
>In addition, we plan to explore providing lower level APIs for other
>products such as Apache Drill to plug into when querying HBase data so that
>they get the performance benefits of using Phoenix.
>
>An Excessive Fascination with the Apache Brand
>
>Phoenix is already a healthy and relatively well known open source project.
>This proposal is not for the purpose of generating publicity. Rather, the
>primary benefits to joining Apache are those outlined in the Rationale
>section.
>
>Documentation
>
>Additional documentation on Phoenix may be found on its github website:
>
>Phoenix overview:
>https://github.com/forcedotcom/phoenix/blob/master/README.md
>
>Phoenix wiki: https://github.com/forcedotcom/phoenix/wiki
>
>Phoenix road map: https://github.com/forcedotcom/phoenix/wiki#roadmap
>
>Phoenix issue tracking:
>https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open
>
>Phoenix codebase: https://github.com/forcedotcom/phoenix
>
>Phoenix SQL language reference: http://forcedotcom.github.io/phoenix/
>
>Phoenix performance:
>https://github.com/forcedotcom/phoenix/wiki/Performance#phoenix-vs-related-products
>
>User group: https://groups.google.com/group/phoenix-hbase-user
>
>Initial Source
>
>The Phoenix codebase is currently hosted on Github:
>https://github.com/forcedotcom/phoenix.
>
>Source and Intellectual Property Submission Plan
>
>Currently, the Phoenix codebase is distributed under a BSD license. Upon
>entering Apache, the Phoenix license will be migrated to the Apache 2.0
>License.
>
>External Dependencies
>
>Beyond relying on Apache HBase, Phoenix has the following external
>dependencies:
>
>ANTLR 3.5 (BSD license: http://www.antlr3.org/license.html)
>
>Sqlline 1.1.2 (BSD license:
>https://github.com/julianhyde/sqlline/blob/master/LICENSE)
>
>Open CSV 2.3 (Apache 2.0 license)
>
>Upon acceptance to the incubator, we would begin a thorough analysis of all
>transitive dependencies to verify this information and introduce license
>checking into the build and release process by integrating with Apache Rat.
>
>Required Resources
>
>Mailing list
>
>We will migrate the existing Phoenix mailing lists as follows:
>
>phoenix-hbase-user@googlegroups.com --> users@phoenix.incubator.apache.org
>
>phoenix-hbase-dev@googlegroups.com --> dev@phoenix.incubator.apache.org
>
>private@phoenix.incubator.apache.org for IPMC members
>
>commits@phoenix.incubator.apache.org
>
>The latter is to be consistent with the new PIAO naming scheme for podlings.
>
>Source control
>
>The Phoenix team would like to use Git for source control, due to our
>current use of Git. We request a writeable Git repo for Phoenix, and
>mirroring to be set up to Github through INFRA.
>
>Issue Tracking
>
>Phoenix currently uses the github issue tracking system associated with its
>github repo:
>https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open
>We will migrate to the Apache JIRA:
>http://issues.apache.org/jira/browse/PHOENIX
>
>Other Resources
>
>Jenkins/Hudson for builds and test running.
>Wiki for documentation purposes
>Blog to improve project dissemination
>
>Initial Committers
>
>James Taylor <jtaylor at salesforce dot com>
>
>Mujtaba Chohan <mchohan at salesforce dot com>
>
>Jesse Yates <jyates at apache dot org>
>
>Eli Levine <elevine at salesforce dot com>
>
>Simon Toens <stoens at salesforce dot com>
>
>Maryann Xue <wei.xue at intel dot com>
>
>Anoop Sam John <anoopsamjohn at apache dot org>
>
>Ramkrishna S Vasudevan <ramkrishna at apache dot org>
>
>Jeffrey Zhong <jeffreyz at apache dot org>
>
>Nick Dimiduk <ndimiduk at apache dot org>
>
>Affiliations
>
>The initial committers are from three organizations: Salesforce.com, Intel,
>and Hortonworks.
>
>James Taylor (Salesforce.com)
>Mujtaba Chohan (Salesforce.com)
>Jesse Yates (Salesforce.com)
>Eli Levine (Salesforce.com)
>Simon Toens (Salesforce.com)
>Maryann Xue (Intel)
>Anoop Sam John (Intel)
>Ramkrishna S Vasudevan (Intel)
>Jeffrey Zhong (Hortonworks)
>Nick Dimiduk (Hortonworks)
>
>Sponsors
>
>Champion
>
>Michael Stack
>
>Nominated Mentors
>
>Michael Stack
>Lars Hofhansl
>Andrew Purtell
>Devaraj Das
>Enis Soztutar
>Steven Noels
>
>Sponsoring Entity
>
>The Apache Incubator



