incubator-general mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: [VOTE] Phoenix for incubator project
Date Thu, 05 Dec 2013 22:31:44 GMT
+1 (binding)




________________________________
 From: Stack <stack@duboce.net>
To: general@incubator.apache.org 
Sent: Thursday, December 5, 2013 1:43 PM
Subject: [VOTE] Phoenix for incubator project
 

Discussion of the Phoenix proposal has settled since its original
posting on November 7th.  Feedback has been incorporated.

Let us now move to a vote.

Should Phoenix become an Apache incubator project?

[] +1 Accept Phoenix into the Incubator
[] +0 Don't care whether or which
[] -1 Do not accept Phoenix into the Incubator because...

The latest version of the proposal can be found here [1].  It is
also posted below for your convenience.

Let the vote run 72 hours.

Thank you,
St.Ack

1. https://wiki.apache.org/incubator/PhoenixProposal




Abstract

Phoenix is an open source SQL query engine for Apache HBase, a NoSQL data
store. It is accessed as a JDBC driver and enables querying and managing
HBase tables using SQL.

Proposal

Phoenix is an open source SQL skin over HBase delivered as a
client-embedded JDBC driver targeting low latency queries over HBase data.
Phoenix takes your SQL query, compiles it into a series of HBase scans, and
orchestrates the running of those scans to produce regular JDBC result
sets. The table metadata is stored in an HBase table and versioned, such
that snapshot queries over prior versions will automatically use the
correct schema. Direct use of the HBase API, along with coprocessors and
custom filters, results in performance on the order of milliseconds for
small queries, or seconds for tens of millions of rows. Phoenix interfaces
with both Pig and Map-reduce for the input and output of data.
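
As a rough illustration of the client-embedded JDBC access described above, the following is a minimal sketch rather than code from the Phoenix project. It assumes the Phoenix driver is on the classpath and that an HBase cluster is reachable through the given ZooKeeper quorum. The class name PhoenixSketch, the table web_stat, and its columns host and core are hypothetical names chosen only for this example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PhoenixSketch {

    // Builds a Phoenix JDBC URL of the form jdbc:phoenix:<zookeeper quorum>.
    static String jdbcUrl(String zookeeperQuorum) {
        return "jdbc:phoenix:" + zookeeperQuorum;
    }

    // Runs an aggregate query through plain JDBC and prints each row.
    // Assumption: a table named web_stat with columns host and core exists;
    // both names are hypothetical.
    static void printCounts(String url) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT host, COUNT(*) FROM web_stat WHERE core > ? GROUP BY host")) {
            stmt.setInt(1, 10);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        // Without arguments, only the URL is printed and no connection is attempted.
        String url = jdbcUrl(args.length > 0 ? args[0] : "localhost");
        System.out.println(url);
        if (args.length > 0) {
            printCounts(url);
        }
    }
}
```

The application sees only standard java.sql interfaces; the translation of the query into HBase scans would happen inside the driver.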

Background

Phoenix initially started as an internal project at Salesforce.com to
efficiently analyze big data stored in HBase. It was open sourced on Github
about a year ago in Jan 2013. Over time Phoenix, together with HBase as the
storage tier, has begun to evolve into a general SQL database with support
for metadata management, secondary indexes, joins, query optimization, and
multi-tenancy. This is expected to continue as Phoenix implements a
cost-based query optimizer and potentially transaction support, and
surfaces new HBase security features such as encryption and cell-level
security. Phoenix's developer community has also grown to include
additional companies such as Intel, who have contributed join support to
Phoenix, as well as Hortonworks, who are in the process of porting Phoenix
to the 0.96 release of HBase.

Rationale

As usage of Phoenix and the number of contributors have grown, we have
sought a long-term home for the project, and we believe the Apache
foundation would be a great fit. Joining Apache would ensure that tried and
true processes and procedures are in place for the growing number of
organizations interested in contributing to Phoenix. Phoenix is also a good
fit for the Apache foundation: Phoenix already interoperates with several
existing Apache projects (HBase, Hadoop, Pig, Bigtop). The Phoenix team is
familiar with the Apache process and believes in the Apache mission -
the team already includes multiple Apache committers.

Initial Goals

The initial goals will be to move the existing codebase to Apache and
integrate with the Apache development process. Once this is accomplished,
we plan for incremental development and releases that follow the Apache
guidelines.

Current Status

Phoenix has undergone two major and three minor releases (1.0, 1.1, 1.2,
2.0, and 2.1) as well as many patch releases. Phoenix is being used in
production by Salesforce.com as well as at other organizations. The Phoenix
codebase is currently hosted at github.com, which will form the basis of
the Apache git repository.

Meritocracy

The Phoenix project already operates on meritocratic principles. Phoenix
has several developers from various organizations outside of Salesforce.com
who have contributed major new features. While this process has remained
mostly informal, as we do not have an official committer list, an implicit
organization exists in which individuals who contribute major components
act as maintainers for those modules. If accepted, the Phoenix project
would include several of these participants as initial committers. We will
work to identify all committers and PPMC members for the project and to
operate under the ASF meritocratic principles.

Community

Acceptance into the Apache foundation would bolster the already strong user
and developer community around Phoenix. That community includes many
contributors from various other companies, and an active mailing list
composed of hundreds of users.

Core Developers

The core developers of our project are listed in our contributors and
initial PPMC below. Though many are employed at Salesforce.com, there is a
representative cross sampling of other organizations including Intel,
Hortonworks, and Cloudera.

Alignment

Our proposed Phoenix effort aligns closely with Apache HBase. The HBase
project perimeter is bounded by simple byte-array based Create, Read,
Update, Delete, and Scan APIs, with no current plans to extend beyond these
bounds. Phoenix complements this with a higher level API in SQL with which
many are already familiar. At first glance, it may seem that Phoenix should
just be folded into HBase as a new module. However, the focus of the two
projects will be quite different, especially as Phoenix matures. With
secondary indexing and joins just having been introduced into Phoenix, the
next big frontier will be to implement a cost-based query optimizer. This
is the heart-and-soul of most relational databases and can take a
lifetime to get right.

HBase is focused on being a scalable data store agnostic to types and
schema. Phoenix would layer typing and relational facilities on top of
this scalable store. By keeping Apache HBase and Phoenix separate, both may
evolve independently and at different rates. Though the focus of the two
projects is different, the relationship between them is very positive and
mutually beneficial. New features in HBase will be leveraged in Phoenix as
it makes sense to surface these in a SQL paradigm. In addition, Phoenix may
drive new features in HBase, as evidenced by the new type system recently
introduced into HBase. This will enable better interoperability between
Apache Hive, standalone HBase use cases, and Phoenix by defining a standard
serialization format.

Phoenix can be divided into a front end and a back end. The front end is
delivered as a JDBC driver and contains, among other things, the SQL parser
and query planner. The front end is currently written for the HBase client
API but could be extended to support other data stores in the Apache family.

The back end currently consists of HBase-specific components for pushing as much
work to the server as possible. However, if there were sufficient interest
to build them, contributions to Phoenix of new back ends for other data
stores in the Apache family would be feasible.

Other projects exist that perform SQL over HBase data (such as Apache
Hive); however, these projects do not provide the same low latency query
capabilities as Phoenix. Instead, they are more oriented around maximizing
throughput for batched operations. Phoenix opens the door to a completely
new set of use cases for Apache HBase that demand a more interactive user
experience.

There are also a number of related Apache projects and dependencies that
are mentioned in the Relationship with Other Apache Products section.

Known Risks

Orphaned Products

Given the current level of investment in Phoenix, the risk of the project
being abandoned is minimal. All current and planned HBase use cases at
Salesforce.com go through Phoenix. In addition, both Intel and Hortonworks
plan to include Phoenix in their distributions. Other companies have
made significant internal infrastructure investments in Phoenix.

Inexperience with Open Source

Phoenix has existed as a healthy open source project for almost a year.
During that time, James, Mujtaba, and others have successfully fostered an
open-source community, attracting users and developers from a diverse group
of companies including Intel, Intuit, Bloomberg, Tagged, and Hortonworks.
Although neither is a committer on other Apache projects, both James and
Mujtaba have experience working with and contributing to other Apache
projects.

Homogeneous Developers

The initial list of committers includes developers from several
institutions, including Salesforce, Intel, and Hortonworks.

Reliance on Salaried Developers

Like most open source projects, Phoenix receives substantial support from
salaried developers. A large fraction of Phoenix development is supported
by Salesforce.com. In addition, those working from within corporations and
universities often devote “after hours” or spare time to the project. We
will continue our efforts to ensure that stewardship of the project is
independent of salaried developers.

Relationship with Other Apache Products

Although Phoenix provides a higher level abstraction than Apache HBase by
hiding its client APIs, Phoenix relies on Apache HBase for both storing and
retrieving data. It also inter-operates with Apache HBase by allowing
existing data, not created by Phoenix, to be queried. In addition, both
Apache Pig and Hadoop are supported for data input and output. Finally,
Phoenix is included in and installable through Apache Bigtop, and the build
and test suite are run through Apache Maven.

Phoenix offers an alternative query engine to Apache Hadoop (MapReduce).
Unlike MapReduce, Phoenix is designed for lower-latency, OLTP, and
interactive workloads. This makes the projects complementary as users may
run MapReduce and Phoenix side-by-side.

We plan to increase the interoperability between Phoenix, Apache Hive, and
standalone Apache HBase usage by standardizing on a new type system that
has been introduced in the current major release of HBase. If all these
products adopt this new serialization format, interoperability between
them will take a big step forward.

In addition, we plan to explore providing lower level APIs for other
products such as Apache Drill to plug into when querying HBase data so that
they get the performance benefits of using Phoenix.

An Excessive Fascination with the Apache Brand

Phoenix is already a healthy and relatively well known open source project.
This proposal is not for the purpose of generating publicity. Rather, the
primary benefits to joining Apache are those outlined in the Rationale
section.

Documentation

Additional documentation on Phoenix may be found on its github website:

Phoenix overview:
https://github.com/forcedotcom/phoenix/blob/master/README.md

Phoenix wiki: https://github.com/forcedotcom/phoenix/wiki

Phoenix road map: https://github.com/forcedotcom/phoenix/wiki#roadmap

Phoenix issue tracking:
https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open

Phoenix codebase: https://github.com/forcedotcom/phoenix

Phoenix SQL language reference: http://forcedotcom.github.io/phoenix/

Phoenix performance:
https://github.com/forcedotcom/phoenix/wiki/Performance#phoenix-vs-related-products

User group: https://groups.google.com/group/phoenix-hbase-user

Initial Source

The Phoenix codebase is currently hosted on Github:
https://github.com/forcedotcom/phoenix.

Source and Intellectual Property Submission Plan

Currently, the Phoenix codebase is distributed under a BSD license. Upon
entering Apache, the Phoenix license will be migrated to the Apache 2.0
License.

External Dependencies

Beyond relying on Apache HBase, Phoenix has the following external
dependencies:

ANTLR 3.5 (BSD license: http://www.antlr3.org/license.html)

Sqlline 1.1.2 (BSD license:
https://github.com/julianhyde/sqlline/blob/master/LICENSE)

Open CSV 2.3 (Apache 2.0 license)

Upon acceptance to the incubator, we would begin a thorough analysis of all
transitive dependencies to verify this information and introduce license
checking into the build and release process by integrating with Apache Rat.

Required Resources

Mailing list

We will migrate the existing Phoenix mailing lists as follows:

phoenix-hbase-user@googlegroups.com --> users@phoenix.incubator.apache.org

phoenix-hbase-dev@googlegroups.com --> dev@phoenix.incubator.apache.org

private@phoenix.incubator.apache.org for IPMC members

commits@phoenix.incubator.apache.org

The latter is to be consistent with the new PIAO naming scheme for podlings.

Source control

The Phoenix team would like to use Git for source control, due to our
current use of Git. We request a writeable Git repo for Phoenix, and
mirroring to be set up to Github through INFRA.

Issue Tracking

Phoenix currently uses the github issue tracking system associated with its
github repo:
https://github.com/forcedotcom/phoenix/issues?direction=desc&sort=updated&state=open.
We will migrate to the Apache JIRA:
http://issues.apache.org/jira/browse/PHOENIX

Other Resources

Jenkins/Hudson for builds and test running.
Wiki for documentation purposes
Blog to improve project dissemination

Initial Committers

James Taylor <jtaylor at salesforce dot com>

Mujtaba Chohan <mchohan at salesforce dot com>

Jesse Yates <jyates at apache dot org>

Eli Levine <elevine at salesforce dot com>

Simon Toens <stoens at salesforce dot com>

Maryann Xue <wei.xue at intel dot com>

Anoop Sam John <anoopsamjohn at apache dot org>

Ramkrishna S Vasudevan <ramkrishna at apache dot org>

Jeffrey Zhong <jeffreyz at apache dot org>

Nick Dimiduk <ndimiduk at apache dot org>

Affiliations

The initial committers are from three organizations: Salesforce.com, Intel,
and Hortonworks.

James Taylor (Salesforce.com)
Mujtaba Chohan (Salesforce.com)
Jesse Yates (Salesforce.com)
Eli Levine (Salesforce.com)
Simon Toens (Salesforce.com)
Maryann Xue (Intel)
Anoop Sam John (Intel)
Ramkrishna S Vasudevan (Intel)
Jeffrey Zhong (Hortonworks)
Nick Dimiduk (Hortonworks)

Sponsors

Champion

Michael Stack

Nominated Mentors

Michael Stack
Lars Hofhansl
Andrew Purtell
Devaraj Das
Enis Soztutar
Steven Noels

Sponsoring Entity

The Apache Incubator