incubator-general mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: [VOTE] Accept Tephra into the Apache Incubator
Date Fri, 04 Mar 2016 21:54:01 GMT
+1 (binding)

> On Mar 4, 2016, at 12:57 PM, Alan Gates <alanfgates@gmail.com> wrote:
> 
> +1 (binding).
> 
> Alan.
> 
>> On Mar 3, 2016, at 17:29, Poorna Chandra <poorna@apache.org> wrote:
>> 
>> Hi All,
>> 
>> The Tephra proposal was sent out for discussion last week. The proposal is
>> available at https://wiki.apache.org/incubator/TephraProposal
>> 
>> Please vote to accept Tephra into the Apache Incubator. The vote will be
>> open for the next 72 hours.
>> 
>> [ ] +1 Accept Tephra as an Apache Incubator podling.
>> [ ] +0 Abstain.
>> [ ] -1 Don’t accept Tephra as an Apache Incubator podling because ...
>> 
>> Thanks,
>> Poorna.
>> 
>> ------
>> 
>> = Abstract =
>> 
>> Tephra is a system for providing globally consistent transactions on
>> top of Apache HBase and other storage engines.
>> 
>> = Proposal =
>> 
>> Tephra is a transaction engine for distributed data stores like Apache HBase.
>> It provides ACID semantics for concurrent data operations that span region
>> boundaries in HBase, using Optimistic Concurrency Control.
>> 
>> = Background =
>> 
>> HBase provides strong consistency with row- or region-level ACID
>> operations. However, it sacrifices cross-region and cross-table
>> consistency in favor of scalability. This trade-off requires application
>> developers to handle the complexity of ensuring consistency when their
>> modifications span region boundaries. By providing support for global
>> transactions that span regions, tables, or multiple RPCs,
>> Tephra simplifies application development on top of HBase, without a
>> significant impact on performance or scalability for many workloads.
>> 
>> Tephra leverages HBase’s native data versioning to provide multi-version
>> concurrency control (MVCC) for transactional reads and writes.
>> With MVCC capability, each transaction sees its own consistent “snapshot” of
>> data, providing snapshot isolation of concurrent transactions.
>> MVCC along with conflict detection and handling enables Optimistic Concurrency
>> Control.
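
The snapshot reads described above can be illustrated with a minimal sketch. It is not Tephra's actual API: the class `SnapshotReadSketch`, the `readPointer` parameter, and the `excluded` set of in-progress transaction IDs are hypothetical names chosen for this example. The sketch assumes each version of a cell is tagged with the ID of the transaction that wrote it, and that a reader sees only versions at or below its read pointer that were not written by an excluded transaction.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration of an MVCC snapshot read; not Tephra code.
final class SnapshotReadSketch {
    // Versions of a single cell, keyed by the ID of the writing transaction.
    private final TreeMap<Long, String> versions = new TreeMap<>();

    void write(long writerTxId, String value) {
        versions.put(writerTxId, value);
    }

    // Returns the newest version visible to a snapshot: written at or below
    // the read pointer, and not by a transaction excluded from the snapshot.
    Optional<String> read(long readPointer, Set<Long> excluded) {
        for (Map.Entry<Long, String> e
                : versions.headMap(readPointer, true).descendingMap().entrySet()) {
            if (!excluded.contains(e.getKey())) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        SnapshotReadSketch cell = new SnapshotReadSketch();
        cell.write(10L, "a"); // committed before the snapshot was taken
        cell.write(20L, "b"); // still in progress when the snapshot was taken
        cell.write(30L, "c"); // written after the snapshot was taken
        // The reader skips version 20 (excluded) and version 30 (too new).
        System.out.println(cell.read(25L, Set.of(20L)).orElse("<none>")); // prints "a"
    }
}
```

Because the reader never blocks on writers and simply ignores versions outside its snapshot, concurrent transactions can proceed without locks; conflicts are left to be detected at commit time.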
>> 
>> Tephra consists of three main components:
>> * Transaction Server – maintains a global view of transaction state, assigns
>>  new transaction IDs, and performs conflict detection;
>> * Transaction Client – coordinates start, commit, and rollback of
>>  transactions; and
>> * Transaction Processor Coprocessor – applies filtering to the data read (based
>>  on a given transaction’s state) and cleans up any data from old
>>  (no longer visible) transactions.
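
The conflict detection performed by the Transaction Server can be sketched roughly as follows. This is an assumption-laden illustration, not Tephra's implementation: `ConflictDetectionSketch`, `start`, and `commit` are hypothetical names, and the sketch assumes write-write conflict detection in which a transaction fails to commit if any transaction that committed after it started wrote one of the same keys.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of optimistic commit-time validation; not Tephra code.
final class ConflictDetectionSketch {
    // One counter supplies both transaction IDs and commit points,
    // so the two can be compared directly.
    private long nextId = 1;
    // Commit point -> keys written by the transaction committed at that point.
    private final Map<Long, Set<String>> committed = new HashMap<>();

    // Start a transaction; the returned ID also marks its snapshot point.
    synchronized long start() {
        return nextId++;
    }

    // Commit succeeds only if no transaction that committed after txId
    // started has written any of the same keys.
    synchronized boolean commit(long txId, Set<String> writtenKeys) {
        for (Map.Entry<Long, Set<String>> e : committed.entrySet()) {
            boolean committedAfterStart = e.getKey() > txId;
            if (committedAfterStart && !Collections.disjoint(e.getValue(), writtenKeys)) {
                return false; // conflict: the caller would roll back
            }
        }
        committed.put(nextId++, new HashSet<>(writtenKeys));
        return true;
    }

    public static void main(String[] args) {
        ConflictDetectionSketch server = new ConflictDetectionSketch();
        long t1 = server.start();
        long t2 = server.start();
        System.out.println(server.commit(t1, Set.of("row1"))); // prints "true"
        System.out.println(server.commit(t2, Set.of("row1"))); // prints "false"
    }
}
```

A real server would also prune old entries from the committed map once no running transaction could conflict with them; the sketch omits that for brevity.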
>> 
>> Although Tephra only supports HBase now, it can be extended to support
>> transactions on any store that has multi-versioning and rollback
>> support. The transactions can span multiple stores and storage paradigms.
>> 
>> = Rationale =
>> 
>> Tephra has simple abstractions which can be used by an application to
>> add transaction support over HBase. By abstracting away transaction
>> handling using Tephra, the application is freed of
>> transaction logic, and the application developer can focus on the use case.
>> Also, Tephra can be extended to support transactions on data sources other
>> than HBase.
>> 
>> By making Tephra an Apache open source project, we believe that there will
>> be wider adoption and more opportunities for Tephra to be integrated
>> into other Apache projects.
>> 
>> = Current Status =
>> 
>> Tephra was built at Cask Data Inc., initially as part of the
>> open-source framework Cask Data Application Platform (CDAP)
>> [[http://cdap.io/]].
>> It was later converted into an independent open source project under
>> the Apache 2.0 License [[https://github.com/caskdata/tephra]].
>> 
>> Tephra is used in CDAP as the transaction engine. As part of CDAP, Tephra
>> has been deployed at multiple companies.
>> 
>> Apache Phoenix is using Tephra as its transaction engine in the next release.
>> 
>> == Meritocracy ==
>> 
>> Our intent with this incubator proposal is to start building a diverse
>> developer community around Tephra following the Apache meritocracy model.
>> Since Tephra was initially developed in early 2013, we have had fast
>> adoption and contributions within Cask Data. We are looking forward to
>> new contributors. We wish to build a community based on Apache's
>> meritocracy principles, working with those who contribute significantly to
>> the project and welcoming them to be committers both during the incubation
>> process and beyond.
>> 
>> == Community ==
>> 
>> The core developers of Tephra are at Cask Data. Recently the developer
>> community has expanded to include folks from Apache Phoenix. We hope to
>> extend our contributor base significantly, and we will invite all who are
>> interested in working on a distributed transaction engine.
>> 
>> == Core Developers ==
>> 
>> A few engineers from Cask Data and outside have developed Tephra:
>> Andreas Neumann, Terence Yim, Gary Helmling, Andrew Purtell and
>> Poorna Chandra.
>> 
>> 
>> == Alignment ==
>> 
>> The ASF is the natural choice to host the Tephra project as its goal of
>> encouraging community-driven open source projects fits with our vision for
>> Tephra.
>> 
>> Additionally, many projects that we are familiar with and expect Tephra
>> to integrate with, such as Phoenix, ZooKeeper, HDFS, log4j, and others
>> mentioned in the External Dependencies section, are Apache projects, and
>> Tephra will benefit from close proximity to them.
>> 
>> = Known Risks =
>> 
>> == Orphaned Products ==
>> 
>> There is very little risk of Tephra being orphaned, as it is a key part of
>> Cask Data’s products. The core Tephra developers plan to continue to work
>> on Tephra, and Cask Data has funding in place to support their efforts
>> going forward.
>> Also, because Phoenix uses Tephra for transactions, Phoenix developers are
>> keen on contributing to Tephra.
>> 
>> 
>> == Inexperience with Open Source ==
>> 
>> Several of the core developers have experience with open source
>> development. Andreas Neumann is an Apache committer for Oozie and Twill.
>> Terence Yim is an Apache committer for Helix and Twill. Poorna Chandra
>> is an Apache committer for Twill. Gary Helmling is a committer for
>> Apache Twill and a committer and PMC member for Apache HBase.
>> James Taylor is PMC chair for Apache Phoenix, PMC member of Apache Calcite,
>> and an IPMC member.
>> 
>> == Homogeneous Developers ==
>> 
>> The current core developers are all Cask Data employees. However, we
>> intend to establish a developer community that includes independent and
>> corporate contributors. We are encouraging new contributors via our mailing
>> lists, public presentations, and personal contacts, and we will continue to
>> do so.
>> 
>> Apache Phoenix developers have already contributed several patches to Tephra,
>> and have expressed interest in becoming long term contributors.
>> 
>> == Reliance on Salaried Developers ==
>> 
>> Currently, the core developers are paid to work on Tephra. Once the project
>> has built a community, we expect to attract committers, developers, and
>> community members beyond the current core developers. However, because Cask Data
>> products use Tephra internally, the reliance on salaried developers is
>> unlikely to change, at least in the near term.
>> 
>> == Relationships with Other Apache Products ==
>> 
>> Tephra is deeply integrated with Apache projects. Tephra provides transactions
>> over Apache HBase, and uses Apache Twill and Apache ZooKeeper for coordination.
>> A number of other Apache projects are Tephra dependencies, and are
>> listed in the External Dependencies section.
>> 
>> In addition, Apache Phoenix is using Tephra as its transaction engine.
>> 
>> == An Excessive Fascination with the Apache Brand ==
>> 
>> While we respect the reputation of the Apache brand and have no doubt that
>> it will attract contributors and users, our interest is primarily to give
>> Tephra a solid home as an open source project following an established
>> development model. We have also given additional reasons in the Rationale
>> and Alignment sections.
>> 
>> = Documentation =
>> 
>> The current documentation for Tephra is at https://github.com/caskdata/tephra.
>> 
>> = Initial Source =
>> 
>> The Tephra codebase is currently hosted at https://github.com/caskdata/tephra.
>> 
>> = Source and Intellectual Property Submission Plan =
>> 
>> The Tephra codebase is currently licensed under the Apache 2.0 license.
>> Cask Data owns the trademark for "Tephra". As part of the incubation process,
>> Cask Data will transfer the trademark to the Apache Software Foundation.
>> 
>> = External Dependencies =
>> 
>> The dependencies all have Apache-compatible licenses:
>> * dropwizard metrics (Apache 2.0)
>> * fastutil (Apache 2.0)
>> * gson (Apache 2.0)
>> * guava-libraries (Apache 2.0)
>> * guice (Apache 2.0)
>> * hadoop (Apache 2.0)
>> * hbase (Apache 2.0)
>> * hdfs (Apache 2.0)
>> * junit (EPL v1.0)
>> * logback (EPL v1.0)
>> * slf4j (MIT)
>> * thrift (Apache 2.0)
>> * twill (Apache 2.0)
>> * zookeeper (Apache 2.0)
>> 
>> = Cryptography =
>> 
>> Tephra does not use cryptography itself; however, it can run on secure
>> Hadoop, which uses Kerberos.
>> 
>> = Required Resources =
>> 
>> == Mailing Lists ==
>> 
>> * tephra-private for private PMC discussions (with moderated subscriptions)
>> * tephra-dev for technical discussions among contributors
>> * tephra-commits for notification about commits
>> 
>> == Subversion Directory ==
>> 
>> Git is the preferred source control system: git://git.apache.org/tephra
>> 
>> == Issue Tracking ==
>> 
>> JIRA Tephra (TEPHRA)
>> 
>> == Other Resources ==
>> 
>> The existing code already has unit tests, so we would like a Hudson
>> instance to run them whenever a new patch is submitted. This can be added
>> after project creation.
>> 
>> = Initial Committers =
>> 
>> * Andreas Neumann <anew at apache dot org>
>> * Terence Yim <chtyim at apache dot org>
>> * Poorna Chandra <poorna at apache dot org>
>> * Gokul Gunasekaran <gokul at cask dot co>
>> * James Taylor <jamestaylor at apache dot org>
>> * Thomas D'Silva <tdsilva at apache dot org>
>> * Gary Helmling <garyh at apache dot org>
>> 
>> = Affiliations =
>> 
>> * Andreas Neumann (Cask Data)
>> * Terence Yim (Cask Data)
>> * Poorna Chandra (Cask Data)
>> * Gokul Gunasekaran (Cask Data)
>> * James Taylor (Salesforce.com)
>> * Thomas D'Silva (Salesforce.com)
>> * Gary Helmling (Facebook)
>> 
>> = Sponsors =
>> 
>> == Champion ==
>> 
>> James Taylor <jamestaylor at apache dot org> (V.P., Apache Phoenix)
>> 
>> == Nominated Mentors ==
>> 
>> * James Taylor <jamestaylor at apache dot org>
>> * Lars Hofhansl <larsh at apache dot org>
>> * Andrew Purtell <apurtell at apache dot org>
>> * Alan Gates <gates at apache dot org>
>> * Henry Saputra <hsaputra at apache dot org>
>> 
>> == Sponsoring Entity ==
>> 
>> We are requesting that the Incubator sponsor this project.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
> 

