incubator-general mailing list archives

From "Vasudevan, Ramkrishna S" <ramkrishna.s.vasude...@intel.com>
Subject RE: [VOTE] Accept Tephra into the Apache Incubator
Date Fri, 04 Mar 2016 08:41:05 GMT
+1 (non-binding)

Regards
Ram

-----Original Message-----
From: Andrew Purtell [mailto:andrew.purtell@gmail.com] 
Sent: Friday, March 4, 2016 11:55 AM
To: general@incubator.apache.org
Subject: Re: [VOTE] Accept Tephra into the Apache Incubator

+1 (binding)

> On Mar 3, 2016, at 5:29 PM, Poorna Chandra <poorna@apache.org> wrote:
> 
> Hi All,
> 
> The Tephra proposal was sent out for discussion last week. The proposal is 
> available at https://wiki.apache.org/incubator/TephraProposal
> 
> Please vote to accept Tephra into the Apache Incubator. The vote will 
> be open for the next 72 hours.
> 
> [ ] +1 Accept Tephra as an Apache Incubator podling.
> [ ] +0 Abstain.
> [ ] -1 Don’t accept Tephra as an Apache Incubator podling because ...
> 
> Thanks,
> Poorna.
> 
> ------
> 
> = Abstract =
> 
> Tephra is a system for providing globally consistent transactions on 
> top of Apache HBase and other storage engines.
> 
> = Proposal =
> 
> Tephra is a transaction engine for distributed data stores like Apache HBase.
> It provides ACID semantics for concurrent data operations that span 
> over region boundaries in HBase using Optimistic Concurrency Control.
> 
> = Background =
> 
> HBase provides strong consistency with row- or region-level ACID 
> operations. However, it sacrifices cross-region and cross-table 
> consistency in favor of scalability. This trade-off requires 
> application developers to handle the complexity of ensuring 
> consistency when their modifications span region boundaries. By 
> providing support for global transactions that span regions, tables, 
> or multiple RPCs, Tephra simplifies application development on top of 
> HBase, without a significant impact on performance or scalability for many workloads.
> 
> Tephra leverages HBase’s native data versioning to provide 
> multi-versioned concurrency control (MVCC) for transactional reads and writes.
> With MVCC capability, each transaction sees its own consistent 
> “snapshot” of data, providing snapshot isolation of concurrent transactions.
> MVCC along with conflict detection and handling enables Optimistic 
> Concurrency Control.
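
A minimal sketch of the optimistic concurrency control described above, assuming a "first committer wins" rule on overlapping write sets. The class and method names (TransactionManager, record_write, and so on) are hypothetical and are not Tephra's API; the proposal does not specify the exact conflict rule, so treat this as an illustration only.

```python
# Illustrative only: all names here are hypothetical, not Tephra's API.
from dataclasses import dataclass, field


@dataclass
class Transaction:
    tx_id: int            # unique ID assigned at start
    start_pointer: int    # value of the commit counter when the transaction began
    write_set: set = field(default_factory=set)


class TransactionManager:
    """Toy in-memory manager showing optimistic conflict detection."""

    def __init__(self):
        self._next_id = 1
        self._commit_counter = 0
        # (commit_number, keys_written) for every committed transaction.
        self._committed = []

    def start(self):
        tx = Transaction(self._next_id, self._commit_counter)
        self._next_id += 1
        return tx

    def record_write(self, tx, key):
        tx.write_set.add(key)

    def commit(self, tx):
        # Conflict: some other transaction committed a write to the same key
        # after this transaction started.
        for commit_number, keys in self._committed:
            if commit_number > tx.start_pointer and keys & tx.write_set:
                return False  # caller is expected to roll back
        self._commit_counter += 1
        self._committed.append((self._commit_counter, frozenset(tx.write_set)))
        return True


manager = TransactionManager()
t1 = manager.start()
t2 = manager.start()
manager.record_write(t1, "row1")
manager.record_write(t2, "row1")
print(manager.commit(t1))  # True: nothing committed since t1 started
print(manager.commit(t2))  # False: t1 committed a write to "row1" after t2 started
```

No locks are taken while the transactions run; the conflict is only detected at commit time, which is what makes the scheme optimistic.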
> 
> Tephra consists of three main components:
> * Transaction Server – maintains global view of transaction state, assigns
>   new transaction IDs and performs conflict detection;
> * Transaction Client – coordinates start, commit, and rollback of
>   transactions; and
> * Transaction Processor Coprocessor – applies filtering to the data read (based
>   on a given transaction’s state) and cleans up any data from old
>   (no longer visible) transactions.
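
The read-side filtering in the last bullet can be sketched roughly as follows. This assumes each cell version is tagged with the ID of the transaction that wrote it, and that a reader's snapshot is a read pointer plus a set of excluded (in-progress or invalid) transaction IDs. Snapshot, is_visible, and read_latest are hypothetical names, not Tephra's API.

```python
# Illustrative only: names and data layout are assumptions, not Tephra's API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    read_pointer: int     # highest transaction ID this reader may see
    excluded: frozenset   # in-progress or invalid transaction IDs


def is_visible(version, snapshot):
    """A version is visible if it is within the snapshot and not excluded."""
    return version <= snapshot.read_pointer and version not in snapshot.excluded


def read_latest(cell_versions, snapshot):
    """Return the newest visible value from {version: value}, or None."""
    for version in sorted(cell_versions, reverse=True):
        if is_visible(version, snapshot):
            return cell_versions[version]
    return None


cell = {10: "a", 12: "b", 15: "c"}   # version (writer's transaction ID) -> value
snap = Snapshot(read_pointer=14, excluded=frozenset({12}))
print(read_latest(cell, snap))  # prints "a": 15 is too new, 12 is excluded
```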
> 
> Although Tephra only supports HBase now, it can be extended to support 
> transactions on any store that has multi-versioning and rollback 
> support. The transactions can span over multiple stores and storage 
> paradigms.
> 
> = Rationale =
> 
> Tephra has simple abstractions which can be used by an application to 
> add transaction support over HBase. By abstracting away transaction 
> handling using Tephra, the application is freed of transaction logic, 
> and the application developer can focus on the use case.
> Also, Tephra can be extended to support transactions on data sources 
> other than HBase.
> 
> By making Tephra an Apache open source project, we believe that there 
> will be wider adoption and more opportunities for Tephra to be 
> integrated into other Apache projects.
> 
> = Current Status =
> 
> Tephra was built at Cask Data Inc., initially as part of the open-source 
> framework Cask Data Application Platform (CDAP) [[http://cdap.io/]].
> It was later converted into an independent open source project under 
> the Apache 2.0 License [[https://github.com/caskdata/tephra]].
> 
> Tephra is used in CDAP as the transaction engine. As part of CDAP, 
> Tephra has been deployed at multiple companies.
> 
> Apache Phoenix is using Tephra as its transaction engine in the next release.
> 
> == Meritocracy ==
> 
> Our intent with this incubator proposal is to start building a diverse 
> developer community around Tephra following the Apache meritocracy model.
> Since Tephra was initially developed in early 2013, we have had fast 
> adoption and contributions within Cask Data. We are looking forward to 
> new contributors. We wish to build a community based on Apache's 
> meritocracy principles, working with those who contribute 
> significantly to the project and welcoming them to be committers both 
> during the incubation process and beyond.
> 
> == Community ==
> 
> Core developers of Tephra are at Cask Data. Recently the developer 
> community has expanded to include folks from Apache Phoenix. We hope 
> to extend our contributor base significantly and we will invite all 
> who are interested in working on a distributed transaction engine.
> 
> == Core Developers ==
> 
> A few engineers from Cask Data and outside have developed Tephra:
> Andreas Neumann, Terence Yim, Gary Helmling, Andrew Purtell and Poorna 
> Chandra.
> 
> 
> == Alignment ==
> 
> The ASF is the natural choice to host the Tephra project as its goal 
> of encouraging community-driven open source projects fits with our 
> vision for Tephra.
> 
> Additionally, many projects that we are familiar with and expect 
> Tephra to integrate with, such as Phoenix, ZooKeeper, HDFS, log4j, 
> and others listed in the External Dependencies section, are Apache 
> projects, and Tephra will benefit from close proximity to them.
> 
> = Known Risks =
> 
> == Orphaned Products ==
> 
> There is very little risk of Tephra being orphaned, as it is a key 
> part of Cask Data’s products. The core Tephra developers plan to 
> continue to work on Tephra, and Cask Data has funding in place to 
> support their efforts going forward.
> Also, with Phoenix using Tephra for transactions, Phoenix developers 
> are keen on contributing to Tephra.
> 
> 
> == Inexperience with Open Source ==
> 
> Several of the core developers have experience with open source 
> development. Andreas Neumann is an Apache committer for Oozie and Twill.
> Terence Yim is an Apache committer for Helix and Twill. Poorna Chandra 
> is an Apache committer for Twill. Gary Helmling is a committer for 
> Apache Twill and a committer and PMC member for Apache HBase.
> James Taylor is PMC chair for Apache Phoenix, PMC member of Apache 
> Calcite, and an IPMC member.
> 
> == Homogeneous Developers ==
> 
> The current core developers are all Cask Data employees. However, we 
> intend to establish a developer community that includes independent 
> and corporate contributors. We are encouraging new contributors via 
> our mailing lists, public presentations, and personal contacts, and we 
> will continue to do so.
> 
> Apache Phoenix developers have already contributed several patches to 
> Tephra, and have expressed interest in becoming long term contributors.
> 
> == Reliance on Salaried Developers ==
> 
> Currently, these developers are paid to work on Tephra. Once the 
> project has built a community, we expect to attract committers, 
> developers, and community members beyond the current core developers. 
> However, because Cask Data products use Tephra internally, the 
> reliance on salaried developers is unlikely to change, at least in the near term.
> 
> == Relationships with Other Apache Products ==
> 
> Tephra is deeply integrated with Apache projects. Tephra provides 
> transactions over Apache HBase, and uses Apache Twill and Apache 
> ZooKeeper for coordination.
> A number of other Apache projects are Tephra dependencies, and are 
> listed in the External Dependencies section.
> 
> In addition, Apache Phoenix is using Tephra as the transaction engine.
> 
> == An Excessive Fascination with the Apache Brand ==
> 
> While we respect the reputation of the Apache brand and have no doubt 
> that it will attract contributors and users, our interest is primarily 
> to give Tephra a solid home as an open source project following an 
> established development model. We have also given additional reasons 
> in the Rationale and Alignment sections.
> 
> = Documentation =
> 
> The current documentation for Tephra is at https://github.com/caskdata/tephra.
> 
> = Initial Source =
> 
> The Tephra codebase is currently hosted at https://github.com/caskdata/tephra.
> 
> = Source and Intellectual Property Submission Plan =
> 
> The Tephra codebase is currently licensed under the Apache 2.0 license.
> Cask Data owns the trademark for "Tephra". As part of the incubation 
> process, Cask Data will transfer the trademark to the Apache Software Foundation.
> 
> = External Dependencies =
> 
> The dependencies all have Apache-compatible licenses:
> * dropwizard metrics (Apache 2.0)
> * fastutil (Apache 2.0)
> * gson (Apache 2.0)
> * guava-libraries (Apache 2.0)
> * guice (Apache 2.0)
> * hadoop (Apache 2.0)
> * hbase (Apache 2.0)
> * hdfs (Apache 2.0)
> * junit (EPL v1.0)
> * logback (EPL v1.0)
> * slf4j (MIT)
> * thrift (Apache 2.0)
> * twill (Apache 2.0)
> * zookeeper (Apache 2.0)
> 
> = Cryptography =
> 
> Tephra does not use cryptography itself; however, it can run on secure 
> Hadoop, which uses Kerberos.
> 
> = Required Resources =
> 
> == Mailing Lists ==
> 
> * tephra-private for private PMC discussions (with moderated 
> subscriptions)
> * tephra-dev for technical discussions among contributors
> * tephra-commits for notification about commits
> 
> == Subversion Directory ==
> 
> Git is the preferred source control system: 
> git://git.apache.org/tephra
> 
> == Issue Tracking ==
> 
> JIRA Tephra (TEPHRA)
> 
> == Other Resources ==
> 
> The existing code already has unit tests, so we would like a Hudson 
> instance to run them whenever a new patch is submitted. This can be 
> added after project creation.
> 
> = Initial Committers =
> 
> * Andreas Neumann <anew at apache dot org>
> * Terence Yim <chtyim at apache dot org>
> * Poorna Chandra <poorna at apache dot org>
> * Gokul Gunasekaran <gokul at cask dot co>
> * James Taylor <jamestaylor at apache dot org>
> * Thomas D'Silva <tdsilva at apache dot org>
> * Gary Helmling <garyh at apache dot org>
> 
> = Affiliations =
> 
> * Andreas Neumann (Cask Data)
> * Terence Yim (Cask Data)
> * Poorna Chandra (Cask Data)
> * Gokul Gunasekaran (Cask Data)
> * James Taylor (Salesforce.com)
> * Thomas D'Silva (Salesforce.com)
> * Gary Helmling (Facebook)
> 
> = Sponsors =
> 
> == Champion ==
> 
> James Taylor <jamestaylor at apache dot org> (V.P., Apache Phoenix)
> 
> == Nominated Mentors ==
> 
> * James Taylor <jamestaylor at apache dot org>
> * Lars Hofhansl <larsh at apache dot org>
> * Andrew Purtell <apurtell at apache dot org>
> * Alan Gates <gates at apache dot org>
> * Henry Saputra <hsaputra at apache dot org>
> 
> == Sponsoring Entity ==
> 
> We are requesting that the Incubator sponsor this project.

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org
