From: Suresh Marru
To: general@incubator.apache.org
Date: Fri, 25 Mar 2016 08:28:53 -0400
Subject: Re: [VOTE] Accept Omid into the Apache Incubator

+1 (binding).

Suresh

> On Mar 23, 2016, at 6:31 PM, Daniel Dai wrote:
>
> Following the discussion earlier, I'm calling a vote to accept Omid as
> a new Incubator project.
>
> [ ] +1 Accept Omid into the Incubator
> [ ] +0 Indifferent to the acceptance of Omid
> [ ] -1 Do not accept Omid because ...
>
> The vote will be open for the next 72 hours.
>
> https://wiki.apache.org/incubator/OmidProposal
>
> Thanks,
> Daniel
>
> = Omid Proposal =
>
> === Abstract ===
> Omid is a flexible, reliable, high-performance and scalable ACID
> transactional framework that allows client applications to execute
> transactions on top of MVCC key/value-based NoSQL datastores
> (currently Apache HBase), providing Snapshot Isolation guarantees on
> the accessed data.
>
> === Proposal ===
> Omid is a flexible open-source transactional framework that provides
> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
> datastores. In particular, the current codebase brings the concept of
> transactions to the popular Apache HBase datastore. Omid offers great
> performance, is highly available, and is scalable. Omid's current
> version is able to scale to thousands of clients triggering concurrent
> transactions on application data stored in HBase. Omid can scale
> beyond 100K transactions per second on mid-range hardware while
> having minimal impact on the speed of data access in the
> datastore. We’re currently experimenting with a prototype version that
> can improve the performance up to ~380K TPS.
>
> Omid has been publicly available as an open-source project on GitHub
> under the Apache License Version 2.0 since 2011 [1]. Over these years,
> it has generated some interest in the open source community,
> especially since the public presentation of the first version at
> Hadoop Summit 2013 [2]. Currently the GitHub project has 241 stars and
> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
> Foundation with the aim of transferring the Omid project (including its
> source code and documentation) to Apache in order to start building
> a stable open source community around it.
>
> [1] https://github.com/yahoo/omid
> [2] Omid presentation at Hadoop Summit 2013:
> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus
>
> === Background ===
> An Omid prototype was first released as an open-source project back in
> 2011. Inspired by Google Percolator [1], it offered a lock-free
> approach to transactions in NoSQL datastores (see [2]). However,
> the design of Omid has evolved significantly over these years.
> Whilst the current open-sourced version maintains many aspects of the
> original implementation, it is the result of a major redesign of the
> first prototype released in 2011.
>
> Omid now has a more decentralized design that does not sacrifice the
> consistency and performance of the original version. The current
> design also enables Omid to scale to thousands of clients executing
> transactions concurrently on application data stored in HBase.
> Internally, Omid still uses a lock-free approach to support
> multiple concurrent clients. Its design also relies on a centralized
> conflict detection component, the TSO, which now efficiently resolves
> writeset collisions among concurrent transactions without having to
> piggyback commit information to the clients. Another
> important benefit of Omid is that it doesn't require any modification
> of the underlying key-value datastore, HBase in this case. Moreover,
> the recently added high availability algorithm eliminates the
> single point of failure represented by the TSO in those system
> deployments requiring a higher degree of dependability. Last but not
> least, the provided user API is very simple, mimicking transaction
> managers in the relational world: begin, commit, rollback.
>
> Omid is used internally at Yahoo.
> Sieve, Yahoo’s web-scale content
> management platform powering some of its next-generation search and
> personalization products, uses Omid as a transaction manager in its
> processing pipeline. Sieve essentially acts as a huge processing hub
> between content feeds and serving systems. It provides an environment
> for highly customizable, real-time, streamed information processing,
> with typical discovery-to-service latencies of just a few seconds. In
> terms of scale and availability, Omid’s new design was largely driven
> by Sieve’s requirements.
>
> At Yahoo, we are also making an effort to disseminate the current
> status of the project through blog entries (see [3], [4] and [5]) and
> submissions to technical and academic conferences such as ATC 2016,
> Hadoop Summit 2016 and HBaseCon 2016. Finally, Omid also
> appeared in a TechCrunch article in the last quarter of 2015 (see [6]).
>
> [1] D. Peng and F. Dabek. Large-scale Incremental Processing Using
> Distributed Transactions and Notifications. In Proc. of USENIX
> Symposium on Operating Systems Design and Implementation, 2010.
> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
> Omid: Lock-free transactional support for distributed data stores. In
> Proc. of ICDE, 2013.
> [3] http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
> [4] http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
> [5] http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
> [6] http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/
>
> === Rationale ===
> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
> transactions is very popular and is a standard feature of relational
> databases.
> However, in the Big Data ecosystem, applications typically
> use NoSQL datastores, which do not provide ACID transactions. Early
> NoSQL datastores gave up transactional support in exchange for greater
> agility and scalability. However, the need for transactions soon
> emerged in Big Data applications that access shared data; for example,
> transactions are very important for modern, scalable systems that
> process content incrementally.
>
> NoSQL datastores, including HBase, don’t provide transactional
> frameworks to coordinate access to the underlying data in order to
> preserve consistency. By using Omid, Big Data applications that need
> to bundle multiple read and write operations on HBase into logically
> indivisible units of work can execute transactions with ACID
> properties, just as they would use transactions in the relational
> database world. Omid extends the HBase key-value access API with
> transaction semantics. It can be used either directly or via
> higher-level data management APIs. For example, Apache Phoenix
> (SQL on top of HBase) might use Omid as its transaction management
> component.
>
> The following features make Omid an attractive choice for system
> designers and other projects in the Apache community:
>
> * Semantics. Omid implements Snapshot Isolation (SI), which is supported
> by major SQL and NoSQL technologies (e.g. Google Percolator).
>
> * Performance and Scalability. Omid provides a highly scalable,
> lock-free implementation of SI. To the best of our knowledge, it is
> also one of the few open source NoSQL transactional platforms that can
> execute more than 100K transactions per second [1]. A new prototype
> still in development can go even further, up to ~380K TPS.
>
> * Reliability.
> Omid has a high-availability (HA) mode, in which the
> core service performing writeset conflict resolution operates as a
> primary-backup process pair with automatic failover. The HA support
> adds zero overhead to mainstream operation.
>
> * Adaptability. The current version of Omid provides transactions on
> data stored in Apache HBase. However, Omid’s components are generic
> enough to be adapted to any other key-value NoSQL datastore that
> supports MVCC.
>
> * Development. Omid provides a very simple interface that mimics
> standard HBase APIs, making it developer-friendly. Only minimal
> extensions to the standard interfaces have been introduced to enable
> transactions.
>
> * Simplicity. Omid leverages the HBase infrastructure for managing its
> own metadata. It requires no additional services apart from those
> provided and used by HBase.
>
> * Track Record. As mentioned above, Omid is already in use by
> very-large-scale production systems at Yahoo. Also, Hortonworks is
> integrating Omid into an HBase-based metastore implementation for
> Hive.
>
> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>
> === Current Status ===
> The current Omid implementation is available both in Yahoo’s internal
> GitHub repository, for internal use at Yahoo, and in Yahoo’s public
> GitHub repository (https://github.com/yahoo/omid.git). Both
> repositories are managed by Omid’s current developers at Yahoo.
>
> As mentioned above, Yahoo is currently using Omid to provide
> transactions in Sieve, a web-scale content management platform that
> powers Yahoo’s next-generation search and personalization products.
>
> ==== Meritocracy ====
> The first version of Omid was originally created in 2011 by Maysam
> Yabandeh, Daniel Gomez-Ferro, Ivan B.
> Kelly, Benjamin Reed and Flavio
> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>
> In the years after its inception, Omid has matured to operate at
> Web scale and has been used internally by strategic projects at Yahoo
> such as Sieve. The current base of committers belongs to the Yahoo team
> that took over the initial Omid prototype and rewrote it to meet the
> high availability and scalability requirements of the Sieve project.
> This base of committers has recently incorporated Hortonworks members
> who helped adapt Omid to HBase 1.x versions.
>
> With this initial committer base, we aim to form a larger community
> that can collaborate with new ideas over the current code base. This
> new community will run the project following the "Apache Way"
> (http://apache.org/foundation/governance/). Users and new contributors
> will be treated with respect and welcomed. To grow the community, we
> will encourage contributors to provide patches, review code, propose
> new features and improvements, and talk at conferences such as Hadoop
> Summit, HBaseCon, ApacheCon, etc. Committership and PMC membership
> will be offered according to meritocracy.
>
> ==== Community ====
> The public Yahoo Omid repository on GitHub currently has 241 stars and
> 93 forks, which indicates significant interest in the project in the
> open-source community, at least compared with other similar projects
> (see https://github.com/yahoo/omid.git).
>
> Recently, Hortonworks contributors to the Apache Hive project who
> are working on storing Hive metadata in HBase (Apache JIRA HIVE-9452)
> expressed interest in using Omid. We started a fruitful collaboration
> with them that resulted in Omid supporting HBase 1.x versions.
>
> Salesforce is also interested in collaborating on a Proof of
> Concept for integrating Omid as a pluggable transaction manager in
> Apache Phoenix.
>
> Yahoo, Hortonworks and Salesforce participants will constitute the
> initial set of committers and mentors for the proposal.
>
> ==== Core Developers ====
> The core developers of Omid are all skilled software developers and
> research engineers at Yahoo Inc. and Hortonworks with years of
> experience in their fields. At this moment, developers are
> distributed across the U.S. and Israel. The aim is to incorporate more
> committers from different organizations and locations over time.
>
> The current set of developers includes experienced committers from
> the Apache HBase, Hive and Hadoop projects who have been working with
> us on the current codebase found on GitHub.
>
> Finally, some of the core developers are currently NOT affiliated with
> the ASF and would require new ICLAs to be filed.
>
> === Alignment ===
> Omid enhances the already successful Apache HBase datastore project
> with transactions. We have collaborated with other developers inside
> and outside Yahoo who are involved in the Apache HBase community, so
> we have had reliable feedback from them.
>
> Although Omid brings value to HBase, the design of the current
> version provides a general transaction scheme that can potentially be
> adapted to other MVCC key-value datastores such as Apache Cassandra.
>
> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
> top of HBase that can potentially integrate Omid in order to provide
> the well-known concept of transactions to Phoenix-based applications.
>
> === Known Risks ===
> ==== Orphaned products ====
> Yahoo’s Research and Search organizations have been taking care of
> Omid development since the first prototype was created in 2011. Yahoo
> has a long history of participating in open-source projects, and has
> also been a long-time contributor to the Apache community.
> For example, in
> Apache, Yahoo is an important contributor to many projects in the
> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
> open-sourced other well-known projects outside Hadoop, such as
> ZooKeeper or BookKeeper. So it is in Yahoo’s best interest to make
> Omid a successful open-source Apache product as well. If this happens,
> we are sure that a larger community will be formed around the project
> in a relatively short period of time, contributing to the
> diversification and stabilization of the base of committers.
>
> ==== Inexperience with Open Source ====
> This project has long-standing, experienced mentors and interested
> contributors from Apache HBase, Hive and Phoenix to help us move
> through the open source process. We are actively working with
> experienced Apache community members to improve the project and
> extend its testing.
>
> ==== Homogeneous Developers ====
> Omid has been supported by Yahoo since its inception in 2011. However,
> the current committers are employed by several different companies,
> as shown in the Affiliations section.
>
> ==== Reliance on Salaried Developers ====
> All the current developers are paid by their employers to contribute
> to this project. Yahoo developers will also continue maintaining the
> internal Omid repository at their company.
> Of course, other developers are welcome to contribute to this project
> after it is open sourced in Apache.
>
> ==== Relationships with Other Apache Products ====
> The current Omid incarnation serves transactional contexts to
> applications storing their data in HBase. However, Omid’s design
> potentially allows it to be adapted to serve transactions on top of
> other MVCC-based key-value datastores in the Apache community such as
> Cassandra.
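To make the terms used throughout this proposal more concrete (an MVCC key/value store, Snapshot Isolation, and a centralized component such as the TSO that detects writeset conflicts among concurrent transactions), here is a minimal, single-process Java sketch. It is a hypothetical illustration under simplifying assumptions, not Omid's code or API: the class and method names (`SnapshotIsolationSketch`, `begin`, `put`, `get`, `commit`, `rollback`) are invented here, the "oracle" is just an in-memory counter rather than a separate server, and persistence, high availability and failure handling are all omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model, NOT Omid's API: Snapshot Isolation on top of an
// in-memory multi-versioned (MVCC) key/value map, with a centralized oracle
// that issues timestamps and aborts transactions on write-write conflicts.
public class SnapshotIsolationSketch {

    /** A transaction: its snapshot (start) timestamp plus buffered writes. */
    static final class Txn {
        final long startTs;
        final Map<String, String> writes = new HashMap<>();
        Txn(long startTs) { this.startTs = startTs; }
    }

    // MVCC store: key -> (commit timestamp -> value).
    private final Map<String, TreeMap<Long, String>> versions = new HashMap<>();
    // Toy oracle state: a logical clock and the last commit timestamp per key.
    private long clock = 0;
    private final Map<String, Long> lastCommitTs = new HashMap<>();

    /** begin: take a snapshot timestamp from the oracle. */
    public synchronized Txn begin() {
        return new Txn(++clock);
    }

    /** Reads see the transaction's own writes, else the newest version <= startTs. */
    public synchronized String get(Txn tx, String key) {
        if (tx.writes.containsKey(key)) {
            return tx.writes.get(key);
        }
        TreeMap<Long, String> history = versions.get(key);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> visible = history.floorEntry(tx.startTs);
        return visible == null ? null : visible.getValue();
    }

    /** Writes are buffered locally until commit. */
    public void put(Txn tx, String key, String value) {
        tx.writes.put(key, value);
    }

    /**
     * commit: abort if any key in the writeset was committed by another
     * transaction after this one's snapshot; otherwise assign a commit
     * timestamp and install the new versions.
     */
    public synchronized boolean commit(Txn tx) {
        Set<String> writeSet = tx.writes.keySet();
        for (String key : writeSet) {
            Long ts = lastCommitTs.get(key);
            if (ts != null && ts > tx.startTs) {
                tx.writes.clear(); // write-write conflict: roll back and stop
                return false;
            }
        }
        long commitTs = ++clock;
        for (Map.Entry<String, String> w : tx.writes.entrySet()) {
            lastCommitTs.put(w.getKey(), commitTs);
            versions.computeIfAbsent(w.getKey(), k -> new TreeMap<>())
                    .put(commitTs, w.getValue());
        }
        return true;
    }

    /** rollback: simply discard the buffered writes. */
    public void rollback(Txn tx) {
        tx.writes.clear();
    }

    public static void main(String[] args) {
        SnapshotIsolationSketch db = new SnapshotIsolationSketch();
        Txn init = db.begin();
        db.put(init, "x", "0");
        db.commit(init);

        Txn t1 = db.begin();
        Txn t2 = db.begin(); // concurrent with t1
        db.put(t1, "x", "1");
        db.put(t2, "x", "2");
        System.out.println("t1 committed: " + db.commit(t1)); // t1 committed: true
        System.out.println("t2 sees x = " + db.get(t2, "x")); // t2 sees x = 2
        System.out.println("t2 committed: " + db.commit(t2)); // t2 committed: false

        Txn reader = db.begin();
        System.out.println("x = " + db.get(reader, "x")); // x = 1
    }
}
```

The check `ts > tx.startTs` is what makes this Snapshot Isolation rather than serializability: only write-write conflicts abort a transaction, and reads never block. A production system also has to keep its commit metadata durable (the proposal says Omid stores its metadata in HBase) and tolerate oracle failures (Omid's HA mode); this sketch models neither.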
>
> As a transactional framework, Omid could potentially benefit many
> other Apache projects, such as Apache Spark, Apache Phoenix, Apache
> Storm and Apache Flink, by providing them with transactional contexts.
> In particular, Apache Phoenix (a SQL layer on top of HBase) might use
> Omid as its transaction management component. Once we open source
> Omid as an Apache project, we expect to generate more interest in the
> surrounding communities.
>
> Very recently, a new incubator proposal for a similar project called
> Tephra has been submitted to the ASF. We think this is good for the
> Apache community, and we believe that there’s room for both proposals,
> as the design of each of them is based on different principles (e.g.
> Omid does not need to maintain the state of ongoing transactions in
> the server-side component) and both Tephra and Omid have gained some
> traction in the open-source community.
>
> With regard to the Apache projects that Omid uses, apart from HBase,
> Omid relies on Apache ZooKeeper and Curator in order to coordinate
> the (re)connection of transaction managers (acting as clients) to the
> conflict resolution component for transactions (server side). They’re
> also used to coordinate the master and backup replicas in high
> availability scenarios.
>
> ==== An Excessive Fascination with the Apache Brand ====
> We are applying to the Incubator process because we think that it is
> the logical next step for the Omid project after we open-sourced the
> code on GitHub some years ago. Yahoo has a long-standing history of
> contributing to Apache projects. The developers and contributors
> understand the implications of making it an Apache project, and
> strongly believe that the growing community can benefit from the
> Apache environment, ecosystem, and infrastructure.
>
> === Documentation ===
> Current documentation about the project is available in the wiki of
> Omid’s GitHub repository: https://github.com/yahoo/omid/wiki . It will
> be moved under https://omid.incubator.apache.org/docs if the project
> is accepted into the Apache Incubator.
>
> === Initial Source ===
> Initial source code is currently hosted on GitHub for general viewing
> and contribution:
> https://github.com/yahoo/omid.git
>
> Omid source code is written in Java (99%) mixed with some shell
> script (1%) used to configure and trigger the execution of the main
> components.
>
> The code will be moved to Apache http://git.apache.org/ if accepted as
> an Incubator project.
>
> === Source and Intellectual Property Submission Plan ===
> The current Omid license for the code published on GitHub is Apache
> 2.0. If Omid fulfills and passes the conditions for being an Incubator
> project in the ASF, the source code will be transitioned via the
> Software Grant Agreement onto the ASF infrastructure and in turn made
> available under the Apache License, version 2.0.
>
> === External Dependencies ===
>
> The required external dependencies that are not Apache projects are
> all under the Apache License or other compatible licenses:
>
> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
> JDK7 or OpenJDK 7 (http://java.com/) [Oracle JDK or OpenJDK License]
> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
> TestNG v6.8.8 (http://testng.org) [Apache 2.0]
> SLF4J v1.7.7 (http://www.slf4j.org/) [MIT License]
> Netty v3.2.6.Final (http://netty.io) [Apache 2.0]
> Google Protocol Buffers v2.5.0
> (https://developers.google.com/protocol-buffers/) [BSD License]
> Mockito v1.9.5 (http://mockito.org/) [MIT License]
> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/) [Apache 2.0]
> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
> C. Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>
> === Cryptography ===
> The Omid project does not use cryptography itself. However, Apache
> HBase, the datastore on top of which Omid works in its current version,
> uses standard APIs and tools for SSH and SSL communication where
> necessary.
>
> === Required Resources ===
> We request that the following resources be created for the project to
> use:
>
> ==== Mailing lists ====
> omid-private (moderated subscriptions)
> omid-commits (commit notification)
> omid-dev (technical discussions)
>
> ==== Git repository ====
> https://github.com/apache/incubator-omid
>
> ==== Documentation ====
> https://omid.incubator.apache.org/docs/
>
> ==== JIRA instance ====
> https://issues.apache.org/jira/browse/omid
>
> === Initial Committers ===
> * Daniel Dai, Hortonworks (daijyhortonworkscom)
> * Alan Gates, Hortonworks (gateshortonworkscom)
> * Lars Hofhansl, Salesforce (larshapacheorg)
> * Flavio P. Junqueira, Confluent (fpjapacheorg)
> * Igor Katkov (katkoviyahoo-inccom)
> * Francis C. Liu (fcliuyahoo-inccom)
> * Thejas Nair, Hortonworks (thejashortonworkscom)
> * Francisco Perez-Sorrosal (fperezyahoo-inccom)
> * Sameer Paranjpye (sparanjpyeyahoocom)
> * Ohad Shacham (ohadsyahoo-inccom)
> * James Taylor, Salesforce (jamestaylorapacheorg)
>
> === Additional Interested Contributors ===
> * Ivan Kelly (ivankapacheorg)
> * Maysam Yabandeh (myabandehdropboxcom)
>
> === Affiliations ===
> * Edward Bortnikov, Yahoo Inc.
> * Daniel Dai, Hortonworks
> * Flavio P. Junqueira, Confluent
> * Igor Katkov, Yahoo Inc.
> * Ivan Kelly, Midokura
> * Francis C. Liu, Yahoo Inc.
> * Sameer Paranjpye, Arimo
> * Francisco Perez-Sorrosal, Yahoo Inc.
> * Ohad Shacham, Yahoo Inc.
> * Maysam Yabandeh, Dropbox Inc.
>
> === Sponsors ===
>
> ==== Champion ====
> Daniel Dai, Hortonworks (daijyhortonworkscom)
>
> ==== Nominated Mentors ====
> Alan Gates, Hortonworks (gateshortonworkscom)
> Lars Hofhansl, Salesforce (larshapacheorg)
> Flavio P. Junqueira, Confluent (fpjapacheorg)
> Thejas Nair, Hortonworks (thejashortonworkscom)
> James Taylor, Salesforce (jamestaylorapacheorg)
>
> ==== Sponsoring Entity ====
> Apache Incubator PMC
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org