incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Junqueira <...@apache.org>
Subject Re: [VOTE] Accept Omid into the Apache Incubator
Date Wed, 23 Mar 2016 23:16:12 GMT
+1 (binding)

-Flavio

> On 23 Mar 2016, at 22:42, James Taylor <jamestaylor@apache.org> wrote:
> 
> +1 (binding)
> 
> On Wed, Mar 23, 2016 at 3:34 PM, Chris Nauroth <cnauroth@hortonworks.com>
> wrote:
> 
>> +1 (binding)
>> 
>> --Chris Nauroth
>> 
>> 
>> 
>> 
>> On 3/23/16, 3:31 PM, "Daniel Dai" <daijyc@gmail.com> wrote:
>> 
>>> Following the discussion earlier, I'm calling a vote to accept Omid as
>>> a new Incubator project.
>>> 
>>> [ ] +1 Accept Omid into the Incubator
>>> [ ] +0 Indifferent to the acceptance of Omid
>>> [ ] -1 Do not accept Omid because ...
>>> 
>>> The vote will be open for the next 72 hours.
>>> 
>>> https://wiki.apache.org/incubator/OmidProposal
>>> 
>>> Thanks,
>>> Daniel
>>> 
>>> = Omid Proposal =
>>> 
>>> === Abstract ===
>>> Omid is a flexible, reliable, high performant and scalable ACID
>>> transactional framework that allows client applications to execute
>>> transactions on top of MVCC key/value-based NoSQL datastores
>>> (currently Apache HBase) providing Snapshot Isolation guarantees on
>>> the accessed data.
>>> 
>>> === Proposal ===
>>> Omid is a flexible open-source transactional framework that provides
>>> ACID transactions with Snapshot Isolation guarantees on top of NoSQL
>>> datastores. In particular, the current codebase brings the concept of
>>> transactions to the popular Apache HBase datastore. Omid offers great
>>> performance, it is highly available, and scalable. Omid's current
>>> version is able to scale to thousands of clients triggering concurrent
>>> transactions on application data stored in HBase. Omid can scale
>>> beyond 100K transactions per second on mid-range hardware while
>>> incurring in a minimal impact on the speed of data access in the
>>> datastore. We¹re currently experimenting with a prototype version that
>>> can improve the performance up to ~380K TPS.
>>> 
>>> Omid has been publicly available as an open-source project in Github
>>> under Apache License Version 2.0 since 2011 [1]. During these years,
>>> it has generated certain interest in the open source community,
>>> especially since the public presentation of the first version in
>>> Hadoop Summit 2013 [2]. Currently the Github project has 241 Stars and
>>> 93 forks. Yahoo Inc. submits this proposal to the Apache Software
>>> Foundation with the aim to transfer the Omid project -including its
>>> source code and documentation- to Apache in order to start the build
>>> of a stable open source community around it.
>>> 
>>> [1] https://github.com/yahoo/omid
>>> [2] Omid presentation at Hadoop Summit 2013:
>>> 
>> https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyq
>>> LU464Nxz4aQe7EPBus
>>> 
>>> === Background ===
>>> An Omid prototype was first released as an open-source project back in
>>> 2011. Inspired by Google Percolator [1], it offered a lock-free
>>> approach to transactions in NoSQL datastores (See [2]). However,
>>> during these years, the design of Omid has evolved significantly.
>>> Whilst the current open-sourced version maintains many aspects of the
>>> original implementation, it is the result of a major redesign of the
>>> first prototype released in 2011.
>>> 
>>> Omid has now a more decentralized design that does not sacrifice the
>>> consistency and performance of the original version. The current
>>> design also enables Omid to scale to thousands of clients executing
>>> transactions concurrently on application data stored in HBase.
>>> Internally, Omid still utilizes a lock-free approach to support
>>> multiple concurrent clients. Its design also relies on a centralized
>>> conflict detection component, the TSO, which now resolves in an
>>> efficient manner writeset collisions among concurrent transactions
>>> without having to piggyback commit information to the clients. Another
>>> important benefit of Omid is that it doesn't require any modification
>>> of the underlying key-value datastore, HBase in this case. Moreover,
>>> the recently added high availability algorithm allows to eliminate the
>>> single point of failure represented by the TSO in those system
>>> deployments requiring a higher degree of dependability. Last but not
>>> least, the provided user API is very simple, mimicking transaction
>>> managers in the relational world: begin, commit, rollback.
>>> 
>>> Omid is used internally at Yahoo. Sieve, Yahoo¹s web-scale content
>>> management platform powering some of next-generation search and
>>> personalization products is using Omid as a transaction manager in its
>>> processing pipeline. Sieve essentially acts as a huge processing hub
>>> between content feeds and serving systems. It provides an environment
>>> for highly customizable, real-time, streamed information processing,
>>> with typical discovery-to-service latencies of just a few seconds. In
>>> terms of scale and availability, Omid¹s new design was largely driven
>>> by Sieve¹s requirements.
>>> 
>>> At Yahoo, we are also making an effort to disseminate the current
>>> status of the project through blog entries (See [3], [4] and [5]) and
>>> submissions to technical and academic conferences such as ATC 2016,
>>> Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also
>>> appeared in a TechCrunch article in the last quarter of 2015 (See [6])
>>> 
>>> [1] D. Peng and F. Dabek, Large-scale Incremental Processing Using
>>> Distributed Transactions and Notifications. USENIX Symposium on
>>> Operating Systems Design and Implementation, 2010
>>> [2] D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh.
>>> Omid: Lock-free transactional support for distributed data stores. In
>>> Proc. of ICDE, 2013.
>>> [3]
>>> 
>> http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transacti
>>> on-processing-for
>>> [4]
>>> 
>> http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-prot
>>> ocol
>>> [5]
>>> http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
>>> [6]
>>> 
>> http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-sc
>>> alable-transaction-processing-to-hbase/
>>> 
>>> === Rationale ===
>>> Programming with ACID (Atomicity, Consistency, Isolation, Durability)
>>> transactions is very popular and it is featured in relational
>>> databases. However, in the Big Data ecosystem, applications typically
>>> use NoSQL datastores, which do not provide ACID transactions. Such
>>> NoSQL datastores used to give up transactional support for greater
>>> agility and scalability. However, while early NoSQL data store
>>> implementations did not include transaction support, the need for
>>> transactions soon emerged in Big Data applications when accessing
>>> shared data; for  example, transactions are very important  for
>>> modern, scalable systems that process content incrementally.
>>> 
>>> NoSQL datastores -including HBase- don¹t provide transactional
>>> frameworks to coordinate the access to the underlying data for
>>> preserving consistency. By using Omid, Big Data applications that need
>>> to bundle multiple read and write operations on HBase into logically
>>> indivisible units of work can execute transactions with ACID
>>> properties, just as they would use transactions in the relational
>>> database world. Omid extends the HBase key-value access APl with
>>> transaction semantics. It can be exercised either directly, or via
>>> higher level data management API¹s. For example, Apache Phoenix
>>> (SQL-on-top-of-HBase) might use Omid as its transaction management
>>> component.
>>> 
>>> The following features make Omid an attractive choice for system
>>> designers and other projects in the Apache community:
>>> 
>>> * Semantics. Omid implements Snapshot Isolation (SI,) supported by
>>> major SQL and NoSQL technologies (e.g. Google Percolator).
>>> 
>>> * Performance and Scalability. Omid  provides a highly scalable,
>>> lock-free implementation of SI. To the best of our knowledge, it is
>>> also one of the few open source NoSQL transactional platforms that can
>>> execute more than 100K transactions per second [1]. A new prototype
>>> still in development can go even further, up to ~380K TPS.
>>> 
>>> * Reliability.  Omid has a high-availability (HA) mode, in which the
>>> core service performing writeset conflict resolution operates as
>>> primary-backup process pair with automatic failover. The HA support
>>> has zero overhead on the mainstream operation.
>>> 
>>> * Adaptability. Omid current version provides transactions on data
>>> stored in Apache HBase. However, Omid¹s components are generic enough
>>> to be adapted to any other key-value NoSQL datasource that supports
>>> MVCC.
>>> 
>>> * Development. Omid provides a very simple interface that mimics
>>> standard HBase APIs, making it developer friendly. Only minimal
>>> extensions to the standard interfaces have been introduced to enable
>>> transactions.
>>> 
>>> * Simplicity. Omid leverages the HBase infrastructure for managing its
>>> own metadata. It entails no additional services apart from those
>>> provided and used by HBase.
>>> 
>>> * Track Record. As we have mentioned, Omid is already in use by
>>> very-large-scale production systems at Yahoo. Also, Hortonworks is
>>> integrating Omid in a metastore implementation for Hive based on
>>> HBase.
>>> 
>>> 
>>> [1] See also Haeinsa: https://github.com/vcnc/haeinsa/wiki/Performance
>>> 
>>> === Current Status ===
>>> Current Omid implementation is available in both, Yahoo¹s internal
>>> Github repository for internal use at Yahoo as well as in Yahoo¹s
>>> Github public repository (https://github.com/yahoo/omid.git). Both
>>> repositories are managed by Omid¹s current developers at Yahoo.
>>> 
>>> 
>>> As it is mentioned above, Yahoo is currently using Omid for providing
>>> transactions in Sieve, a web-scale content management platform that
>>> powers Yahoo¹s next-generation search and personalization products.
>>> 
>>> ==== Meritocracy ====
>>> The first version of Omid was originally created in 2011 by Maysam
>>> Yabandeh, Daniel Gomez-Ferro, Ivan B. Kelly, Benjamin Reed and Flavio
>>> Junqueira at the R&D Scalable Computing Group of Yahoo Labs in Spain.
>>> 
>>> During the years after its inception, Omid has matured to operate at
>>> Web scale and has been used internally by strategic projects at Yahoo
>>> such as Sieve. The current base of committers belong to the Yahoo team
>>> that took over the initial Omid prototype and rewrote it to meet the
>>> high availability and scalability requirements of the Sieve project.
>>> This base of committers has recently incorporated Hortonworks members
>>> that helped in the Omid adaptation to HBase 1.x versions.
>>> 
>>> With this initial committer base, we aim to form a larger community
>>> that can collaborate with new ideas over the current code base. This
>>> new community will run the project following the "Apache Way"
>>> (http://apache.org/foundation/governance/). Users and new contributors
>>> will be treated with respect and welcomed. To grow the community, we
>>> will encourage contributors to provide patches, review code, propose
>>> new features improvements, talk at conferences such as Hadoop Summit,
>>> HBaseCon, ApacheCon, etc. Committership and PMC membership will be
>>> offered according to meritocracy.
>>> 
>>> ==== Community ====
>>> The public Yahoo Omid repository at Github currently has 241 Stars and
>>> 93 forks, which means that there is an important interest for the
>>> project in the open-source community, at least compared with other
>>> similar projects (See https://github.com/yahoo/omid.git).
>>> 
>>> Recently, Hortonworks contributors to the Apache Hive project which
>>> are working on storing Hive metadata in HBase (Apache Jira HIVE-9452)
>>> manifested interest in using Omid. We started with them a fruitful
>>> collaboration that resulted in Omid supporting HBase 1.x versions.
>>> 
>>> Salesforce is also interested in collaborating in doing a Proof of
>>> Concept for integrating Omid as a pluggable transaction manager in
>>> Apache Phoenix.
>>> 
>>> Yahoo, Hortonworks and Salesforce participants will constitute the
>>> initial set of committers and mentors for the proposal.
>>> 
>>> ==== Core Developers ====
>>> The core developers of Omid are all skilled software developers and
>>> research engineers at Yahoo Inc. and Hortonworks with years of
>>> experiences in their fields. At this moment, developers are
>>> distributed across U.S. and Israel. The aim is to incorporate more
>>> committers from different organizations and locations over time.
>>> 
>>> The current set of developers include experienced committers from
>>> Apache HBase, Hive and Hadoop projects that have been working with us
>>> in the current codebase found in Github.
>>> 
>>> 
>>> Finally, some of the core developers are currently NOT affiliated with
>>> the ASF and would require new ICLAs to be filed.
>>> 
>>> === Alignment ===
>>> Omid enhances with transactions the already successful Apache HBase
>>> datastore project. We have collaborated with other developers inside
>>> and outside Yahoo which are involved in the Apache HBase community, so
>>> we have had reliable feedback from them.
>>> 
>>> 
>>> Although Omid brings value into HBase, the design of the current
>>> version provides a general transaction scheme that can potentially be
>>> adapted to other MVCC key-value datastores such as Apache Cassandra.
>>> 
>>> Apache Phoenix is also a potential target. Phoenix is a SQL layer on
>>> top of HBase that can potentially integrate Omid in order to provide
>>> the well-know concept of transactions to Phoenix-based applications.
>>> 
>>> === Known Risks ===
>>> ==== Orphaned products ====
>>> Yahoo¹s Research and Search organizations have been taking care of
>>> Omid development since the first prototype creation in 2011. Yahoo has
>>> a long history participating in open-source projects, and has been
>>> also a long time contributor to the Apache community. For example, in
>>> Apache, Yahoo is an important contributor in many projects in the
>>> Hadoop ecosystem such as HBase, Pig, Storm or YARN, and has also
>>> open-sourced other well-known projects outside Hadoop, such as
>>> Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make
>>> Omid also a successful open-source Apache product. If this happens, we
>>> are sure that a larger community will be formed around the project in
>>> a relatively short period of time, contributing to the diversification
>>> and stabilization of the base of committers.
>>> 
>>> ==== Inexperience with Open Source ====
>>> This project has long standing experienced mentors and interested
>>> contributors from Apache HBase, Hive and Phoenix to help us moving
>>> through the open source process. We are actively working with
>>> experienced Apache community members to improve our project and
>>> further testing.
>>> 
>>> ==== Homogeneous Developers ====
>>> Omid has been supported by Yahoo since its inception in 2011. However,
>>> all current committers are employed by their respective companies
>>> shown in the Affiliations section.
>>> 
>>> ==== Reliance on Salaried Developers ====
>>> All the current developers are paid by their employers to contribute
>>> to this project. Yahoo developers will also continuing maintaining the
>>> internal Omid repository at their company.
>>> Of course, other developers are welcomed to contribute to this project
>>> after it is open sourced in Apache.
>>> 
>>> ==== Relationships with Other Apache Product ====
>>> Current Omid incarnation serves transactional contexts to applications
>>> storing their data in HBase. However Omid design potentially allows to
>>> be adapted to serve transactions on top of other MVCC-based key-value
>>> datastores in Apache community such as Cassandra.
>>> 
>>> As a transactional framework, many other Apache projects such as
>>> Apache Spark, Apache Phoenix, Apache Storm, Apache Flink could
>>> potentially benefit from Omid to get transactional contexts. In
>>> particular, Apache Phoenix -a SQL layer on top of HBase- might use
>>> Omid as its transaction management component. Once we open source Omid
>>> as an Apache project, we expect to generate more interest in the
>>> surrounded communities.
>>> 
>>> Very recently, a new incubator proposal for a similar project called
>>> Tephra, has been submitted to the ASF. We think this is good for the
>>> Apache community, and we believe that there¹s room for both proposals
>>> as the design of each of them is based on different principles (e.g.
>>> Omid does not require to maintain the state of ongoing transactions on
>>> the server-side component) and due to the fact that both -Tephra and
>>> Omid- have also gained certain traction in the open-source community.
>>> 
>>> With regard to the Apache projects that Omid uses, apart from HBase,
>>> Omid relies on Apache Zookeeper and Curator projects in order to
>>> coordinate the (re)connection of transaction managers (acting as
>>> clients) to the conflict resolution component for transactions (server
>>> side.) They¹re also used in order to coordinate the master and backup
>>> replicas in high availability scenarios.
>>> 
>>> ==== An Excessive Fascination with the Apache Brand ====
>>> We are applying to the Incubator process because we think that it is
>>> the logical next step for the  Omid project after we open-sourced the
>>> code in Github some years ago. Yahoo has a long-standing history of
>>> contributing to Apache projects. The developers and contributors
>>> understand the implications of making it an Apache project, and
>>> strongly believe that the growing community can benefit from the
>>> Apache environment, ecosystem, and infrastrastructure.
>>> 
>>> === Documentation ===
>>> Current documentation about the project is available in the wiki of
>>> Omid¹s Github repository: https://github.com/yahoo/omid/wiki . It will
>>> be moved under https://omid.incubator.apache.org/docs if the project
>>> is accepted as an Apache Incubator.
>>> 
>>> === Initial Source ===
>>> Initial source code is currently hosted in Github for general viewing
>>> and contribution:
>>> https://github.com/yahoo/omid.git
>>> 
>>> Omid source code is written in Java code (99%) mixed with some shell
>>> script (1%) in order to configure and trigger the execution of main
>>> components.
>>> 
>>> The code will be moved to Apache http://git.apache.org/ if accepted as
>>> an Incubator project.
>>> 
>>> === Source and Intellectual Property Submission Plan ===
>>> The current Omid License for the code published in Github is Apache
>>> 2.0. If Omid fulfills and passes the conditions for being an Incubator
>>> project in the ASF, the source code will be transitioned via the
>>> Software Grant Agreement onto the ASF infrastructure and in turn made
>>> available under the Apache License, version 2.0.
>>> 
>>> === External Dependencies ===
>>> 
>>> The required external dependencies that are not Apache projects are
>>> all Apache licenses or other compatible Licenses:
>>> 
>>> 
>>> Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
>>> JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License]
>>> Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
>>> Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
>>> Testng v6.8.8  (http://testng.org) [Apache 2.0]
>>> SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
>>> Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
>>> Google Protocol Buffers v2.5.0
>>> (https://developers.google.com/protocol-buffers/) [BSD License]
>>> Mockito (http://mockito.org/) v1.9.5 [MIT License]
>>> LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/)
>>> [Apache 2.0]
>>> Coda Hale/Yammer.com Dropwizard Metrics v3.0.1
>>> (http://metrics.dropwizard.io/3.1.0/) [Apache 2.0]
>>> C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
>>> Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]
>>> 
>>> === Cryptography ===
>>> Omid project does not use cryptography itself. However, Apache HBase
>>> -the datastore on top of which Omid works in its current version- uses
>>> standard APIs and tools for SSH and SSL communication where necessary.
>>> 
>>> === Required Resources ===
>>> We request that following resources be created for the project to use:
>>> 
>>> ==== Mailing lists ====
>>> omid-private (moderated subscriptions)
>>> omid-commits (commit notification)
>>> omid-dev (technical discussions)
>>> 
>>> ==== Git repository ====
>>> https://github.com/apache/incubator-omid
>>> 
>>> ==== Documentation ====
>>> https://omid.incubator.apache.org/docs/
>>> 
>>> ==== JIRA instance ====
>>> https://issues.apache.org/jira/browse/omid
>>> 
>>> === Initial Committers ===
>>> * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>> 
>>> * Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com)
>>> 
>>> * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>> 
>>> * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>> 
>>> * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
>>> 
>>> * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
>>> 
>>> 
>>> * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>> 
>>> * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
>>> 
>>> * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
>>> 
>>> * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
>>> 
>>> 
>>> * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>)
>>> 
>>> === Additional Interested Contributors ===
>>> * Ivan Kelly (ivank<AT>apache<DOT>org)
>>> * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)
>>> 
>>> === Affiliations ===
>>> * Edward Bortnikov, Yahoo Inc.
>>> 
>>> * Daniel Dai, Hortonworks
>>> 
>>> * Flavio P. Junqueira, Confluent
>>> 
>>> * Igor Katkov, Yahoo Inc.
>>> 
>>> * Ivan Kelly, Midokura
>>> 
>>> * Francis C. Liu, Yahoo Inc.
>>> 
>>> * Sameer Paranjpye, Arimo
>>> 
>>> * Francisco Perez-Sorrosal, Yahoo Inc.
>>> 
>>> * Ohad Shacham, Yahoo Inc.
>>> 
>>> * Maysam Yabandeh, Dropbox Inc.
>>> 
>>> === Sponsors ===
>>> 
>>> 
>>> ==== Champion ====
>>> Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
>>> 
>>> ==== Nominated Mentors ====
>>> Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com)
>>> Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
>>> Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
>>> Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
>>> James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>)
>>> 
>>> ==== Sponsoring Entity ====
>>> Apache Incubator PMC
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>> 
>>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Mime
View raw message