incubator-cvs mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "OmidProposal" by daijy
Date Thu, 17 Mar 2016 18:14:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "OmidProposal" page has been changed by daijy:
https://wiki.apache.org/incubator/OmidProposal

New page:
= Abstract =
Omid is a flexible, reliable, high performant and scalable ACID transactional framework that
allows client applications to execute transactions on top of MVCC key/value-based NoSQL datastores
(currently Apache HBase) providing Snapshot Isolation guarantees on the accessed data.

= Proposal =
Omid is a flexible open-source transactional framework that provides ACID transactions with
Snapshot Isolation guarantees on top of NoSQL datastores. In particular, the current codebase
brings the concept of transactions to the popular Apache HBase datastore. Omid offers great
performance, it is highly available, and scalable. Omid's current version is able to scale
to thousands of clients triggering concurrent transactions on application data stored in HBase.
Omid can scale beyond 100K transactions per second on mid-range hardware while incurring in
a minimal impact on the speed of data access in the datastore. We’re currently experimenting
with a prototype version that can improve the performance up to ~380K TPS.

Omid has been publicly available as an open-source project in Github under Apache License
Version 2.0 since 2011 [1]. During these years, it has generated certain interest in the open
source community, especially since the public presentation of the first version in Hadoop
Summit 2013 [2]. Currently the Github project has 241 Stars and 93 forks. Yahoo Inc. submits
this proposal to the Apache Software Foundation with the aim to transfer the Omid project
-including its source code and documentation- to Apache in order to start the build of a stable
open source community around it.

 1. https://github.com/yahoo/omid

 1. [[https://www.youtube.com/watch?v=Rhdmo9pVGgU&index=68&list=PLSAiKuajRe2luyqLU464Nxz4aQe7EPBus|Omid
presentation at Hadoop Summit 2013]]

= Background =
An Omid prototype was first released as an open-source project back in 2011. Inspired by Google
Percolator [1], it offered a lock-free approach to transactions in NoSQL datastores (See [2]).
However, during these years, the design of Omid has evolved significantly. Whilst the current
open-sourced version maintains many aspects of the original implementation, it is the result
of a major redesign of the first prototype released in 2011. 

Omid has now a more decentralized design that does not sacrifice the consistency and performance
of the original version. The current design also enables Omid to scale to thousands of clients
executing transactions concurrently on application data stored in HBase. Internally, Omid
still utilizes a lock-free approach to support multiple concurrent clients. Its design also
relies on a centralized conflict detection component, the TSO, which now resolves in an efficient
manner writeset collisions among concurrent transactions without having to piggyback commit
information to the clients. Another important benefit of Omid is that it doesn't require any
modification of the underlying key-value datastore, HBase in this case. Moreover, the recently
added high availability algorithm allows to eliminate the single point of failure represented
by the TSO in those system deployments requiring a higher degree of dependability. Last but
not least, the provided user API is very simple, mimicking transaction managers in the relational
world: begin, commit, rollback.

Omid is used internally at Yahoo. Sieve, Yahoo’s web-scale content management platform powering
some of next-generation search and personalization products is using Omid as a transaction
manager in its processing pipeline. Sieve essentially acts as a huge processing hub between
content feeds and serving systems. It provides an environment for highly customizable, real-time,
streamed information processing, with typical discovery-to-service latencies of just a few
seconds. In terms of scale and availability, Omid’s new design was largely driven by Sieve’s
requirements.

At Yahoo, we are also making an effort to disseminate the current status of the project through
blog entries (See [3], [4] and [5]) and submissions to technical and academic conferences
such as ATC 2016, Hadoop Summit 2016, HBaseConf 2016. Last but not least, Omid also appeared
in a TechCrunch article in the last quarter of 2015 (See [6])

 1. D. Peng and F. Dabek, Large-scale Incremental Processing Using Distributed Transactions
and Notifications. USENIX Symposium on Operating Systems Design and Implementation, 2010
 1. D. Gomez-Ferro, F. Junqueira, I. Kelly, B. Reed, and M. Yabandeh. Omid: Lock-free transactional
support for distributed data stores. In Proc. of ICDE, 2013
 1. http://yahoohadoop.tumblr.com/post/129089878751/introducing-omid-transaction-processing-for
 1. http://yahoohadoop.tumblr.com/post/132695603476/omid-architecture-and-protocol
 1. http://yahoohadoop.tumblr.com/post/138682361161/high-availability-in-omid
 1. http://techcrunch.com/2015/10/01/yahoos-open-source-omid-project-brings-scalable-transaction-processing-to-hbase/

= Rationale =
Programming with ACID (Atomicity, Consistency, Isolation, Durability) transactions is very
popular and it is featured in relational databases. However, in the Big Data ecosystem, applications
typically use NoSQL datastores, which do not provide ACID transactions. Such NoSQL datastores
used to give up transactional support for greater agility and scalability. However, while
early NoSQL data store implementations did not include transaction support, the need for transactions
soon emerged in Big Data applications when accessing shared data; for  example, transactions
are very important  for modern, scalable systems that process content incrementally.

NoSQL datastores -including HBase- don’t provide transactional frameworks to coordinate
the access to the underlying data for preserving consistency. By using Omid, Big Data applications
that need to bundle multiple read and write operations on HBase into logically indivisible
units of work can execute transactions with ACID properties, just as they would use transactions
in the relational database world. Omid extends the HBase key-value access APl with transaction
semantics. It can be exercised either directly, or via higher level data management API’s.
For example, Apache Phoenix (SQL-on-top-of-HBase) might use Omid as its transaction management
component. 

The following features make Omid an attractive choice for system designers and other projects
in the Apache community:

 * Semantics. Omid implements Snapshot Isolation (SI,) supported by major SQL and NoSQL technologies
(e.g. Google Percolator).

 * Performance and Scalability. Omid  provides a highly scalable, lock-free implementation
of SI. To the best of our knowledge, it is also one of the few open source NoSQL transactional
platforms that can execute more than 100K transactions per second [1]. A new prototype still
in development can go even further, up to ~380K TPS.

 * Reliability.  Omid has a high-availability (HA) mode, in which the core service performing
writeset conflict resolution operates as primary-backup process pair with automatic failover.
The HA support has zero overhead on the mainstream operation.  

 * Adaptability. Omid current version provides transactions on data stored in Apache HBase.
However, Omid’s components are generic enough to be adapted to any other key-value NoSQL
datasource that supports MVCC.

 * Development. Omid provides a very simple interface that mimics standard HBase APIs, making
it developer friendly. Only minimal extensions to the standard interfaces have been introduced
to enable transactions.  

 * Simplicity. Omid leverages the HBase infrastructure for managing its own metadata. It entails
no additional services apart from those provided and used by HBase.

 * Track Record. As we have mentioned, Omid is already in use by very-large-scale production
systems at Yahoo. Also, Hortonworks is integrating Omid in a metastore implementation for
Hive based on HBase.


 1. See also [[https://github.com/vcnc/haeinsa/wiki/Performance|Haeinsa]]

= Current Status =
Current Omid implementation is available in both, Yahoo’s internal Github repository for
internal use at Yahoo as well as in [[https://github.com/yahoo/omid.git|Yahoo’s Github public
repository]]. Both repositories are managed by Omid’s current developers at Yahoo.


As it is mentioned above, Yahoo is currently using Omid for providing transactions in Sieve,
a web-scale content management platform that powers Yahoo’s next-generation search and personalization
products.

== Meritocracy ==
The first version of Omid was originally created in 2011 by Maysam Yabandeh, Daniel Gomez-Ferro,
Ivan B. Kelly, Benjamin Reed and Flavio Junqueira at the R&D Scalable Computing Group
of Yahoo Labs in Spain. 

During the years after its inception, Omid has matured to operate at Web scale and has been
used internally by strategic projects at Yahoo such as Sieve. The current base of committers
belong to the Yahoo team that took over the initial Omid prototype and rewrote it to meet
the high availability and scalability requirements of the Sieve project. This base of committers
has recently incorporated Hortonworks members that helped in the Omid adaptation to HBase
1.x versions.

With this initial committer base, we aim to form a larger community that can collaborate with
new ideas over the current code base. This new community will run the project following the
[[http://apache.org/foundation/governance/|"Apache Way"]]. Users and new contributors will
be treated with respect and welcomed. To grow the community, we will encourage contributors
to provide patches, review code, propose new features improvements, talk at conferences such
as Hadoop Summit, HBaseCon, ApacheCon, etc. Committership and PMC membership will be offered
according to meritocracy.

== Community ==
The public Yahoo Omid repository at Github currently has 241 Stars and 93 forks, which means
that there is an important interest for the project in the open-source community, at least
compared with other similar projects (See https://github.com/yahoo/omid.git).

Recently, Hortonworks contributors to the Apache Hive project which are working on storing
Hive metadata in HBase (Apache Jira HIVE-9452) manifested interest in using Omid. We started
with them a fruitful collaboration that resulted in Omid supporting HBase 1.x versions.

Salesforce is also interested in collaborating in doing a Proof of Concept for integrating
Omid as a pluggable transaction manager in Apache Phoenix.

Yahoo, Hortonworks and Salesforce participants will constitute the initial set of committers
and mentors for the proposal.

== Core Developers ==
The core developers of Omid are all skilled software developers and research engineers at
Yahoo Inc. and Hortonworks with years of experiences in their fields. At this moment, developers
are distributed across U.S. and Israel. The aim is to incorporate more committers from different
organizations and locations over time.

The current set of developers include experienced committers from Apache HBase, Hive and Hadoop
projects that have been working with us in the current codebase found in Github.


Finally, some of the core developers are currently NOT affiliated with the ASF and would require
new ICLAs to be filed.

= Alignment =
Omid enhances with transactions the already successful Apache HBase datastore project. We
have collaborated with other developers inside and outside Yahoo which are involved in the
Apache HBase community, so we have had reliable feedback from them.


Although Omid brings value into HBase, the design of the current version provides a general
transaction scheme that can potentially be adapted to other MVCC key-value datastores such
as Apache Cassandra.

Apache Phoenix is also a potential target. Phoenix is a SQL layer on top of HBase that can
potentially integrate Omid in order to provide the well-know concept of transactions to Phoenix-based
applications.

= Known Risks =
== Orphaned products ==
Yahoo’s Research and Search organizations have been taking care of Omid development since
the first prototype creation in 2011. Yahoo has a long history participating in open-source
projects, and has been also a long time contributor to the Apache community. For example,
in Apache, Yahoo is an important contributor in many projects in the Hadoop ecosystem such
as HBase, Pig, Storm or YARN, and has also open-sourced other well-known projects outside
Hadoop, such as Zookeeper or Bookkeeper. So it is in the best interest of Yahoo make Omid
also a successful open-source Apache product. If this happens, we are sure that a larger community
will be formed around the project in a relatively short period of time, contributing to the
diversification and stabilization of the base of committers.

== Inexperience with Open Source ==
This project has long standing experienced mentors and interested contributors from Apache
HBase, Hive and Phoenix to help us moving through the open source process. We are actively
working with experienced Apache community members to improve our project and further testing.

== Homogeneous Developers ==
Omid has been supported by Yahoo since its inception in 2011. All current committers are employed
at Yahoo or Hortonworks.

== Reliance on Salaried Developers ==
All the current developers are paid by their employers to contribute to this project. Yahoo
developers will also continuing maintaining the internal Omid repository at their company.
Of course, other developers are welcomed to contribute to this project after it is open sourced
in Apache.

== Relationships with Other Apache Product ==
Current Omid incarnation serves transactional contexts to applications storing their data
in HBase. However Omid design potentially allows to be adapted to serve transactions on top
of other MVCC-based key-value datastores in Apache community such as Cassandra.

As a transactional framework, many other Apache projects such as Apache Spark, Apache Phoenix,
Apache Storm, Apache Flink could potentially benefit from Omid to get transactional contexts.
In particular, Apache Phoenix -a SQL layer on top of HBase- might use Omid as its transaction
management component. Once we open source Omid as an Apache project, we expect to generate
more interest in the surrounded communities.

Very recently, a new incubator proposal for a similar project called Tephra, has been submitted
to the ASF. We think this is good for the Apache community, and we believe that there’s
room for both proposals as the design of each of them is based on different principles (e.g.
Omid does not require to maintain the state of ongoing transactions on the server-side component)
and due to the fact that both -Tephra and Omid- have also gained certain traction in the open-source
community.

With regard to the Apache projects that Omid uses, apart from HBase, Omid relies on Apache
Zookeeper and Curator projects in order to coordinate the (re)connection of transaction managers
(acting as clients) to the conflict resolution component for transactions (server side.) They’re
also used in order to coordinate the master and backup replicas in high availability scenarios.

== An Excessive Fascination with the Apache Brand ==
We are applying to the Incubator process because we think that it is the logical next step
for the  Omid project after we open-sourced the code in Github some years ago. Yahoo has a
long-standing history of contributing to Apache projects. The developers and contributors
understand the implications of making it an Apache project, and strongly believe that the
growing community can benefit from the Apache environment, ecosystem, and infrastrastructure.

= Documentation =
Current documentation about the project is available in the wiki of Omid’s [[https://github.com/yahoo/omid/wiki|Github
repository]]. It will be moved under https://omid.incubator.apache.org/docs if the project
is accepted as an Apache Incubator.

= Initial Source =
Initial source code is currently hosted in Github for general viewing and contribution:
https://github.com/yahoo/omid.git

Omid source code is written in Java code (99%) mixed with some shell script (1%) in order
to configure and trigger the execution of main components.

The code will be moved to [[http://git.apache.org/|Apache]] if accepted as an Incubator project.

= Source and Intellectual Property Submission Plan =
The current Omid License for the code published in Github is Apache 2.0. If Omid fulfills
and passes the conditions for being an Incubator project in the ASF, the source code will
be transitioned via the Software Grant Agreement onto the ASF infrastructure and in turn made
available under the Apache License, version 2.0.

= External Dependencies =

The required external dependencies that are not Apache projects are all Apache licenses or
other compatible Licenses:

 * Maven & Maven plugins (http://maven.apache.org/) [Apache 2.0]
 * JDK7 or OpenJDK 7 (http://java.com/) [Oracle or Openjdk JDK License]
 * Google Guava v11.0.2 (https://github.com/google/guava) [Apache 2.0]
 * Google Guice v3.0 (https://github.com/google/guice/wiki) [Apache 2.0]
 * Testng v6.8.8  (http://testng.org) [Apache 2.0] 
 * SLF4J (http://www.slf4j.org/) v1.7.7 [MIT License]
 * Netty (http://netty.io) v3.2.6.Final [Apache 2.0]
 * Google Protocol Buffers v2.5.0 (https://developers.google.com/protocol-buffers/) [BSD License]
 * Mockito (http://mockito.org/) v1.9.5 [MIT License]
 * LMAX Disruptor v3.2.0 (https://lmax-exchange.github.io/disruptor/) [Apache 2.0]
 * Coda Hale/Yammer.com Dropwizard Metrics v3.0.1 (http://metrics.dropwizard.io/3.1.0/) [Apache
2.0]
 * C.Beust, JCommander v1.35 (http://jcommander.org/) [Apache 2.0]
 * Hamcrest v1.3 (http://hamcrest.org/JavaHamcrest/) [BSD License]

= Cryptography =
Omid project does not use cryptography itself. However, Apache HBase -the datastore on top
of which Omid works in its current version- uses standard APIs and tools for SSH and SSL communication
where necessary.

= Required Resources =
We request that following resources be created for the project to use:

== Mailing lists ==
 * omid-private (moderated subscriptions)
 * omid-commits (commit notification)
 * omid-dev (technical discussions)

== Git repository ==
https://github.com/apache/incubator-omid

== Documentation ==
https://omid.incubator.apache.org/docs/

== JIRA instance ==
https://issues.apache.org/jira/browse/omid

= Initial Committers =
 * Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)
 * Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com)
 * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
 * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
 * Igor Katkov (katkovi<AT>yahoo-inc<DOT>com)
 * Francis C. Liu (fcliu<AT>yahoo-inc<DOT>com)
 * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
 * Francisco Perez-Sorrosal (fperez<AT>yahoo-inc<DOT>com)
 * Sameer Paranjpye (sparanjpye<AT>yahoo<DOT>com)
 * Ohad Shacham (ohads<AT>yahoo-inc<DOT>com)
 * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>)

= Additional Interested Contributors =
 * Ivan Kelly (ivank<AT>apache<DOT>org)
 * Maysam Yabandeh (myabandeh<AT>dropbox<DOT>com)

= Affiliations =
 * Edward Bortnikov, Yahoo Inc.
 * Daniel Dai, Hortonworks
 * Flavio P. Junqueira, Confluent
 * Igor Katkov, Yahoo Inc.
 * Ivan Kelly, Midokura
 * Francis C. Liu, Yahoo Inc.
 * Sameer Paranjpye, Arimo
 * Francisco Perez-Sorrosal, Yahoo Inc.
 * Ohad Shacham, Yahoo Inc.
 * Maysam Yabandeh, Dropbox Inc.

= Sponsors =


== Champion ==
Daniel Dai, Hortonworks (daijy<AT>hortonworks<DOT>com)

== Nominated Mentors ==
 * Alan Gates, Hortonworks, (gates<AT>hortonworks<DOT>com)
 * Lars Hofhansl, Salesforce (larsh<AT>apache<DOT>org)
 * Flavio P. Junqueira, Confluent (fpj<AT>apache<DOT>org)
 * Thejas Nair, Hortonworks (thejas<AT>hortonworks<DOT>com)
 * James Taylor, Salesforce (jamestaylor<AT>apache<DOT>org>)

== Sponsoring Entity ==
Apache Incubator PMC

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org


Mime
View raw message