incubator-general mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject [DISCUSS] Olympian Incubation Proposal
Date Thu, 29 Sep 2016 04:01:50 GMT
Hi All,

Please find below a proposal for a new incubator podling, Apache Olympian,
formerly Titan.
Apache Olympian is software designed to support the processing of graphs so
large that they require storage and computational capacities beyond what a
single machine can provide.

This project will be a fork of the Titan graph database project (
https://github.com/thinkaurelius/titan/), which is already available under the
Apache License v2.0.
The project was created by a company called Aurelius, which was later acquired
by Datastax.
In 2016 there has been less activity in the project because the original
authors are busy with other software development, but there is significant
interest from the community (see
https://groups.google.com/forum/#!msg/aureliusgraphs/jEN_7QwVXZ4/mz3gik-FAgAJ).

The community reached out to Datastax about donating the project's
copyright and trademark to the ASF, but the request was not approved.
Because of that, the community has decided to come to the ASF under a
different name: Apache Olympian.

The wiki proposal page is located at this URL:

  https://wiki.apache.org/incubator/OlympianProposal

I have also included the current text of that page below.

Looking forward to comments or questions about this proposal.


Thanks,
Henry Saputra
On behalf of Apache Olympian community


= Apache Olympian Proposal =

== Abstract ==

Olympian (formerly Titan) is software designed to support the processing of
graphs so large that they require storage and computational capacities
beyond what a single machine can provide. Scaling graph data processing for
real time traversals and analytical queries is Olympian’s main benefit.

== Proposal ==

Olympian consists of about 75K of Java code under the Apache 2 license
<http://www.apache.org/licenses/LICENSE-2.0>. It supports very large
graphs, with many concurrent transactions and operational graph processing.
Olympian graphs scale with the number of machines in the cluster. Olympian
already integrates with a number of Apache projects:

   - Provides native support for the popular property graph data model
     exposed by Apache TinkerPop <http://tinkerpop.apache.org/>.
   - Provides native support for the Gremlin graph traversal language defined
     by Apache TinkerPop for programming language agnostic connectivity.
   - Provides graph persistence solutions with:
      - Apache Cassandra <http://cassandra.apache.org/>
      - Apache HBase <https://hbase.apache.org/>
   - Provides advanced indexing with:
      - Apache Lucene <https://lucene.apache.org/>
      - Apache Solr <http://lucene.apache.org/solr/>
   - Supports global graph analytics and batch graph processing through the
     Apache Hadoop <http://hadoop.apache.org/> framework with processors
     implemented with:
      - Apache Spark <http://spark.apache.org/>
      - Apache Giraph <http://giraph.apache.org/>


Other software Olympian interfaces with includes:

   - BerkeleyDB
   - Elasticsearch


== Background ==

Marko Rodriguez and Matthias Broecheler, cofounders of the Aurelius graph
consulting firm, developed the Titan distributed graph database system and
made it available under the Apache 2 license in 2012. Marko is also a
cofounder of the Apache TinkerPop project and the primary developer of the
Gremlin graph traversal language. Other developers of Titan include Dan
LaRocque, Stephen Mallette, Daniel Kuppitz, and Pavel Yaskevich. Datastax
acquired Aurelius in February 2015, prior to the Titan 1.0 release in
September 2015.

Since Titan became available on GitHub, there have been 4434 commits, 38
branches, 23 releases, and 35 contributors.  In 2016 there has been less
activity as the original authors are busy with other software development,
but there is significant interest from the community.

== Rationale ==

(1) There are a number of Apache projects that integrate with Titan.

(2) Apache Atlas (incubating) <http://atlas.incubator.apache.org/> packages
and ships Titan as an essential component, yet Titan is not part of Apache.

(3) There are a number of existing users of Titan who are keen to continue
to develop the code. These users provide the basis of the community for the
proposed project.

== Initial Goals ==

The initial goals are as follows:

   - Establish the project governance in The Apache Way and broaden the
     community.
   - Distribute an incubating release aligned with the latest Apache
     TinkerPop version and prepared in accordance with the Apache release
     process.
   - Improve the documentation.
   - Add more unit/scenario tests.
   - Contribute functional and performance-related enhancements to the code.


== Current Status ==

The project will be forked off the existing Titan code base. This code has
been available under the Apache 2 License but has not been subject to the
Apache governance. The proposed project will adhere to Apache’s governance
and processes. This is one of the key benefits and reasons for bringing the
project forward as an incubator candidate.

There are 37 pull requests currently open against Titan, and the last pull
request was merged in June 2016. During incubation, the community will
adopt a voting-based approach to review and commit those changes into the
code base in preparation for the first incubating release.

=== Meritocracy ===

The proposed project will adopt the familiar process of progression from
submitter to contributor to PMC. The community includes active committers
and PMC members on other Apache projects (e.g. Apache TinkerPop, Apache
Atlas (incubating), Apache HBase).

=== Community ===

There is an active and passionate community of existing Titan users. It is
believed that this community will continue to grow and to progress. Titan
is well-designed to support different backends, and the community will
naturally grow as more backends are written to fit into the Titan
architecture. Since the Titan 1.0 release, three different storage providers
have become available. Also, once an incubating release is made available,
the community will likely see quick adoption from the Apache TinkerPop user
base.

=== Core Developers ===

The community includes developers from a number of vendors (e.g. Google,
HortonWorks, IBM, Mindmaps, Classmethod) and users (both academic and
commercial). It contains two active committers and PMC members from the
Apache TinkerPop project, one active committer and PPMC member from Apache
Atlas (incubating), and one committer from Apache HBase. The developers
represent a good mixture of skills, including expertise with each of the
supported providers.

=== Alignment ===

The proposed project will be used by or integrated with a number of other
Apache components, including (probably) TinkerPop, Atlas, Hadoop, Spark,
Cassandra, and HBase. It is logical that the project should also be homed
within Apache and subject to the governance principles of Apache.

== Known Risks ==

=== Orphaned products ===

All the companies and developers associated with academic institutions who
are engaged or want to be engaged with Titan are well aware of the open
source philosophy and the importance of open governance of open source
products. Hence, we think the risks of Titan being orphaned are minimal.

=== Inexperience with Open Source ===

The project is based on an existing open source code base (Titan 1.0) and
the community consists of developers and vendors who have a history and
strategy of open development and governance. The initial committers include
committers and PMC members from other Apache projects.

=== Homogenous Developers ===

The community consists of geographically dispersed volunteers from academic
institutions and a range of commercial organisations. The geographic
diversity includes North America, Europe, Asia, and Australia.

=== Reliance on Salaried Developers ===

Many of the developers are salaried by the vendors in the community, but
the vendors have publicly stated their support for open systems. Whilst
we might expect to see some gradual replacement of members of the
community, we believe that it will remain stable and viable into the
future. All members of the community are passionate about the project and
are likely to contribute outside of ‘normal working hours’.

=== Relationships with Other Apache Products ===

The proposed project has dependencies on other Apache projects, including
Cassandra and HBase. There are also Apache projects that depend
upon the availability of an open, scalable graph database; Apache Atlas is
an example of such a project. Apache S2Graph (incubating)
<https://s2graph.incubator.apache.org/> is currently an incubator project
at Apache; however, it does not yet implement the Apache TinkerPop
interfaces, although it has an open JIRA for that effort.

=== An Excessive Fascination with the Apache Brand ===

Whilst the Apache brand will help to attract developers and consumers to
the project, it is not for this reason that the proposal is being made. It
is to align the governance of the project with that of the other components
with which it is commonly used and to benefit from the development
principles adopted by Apache. In particular, TinkerPop is Titan’s most
critical component/dependency, one so tight that Titan releases are
contemporaneous or follow TinkerPop releases.

== Documentation ==

Information on the existing Titan code base can be found at:
http://titan.thinkaurelius.com/

== Initial Source ==

The initial source will be based off a fork of the Titan code base. The
latter can be found at: https://github.com/thinkaurelius/titan. The fork to
be used as the base is from: https://github.com/pluradj/titan

== Source and Intellectual Property Submission Plan ==

Since Datastax owns the copyright and trademark for Titan, when the
proposal is accepted to the ASF Incubator, the community will choose a
different name. It is proposed that Titan will enter incubation with the
name Olympian. The community will finalize and document the name research
during incubation. Individuals in the community have discussed the
possibility of a software grant from Datastax, but Datastax was not
interested in donating the code or brand to the ASF. When asked whether they
would block others from taking it to Apache, they did not respond.

== External Dependencies ==

Titan has the following external dependencies:

* Java 1.8

* Apache Maven 3.0.5 (Apache 2.0 License)

* JUnit 4.12 (EPL)

* MRUnit 1.1.0 (Apache 2.0 License)

* Apache Cassandra (Apache 2.0 License)

* Jamm (Apache 2.0 License)

* Metrics 2.1.1 and 3.0.1 (Apache 2.0 License)

* Sesame 2.7.10 (Eclipse Public License Version 1.0)

* slf4j 1.7.5 (MIT)

* Apache HTTPComponents 4.4.1 (Apache 2.0 License)

* Apache Hadoop 1.2.1 & 2.7.1 (Apache 2.0 License)

* Apache HBase (Apache 2.0 License)

* Jackson 1.9.2 & 2.4.4 (Apache 2.0 License)

* Apache Lucene 4.10.4 (Apache 2.0 License)

* Elasticsearch 1.5.1 (Apache 2.0 License)

* Apache Commons Beanutils 1.7.0 (Apache 2.0 License)

* Joda Time 1.6.2 (Apache 2.0 License)

* Google ConcurrentLinkedHashMap (Apache 2.0 License)

* Antlr 2.7.7 and 3.2 (BSD License)

* ASM 3 & 4 (http://asm.ow2.org/license.html)

* Apache Zookeeper 3.4.6 (Apache 2.0 License)

* Jersey 1.9 (CDDL 1.1 and GPL v2)

* JNA 4.0.0 (LGPL 2.1 and Apache 2.0 License)

* Kuali Maven s3 Wagon 1.1.20 (Educational Community License, Version 2.0)

* Apache Tomcat Jasper 5.5.23 (Apache 2.0 License)

* Berkeley DB 5.0.73 (Sleepycat License)

Upon acceptance to the incubator, we would begin a thorough analysis of all
transitive dependencies to verify this information and introduce license
checking into the build and release process by integrating with Apache
Rat. Where a dependency has an Apache-incompatible license, such as
Berkeley DB, we will remove it or replace it with an appropriate
alternative.
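
As a rough illustration of the dependency analysis described above, the
following Python sketch sorts a few of the listed dependencies into ASF-style
license categories. It is not part of the proposal or of Apache Rat. The
category table reflects one reading of the ASF third-party license policy (A =
may be included, B = binary-only inclusion, X = may not be included) and
should be checked against the current policy. The names CATEGORY, RANK,
best_category and triage are hypothetical, and a dependency listed with two
licenses is assumed to be dual-licensed, so the most permissive option is
used.

```python
# Illustrative license triage. The category assignments below are
# assumptions to verify against the current ASF third-party license policy.
CATEGORY = {
    "Apache-2.0": "A",  # may be included
    "MIT": "A",
    "BSD": "A",
    "EPL-1.0": "B",     # binary-only inclusion
    "CDDL-1.1": "B",
    "GPL-2.0": "X",     # may not be included
    "LGPL-2.1": "X",
    "Sleepycat": "X",
}

# Lower rank means more permissive.
RANK = {"A": 0, "B": 1, "X": 2}

# (dependency, licenses offered), taken from the dependency list above.
# Two licenses are assumed to mean "dual-licensed, pick either".
DEPENDENCIES = [
    ("slf4j 1.7.5", ["MIT"]),
    ("JUnit 4.12", ["EPL-1.0"]),
    ("Jersey 1.9", ["CDDL-1.1", "GPL-2.0"]),
    ("JNA 4.0.0", ["LGPL-2.1", "Apache-2.0"]),
    ("Berkeley DB 5.0.73", ["Sleepycat"]),
]


def best_category(licenses):
    """Return the most permissive category among the offered licenses.

    Unknown licenses are treated as "X" so that they get flagged for review.
    """
    return min((CATEGORY.get(name, "X") for name in licenses), key=RANK.get)


def triage(dependencies):
    """Group dependency names by their best available license category."""
    result = {"A": [], "B": [], "X": []}
    for name, licenses in dependencies:
        result[best_category(licenses)].append(name)
    return result


if __name__ == "__main__":
    for category, names in triage(DEPENDENCIES).items():
        print(category, names)
```

Under these assumptions, anything that lands in category X (here, Berkeley
DB) is a candidate for removal or replacement, which matches the plan stated
above.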

== Cryptography ==

Titan will support encryption of client-server communication through its
use of the Apache TinkerPop Gremlin Server.  We do not expect Titan to be a
controlled export due to its use of encryption.

== Required resources ==

=== Mailing lists ===

* private@olympian.incubator.apache.org  (with moderated subscriptions)

* commits@olympian.incubator.apache.org

* dev@olympian.incubator.apache.org

* user@olympian.incubator.apache.org

=== Git Repository ===

The team would like to use git for source control. We request a writable
git repo https://git-wip-us.apache.org/repos/asf/incubator-olympian.git,
and mirroring to be set up to GitHub through INFRA. We also request
configuration for continuous integration with Travis CI.

=== Issue Tracking ===

Titan currently uses the GitHub issue tracker and the team would like to
migrate all of these issues to the Apache JIRA.

== Initial Committers ==

Dylan Bethune-Waddell - dylan.bethune.waddell@mail.utoronto.ca

Mathias Bogaert - mathias.bogaert@gmail.com

Misha Brukman - mbrukman@google.com

Felix Chapman - felix@mindmaps.io

Sheldon Hall - sheldon@mindmaps.io

Jing Chen (Jerry) He - jerryjch@apache.org

Madhan Neethiraj - mneethiraj@hortonworks.com

Alexander Patrikalakis - amcp@me.com

Jason Plurad - pluradj@apache.org

Suma Shivaprasad - sumasai@apache.org <sumasa@apache.org>

Lindsay Smith - lindsaysmith@google.com

Filipe Teixeira - fppintoteixeira@gmail.com

Ted Wilmes - twilmes@apache.org

== Affiliations ==

Dylan Bethune-Waddell - Jurisica Lab, Princess Margaret Cancer Centre, UHN

Mathias Bogaert - Independent Contractor

Misha Brukman - Google

Felix Chapman - Mindmaps

Sheldon Hall - Mindmaps

Jing Chen (Jerry) He - IBM

Madhan Neethiraj - HortonWorks

Alexander Patrikalakis - Classmethod, Inc.

Jason Plurad - IBM

Suma Shivaprasad - HortonWorks

Lindsay Smith - Google

Filipe Teixeira - Mindmaps

Ted Wilmes - Expero Inc.

== Sponsors ==

=== Champion ===

Henry Saputra - hsaputra@apache.org

=== Nominated Mentors ===

Alan Gates - gates@apache.org

P. Taylor Goetz - ptgoetz@apache.org

Henry Saputra - hsaputra@apache.org

Michael Stack - stack@apache.org

=== Sponsoring Entity ===

The Apache Incubator
