incubator-cvs mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Incubator Wiki] Update of "S2GraphProposal" by HyunsikChoi
Date Fri, 20 Nov 2015 08:18:59 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.

The "S2GraphProposal" page has been changed by HyunsikChoi:
https://wiki.apache.org/incubator/S2GraphProposal?action=diff&rev1=16&rev2=17

Comment:
I reflected comments from Stack.

  = S2Graph Proposal =
  
  == Abstract ==
- S2Graph is a distributed and scalable OLTP graph database built on HBase to support fast
traversal on extremely large graph.
+ S2Graph is a distributed and scalable OLTP graph database built on Apache HBase to support
fast traversal of extremely large graphs.
  
  Here are additional materials to introduce S2Graph.
   * HBaseCon 2015 - http://www.slideshare.net/HBaseCon/use-cases-session-5
   * Apache: Big Data 2015 - http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
  
  == Proposal ==
- S2Graph is to provide a scalable distributed graph database engine over key/value storage
such as HBase. S2Graph provide fully ashynchronous API to manupulate data as property graph
model and fast breadth-first-search query on graph.
+ S2Graph provides a scalable distributed graph database engine over a key/value store such
as HBase. S2Graph provides a fully asynchronous API to manipulate data as a property graph
model and fast breadth-first-search queries over the graph.
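The sketch below illustrates breadth-first traversal over a key/value store. It assumes a hypothetical in-memory dictionary standing in for an HBase table, with one row per source vertex holding that vertex's outgoing edges. The names `EdgeStore`, `multi_get`, `bfs_levels` and `example_store` are illustrative only. They are not S2Graph's API or storage schema. The sketch is also synchronous, whereas S2Graph's API is asynchronous.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a key/value table: row key = source vertex id,
# row value = outgoing edges as (target_vertex, properties) pairs.
# This is an illustration only, not S2Graph's actual storage layout.
EdgeStore = Dict[str, List[Tuple[str, dict]]]


def multi_get(store: EdgeStore, keys: List[str]) -> Dict[str, List[Tuple[str, dict]]]:
    """Fetch adjacency rows for several vertices in one batched call."""
    return {k: store.get(k, []) for k in keys}


def bfs_levels(store: EdgeStore, start: str, max_depth: int) -> List[List[str]]:
    """Return newly discovered vertices at each depth, up to max_depth."""
    visited: Set[str] = {start}
    frontier = [start]
    levels: List[List[str]] = []
    for _ in range(max_depth):
        rows = multi_get(store, frontier)  # one batched lookup per level
        next_frontier: List[str] = []
        for src in frontier:
            for dst, _props in rows[src]:
                if dst not in visited:  # skip vertices seen at earlier levels
                    visited.add(dst)
                    next_frontier.append(dst)
        if not next_frontier:  # nothing new to expand; stop early
            break
        levels.append(next_frontier)
        frontier = next_frontier
    return levels


# Hypothetical example graph of "friend" edges.
example_store: EdgeStore = {
    "alice": [("bob", {"type": "friend"}), ("carol", {"type": "friend"})],
    "bob": [("dave", {"type": "friend"})],
    "carol": [("dave", {"type": "friend"}), ("erin", {"type": "friend"})],
}

if __name__ == "__main__":
    print(bfs_levels(example_store, "alice", 3))
```

The sketch fetches each frontier with a single batched call rather than one lookup per vertex. That per-level batching is one plausible reason BFS maps well onto a key/value store. A production system would likely also need to limit fan-out per vertex and issue the lookups asynchronously.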
  
  == Background ==
- S2Graph initially started as an internal project at Kakao.com to efficiently store user
relation and user activities as one large graph and provide unified query to traverse graph.
It was open sourced on Github about a 3 months ago in June 2015.
+ S2Graph initially started as an internal project at Kakao.com to efficiently store user
relations and user activities as one large graph and to provide a unified query interface
to traverse the graph. It was open sourced on Github about 3 months ago, in June 2015.
  
- Over time, S2Graph together with HBase as storage tier has begun to be adapted into various
applications, such as messaging, social feeds, and realtime recommendations at Kakao.
+ Over time, S2Graph, using HBase as the storage tier, has begun to be adopted in various applications,
such as messaging, social feeds, and realtime recommendations at Kakao.
  
- Users can benefit from S2Graph`s generalized high level API instead of low-level key/value
API for graph abstraction, just like Phoenix provide SQL layer over HBase.
+ Users can benefit by using S2Graph's generalized high-level graph abstraction API instead
of querying via low-level key/value APIs, just as Apache Phoenix provides a SQL layer over
HBase.
  
  == Rationale ==
- Graph data (highly interconnected data) is very abundant and important these days. When
users have a multitude of relationships, each with complex properties associated with them,
graph model is more intuitive and efficient than tabular format(RDBMS).
+ Graph data (highly interconnected data) is very abundant and important these days. When
users have a multitude of relationships, each with complex properties associated with them,
a graph model is more intuitive and efficient than tabular formats (RDBMS).
   
- There are many ASF projects that provide SQL layer, but there is no ASF projects that provide
scalable graph layer on existing hadoop echo system. When graph data grows to trillion edge
scale, the process of traversing takes a long time and costly. However, with the benefit of
HBase`s scalable architecture, S2Graph can traverse large graph in a breadth-first-search
manner efficiently.
+ There are many ASF projects that provide SQL tiers, but there are no ASF projects that provide
a scalable graph layer on top of the existing Hadoop ecosystem. When graph data grows to the
trillion-edge scale, the process of traversing takes a long time and can be costly. However,
with the benefit of HBase's scalable architecture, S2Graph can traverse large graphs in a
breadth-first-search manner efficiently.
  
- S2Graph also interoperates with several existing Apache projects (HBase, Spark) to provide
a way to merge real time events and batch processed data using property graph data model.
+ S2Graph also interoperates with several existing Apache projects (HBase, Apache Spark) to
provide means of merging real-time events and batch-processed data using the property graph
data model.
  
- Many developers are running their own domain specific API servers to serve their data products,
but graph model is general and S2Graph API fully support traverse on graph, so it can be used
as scalable general purpose API serving layer for various domains. As long as data can be
modeled as graph, then users can avoid tedious work for developing customized API servers
by using S2Graph.
+ Many developers run their own domain-specific API servers to serve their data products,
but a graph model is general and the S2Graph API fully supports traversal of the graph, so
it can be used as a scalable general-purpose API serving layer for various domains. As long
as data can be modeled as a graph, users can avoid the tedious work of developing customized
API servers if they use S2Graph.
  
  == Initial Goals ==
  The initial goals will be to move the existing codebase to Apache and integrate with the
Apache development process. Once this is accomplished, we plan for incremental development
and releases that follow the Apache guidelines.
@@ -43, +43 @@

  S2Graph is currently being designed and developed by 2 engineers from Kakao: Doyung Yoon
and Deawon Jeong.
  
  === Alignment ===
- Our proposed S2Graph effort aligns closely with Apache HBase. The HBase project perimeter
is denoted by a simple byte-array based Create, Read, Update, Delete and Scan APIs with no
current plans to extend beyond this bounds.
+ Our proposed S2Graph effort aligns closely with Apache HBase. The HBase project perimeter
is denoted by a simple byte-array based Create, Read, Update, Delete and Scan API with no
current plans to extend beyond these bounds.
  
- S2Graph complements this with a higher level API for property graph model.
+ S2Graph complements this with a higher level API for a property graph model.
  
- S2Graph was designed to offer scalable distributed graph database skin over HBase from the
beginning in order to provide property graph model and breadth first search, and continue
to focus on providing graph model.
+ S2Graph was designed from the beginning to offer a scalable distributed graph database skin
over HBase in order to provide a property graph model and breadth-first search, and will
continue to focus on providing the graph model.
  
  == Known Risks ==
  === Orphaned Products ===
  The core developers of the S2Graph team plan to work full time on this project. There is very
little risk of S2Graph getting orphaned since at least one large company (Kakao) is extensively
using it in their production HBase clusters. For example, there are currently 20+ use cases
with more than 1 trillion edges and 140 million breadth-first search query requests per minute
using S2Graph in production. We plan to extend and diversify this community further through
Apache.
  
  === Inexperience with Open Source ===
- The core developers are all active users and followers of open source. They are already
committers and contributors to the S2Graph Github project. All have been involved with the
source code that has been released under an open source license. Though the core set of Developers
do not have Apache Open Source experience, there are plans to onboard individuals with Apache
open source experience on to the project.
+ The core developers are all active users and followers of open source. They are already
committers and contributors to the S2Graph Github project. All have been involved with the
source code that has been released under an open source license. Though the core set of developers
do not have Apache open source experience, there are plans to onboard individuals with Apache
open source experience to the project.
  
  === Homogenous Developers ===
  Most committers in this proposal belong to the same institution (Kakao). The engagement
of these committers goes well beyond the necessary development to support research, and all
committers work on S2Graph full time. Several people from other institutions are working on
and are familiar with the S2Graph codebase. We will work to attract them as future committers
during the incubation phase, following a merit-based approach.
  
  === Reliance on Salaried Developers ===
- Kakao invested in S2Graph as the distributed graph database solution on top of HBase and
some of its key engineers are working full time on the project. We look forward to other Apache
developers and researchers to contribute to the project. Also key to addressing the risk associated
with relying on Salaried developers from a single entity is to increase the diversity of the
contributors and actively lobby for Domain experts in the graph database space to contribute.
Apache S2Graph intends to do this.
+ Kakao invested in S2Graph as the distributed graph database solution on top of HBase and
some of its key engineers are working full time on the project. We look forward to other Apache
developers and researchers contributing to the project. Also key to addressing the risk associated
with relying on salaried developers from a single entity is to increase the diversity of the
contributors and actively lobby for domain experts in the graph database space to contribute.
Apache S2Graph intends to do this.
  
  === Relationships with Other Apache Products ===
- S2Graph has a strong relationship and dependency with Apache Hadoop HBase and Spark. Being
part of Apache’s Incubation community, could help with a closer collaboration among these
two projects and as well as others.
+ S2Graph has a strong relationship and dependency with Apache HBase and Apache Spark. Being
part of Apache's Incubator community could help foster closer collaboration with these
two projects as well as others.
  
  In terms of graph processing frameworks, S2Graph and Apache Giraph look similar. However,
their goals differ. Giraph aims at analytical batch processing on immutable graph data sets.
In contrast, S2Graph is designed for OLTP-like workloads on graph data sets, and S2Graph also
provides INSERT/UPDATE operations.
  
  
  === An Excessive Fascination with the Apache Brand ===
- S2Graph is proposing to enter incubation at Apache in order to help efforts to diversify
the committer-base, not so much to capitalize on the Apache brand. The S2Graph project is
in production use already inside Kakao, but is not expected to be an Kakao product for external
customers. As such, the S2Graph project is not seeking to use the Apache brand as a marketing
tool.
+ S2Graph is proposing to enter incubation at Apache in order to help efforts to diversify
the committer base, not so much to capitalize on the Apache brand. The S2Graph project is
already in production use inside Kakao, but is not expected to be a Kakao product for external
customers. As such, the S2Graph project is not seeking to use the Apache brand as a marketing
tool.
  
  == Documentation ==
  Information about S2Graph can be found at https://github.com/kakao/s2graph. The following
links provide more information about S2Graph in open source:
@@ -108, +108 @@

  
  === Source control ===
  
- The S2Graph team would like to use Git for source control, due to our current use of Git.
We request a writeable Git repo for S2Graph, and mirroring to be set up to Github through
INFRA.
+ The S2Graph team would like to use Git for source code control, due to our current use of
Git. We request a writeable Git repo for S2Graph, and mirroring to be set up to Github through
INFRA.
  
  === Issue Tracking ===
  

---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org

