incubator-general mailing list archives

From Jakob Homan <jgho...@gmail.com>
Subject Re: [VOTE] Accept S2Graph into Apache Incubation
Date Tue, 24 Nov 2015 19:33:22 GMT
+1 (binding)

On 24 November 2015 at 09:55, Julien Le Dem <julien@dremio.com> wrote:
> +1 (binding)
>
> On Tue, Nov 24, 2015 at 9:48 AM, Stack <stack@duboce.net> wrote:
>
>> +1 (binding)
>>
>> On Mon, Nov 23, 2015 at 4:53 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>>
>> > Hello folks,
>> >
>> > Thanks for all the feedback on the S2Graph Proposal.
>> >
>> > I would like to call for a [VOTE] on S2Graph joining the ASF as an
>> > incubation project.
>> >
>> > The vote is open for at least 72 hours:
>> >
>> > [ ] +1 accept S2Graph in the Incubator
>> > [ ] ±0
>> > [ ] -1 (please give reason)
>> >
>> > S2Graph provides a scalable distributed graph database engine over a
>> > key/value store such as HBase. S2Graph provides a fully asynchronous
>> > API for manipulating data using a property graph model and for running
>> > fast breadth-first-search queries over the graph. S2Graph is designed
>> > for OLTP-like workloads on graph data sets instead of batch processing,
>> > and it also provides INSERT/UPDATE operations on them.
>> >
>> > The proposal is available on the wiki here:
>> > https://wiki.apache.org/incubator/S2GraphProposal
>> >
>> > Best regards,
>> > Hyunsik
>> >
>> >
>> > <COPY of the proposal wiki>
>> >
>> >
>> ------------------------------------------------------------------------------------------------
>> > = S2Graph Proposal =
>> >
>> > == Abstract ==
>> > S2Graph is a distributed and scalable OLTP graph database built on
>> > Apache HBase to support fast traversal of extremely large graphs.
>> >
>> > == Proposal ==
>> > S2Graph provides a scalable distributed graph database engine over a
>> > key/value store such as HBase. S2Graph provides a fully asynchronous
>> > API for manipulating data using a property graph model and for running
>> > fast breadth-first-search queries over the graph. S2Graph is designed
>> > for OLTP-like workloads on graph data sets instead of batch processing.
>> > S2Graph also provides INSERT/UPDATE operations. The name 'S2Graph' is
>> > an abbreviation of '''S'''uper '''S'''imple '''Graph''' Database.
>> >
>> > Here are additional materials to introduce S2Graph.
>> >  * HBaseCon 2015 -
>> > http://www.slideshare.net/HBaseCon/use-cases-session-5
>> >  * Apache: Big Data 2015 -
>> > http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
>> >
>> > == Background ==
>> > S2Graph initially started as an internal project at Kakao.com to
>> > efficiently store user relations and user activities as one large
>> > graph and to provide a unified query interface to traverse the graph.
>> > It was open sourced on GitHub in June 2015, about three months ago.
>> >
>> > Over time, S2Graph, with HBase as its storage tier, has been adopted
>> > by various applications at Kakao, such as messaging, social feeds,
>> > and real-time recommendations.
>> >
>> > Users can benefit from S2Graph's generalized, high-level graph
>> > abstraction API instead of querying via low-level key/value APIs, just
>> > as Apache Phoenix provides a SQL layer over HBase.
>> >
>> > == Rationale ==
>> > Graph data (highly interconnected data) is very abundant and important
>> > these days. When users have a multitude of relationships, each with
>> > complex properties associated with them, a graph model is more
>> > intuitive and efficient than tabular formats (RDBMS).
>> >
>> > There are many ASF projects that provide SQL tiers, but no ASF
>> > project provides a scalable graph layer on top of the existing
>> > Hadoop ecosystem. When graph data grows to trillion-edge scale,
>> > traversal takes a long time and can be costly. However, with the
>> > benefit of HBase's scalable architecture, S2Graph can traverse large
>> > graphs efficiently in a breadth-first-search manner.
>> >
>> > S2Graph also interoperates with several existing Apache projects
>> > (Apache HBase, Apache Spark) to provide a means of merging real-time
>> > events and batch-processed data using the property graph data model.
>> >
>> > Many developers run their own domain-specific API servers to serve
>> > their data products, but a graph model is general and the S2Graph API
>> > fully supports traversal of the graph, so it can be used as a scalable,
>> > general-purpose API serving layer for various domains. As long as data
>> > can be modeled as a graph, users can avoid the tedious work of
>> > developing customized API servers by using S2Graph.
>> >
>> > == Initial Goals ==
>> > The initial goals will be to move the existing codebase to Apache and
>> > integrate with the Apache development process. Once this is
>> > accomplished, we plan for incremental development and releases that
>> > follow the Apache guidelines.
>> >
>> > == Current Status ==
>> >
>> > === Meritocracy ===
>> > S2Graph has operated on meritocratic principles from the start.
>> > Currently, all discussions pertaining to S2Graph development are
>> > public on GitHub. The current incubation proposal includes the major
>> > code contributors to S2Graph. Several additional people have worked on
>> > the S2Graph codebase for industry use cases and would be interested in
>> > becoming committers. We are starting with a small committer group and
>> > we plan to add additional committers following an open merit-based
>> > decision process during the incubation phase.
>> >
>> > === Community ===
>> > We have already begun building a community, but at this time the
>> > community consists only of S2Graph developers – all Kakao employees –
>> > and prospective users. S2Graph seeks to develop developer and user
>> > communities during incubation.
>> >
>> > === Core Developers ===
>> > S2Graph is currently being designed and developed by two engineers
>> > from Kakao: Doyung Yoon and Daewon Jeong.
>> >
>> > === Alignment ===
>> > Our proposed S2Graph effort aligns closely with Apache HBase. The
>> > scope of the HBase project is limited to a simple byte-array-based
>> > Create, Read, Update, Delete and Scan API, with no current plans to
>> > extend beyond these bounds.
>> >
>> > S2Graph complements this with a higher level API for a property graph
>> > model.
>> >
>> > S2Graph was designed from the beginning to offer a scalable,
>> > distributed graph database skin over HBase that provides a property
>> > graph model and breadth-first search, and it will continue to focus
>> > on providing the graph model.
>> >
>> > == Known Risks ==
>> > === Orphaned Products ===
>> > The core developers on the S2Graph team plan to work full time on this
>> > project. There is very little risk of S2Graph getting orphaned since
>> > at least one large company (Kakao) is extensively using it in its
>> > production HBase clusters. For example, there are currently more than
>> > 20 use cases in production using S2Graph, with over 1 trillion edges
>> > and 140 million breadth-first search requests per minute. We plan
>> > to extend and diversify this community further through Apache.
>> >
>> > === Inexperience with Open Source ===
>> > The core developers are all active users and followers of open source.
>> > They are already committers and contributors to the S2Graph GitHub
>> > project. All have been involved with source code that has been
>> > released under an open source license. Although the core developers
>> > do not have Apache open source experience, there are plans to onboard
>> > individuals with Apache open source experience to the project.
>> >
>> > === Homogeneous Developers ===
>> > Most committers in this proposal belong to the same institution
>> > (Kakao). The engagement of these committers goes well beyond the
>> > necessary development to support research, and all committers work on
>> > S2Graph full time. Several people from other institutions are working
>> > on and are familiar with the S2Graph codebase. We will work to attract
>> > them as future committers during the incubation phase, following a
>> > merit-based approach.
>> >
>> > === Reliance on Salaried Developers ===
>> > Kakao invested in S2Graph as its distributed graph database solution
>> > on top of HBase, and some of its key engineers are working full time
>> > on the project. We look forward to other Apache developers and
>> > researchers contributing to the project. Another key to addressing
>> > the risk of relying on salaried developers from a single entity is
>> > to increase the diversity of contributors and actively encourage
>> > domain experts in the graph database space to contribute. Apache
>> > S2Graph intends to do this.
>> >
>> > === Relationships with Other Apache Products ===
>> > S2Graph has a strong relationship with and dependency on Apache HBase
>> > and Apache Spark. Being part of the Apache Incubator community could
>> > help foster closer collaboration with these two projects as well as
>> > others.
>> >
>> > In terms of graph processing frameworks, S2Graph and Apache Giraph
>> > look similar. However, their goals are clearly different. Giraph aims
>> > at analytical batch processing on immutable graph data sets. In
>> > contrast, S2Graph is designed for OLTP-like workloads on graph data
>> > sets, and S2Graph also provides INSERT/UPDATE operations.
>> >
>> >
>> > === An Excessive Fascination with the Apache Brand ===
>> > S2Graph is proposing to enter incubation at Apache in order to help
>> > efforts to diversify the committer base, not so much to capitalize on
>> > the Apache brand. The S2Graph project is in production use already
>> > inside Kakao, but is not expected to be a Kakao product for external
>> > customers. As such, the S2Graph project is not seeking to use the
>> > Apache brand as a marketing tool.
>> >
>> > == Documentation ==
>> > Information about S2Graph can be found at
>> > https://github.com/kakao/s2graph. The following links provide more
>> > information about S2Graph in open source:
>> >  * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
>> >  * Codebase at GitHub: https://github.com/kakao/s2graph
>> >  * Issue Tracking: https://github.com/kakao/s2graph/issues
>> >  * User community: https://groups.google.com/forum/#!forum/s2graph
>> >
>> > == Initial Source ==
>> >
>> > The S2Graph codebase is currently hosted on GitHub:
>> > https://github.com/kakao/s2graph.
>> >
>> > === Source and Intellectual Property Submission Plan ===
>> >
>> > Currently, the S2Graph codebase is distributed under the Apache 2.0
>> > License.
>> >
>> > == External Dependencies ==
>> >
>> > Beyond relying on Apache HBase, S2Graph has the following external
>> > dependencies:
>> >  * Asynchbase (BSD)
>> >  * Play Framework (Apache 2.0 license)
>> >  * Scala (http://www.scala-lang.org/license.html)
>> >  * Spark (Apache 2.0 license)
>> >  * Kafka (Apache 2.0 license)
>> >
>> > == Required Resources ==
>> >
>> > === Mailing list ===
>> >
>> > We will migrate our mailing lists to the following:
>> >  * users@s2graph.incubator.apache.org
>> >  * dev@s2graph.incubator.apache.org
>> >  * private@s2graph.incubator.apache.org
>> >  * commits@s2graph.incubator.apache.org
>> >
>> > === Source control ===
>> >
>> > The S2Graph team would like to use Git for source code control, due to
>> > our current use of Git. We request a writable Git repository for
>> > S2Graph, with mirroring to GitHub set up through INFRA.
>> >
>> > === Issue Tracking ===
>> >
>> > S2Graph currently uses the GitHub issue tracking system associated
>> > with its GitHub repository (https://github.com/kakao/s2graph/issues). We
>> > will migrate to the Apache JIRA
>> > (http://issues.apache.org/jira/browse/S2Graph).
>> >
>> > === Other Resources ===
>> >
>> >  * Jenkins/Hudson for builds and test running.
>> >  * Wiki for documentation purposes.
>> >  * Blog to improve project dissemination.
>> >
>> > == Initial Committers ==
>> >
>> >  * Doyung Yoon <shom83 at gmail dot com>
>> >  * Daewon Jeong <blueiur at gmail dot com>
>> >  * Jaesang Kim <honeysleep at gmail dot com>
>> >  * Hwansung Yu <deejayfwan at gmail dot com>
>> >  * Min-Seok Kim <mskim.org at gmail dot com>
>> >  * Chul Kang <miralchul at gmail dot com>
>> >  * Luke Han <lukehan at apache dot org>
>> >  * Alexander Bezzubov <bzz at apache dot org>
>> >
>> > == Affiliations ==
>> >
>> >  * Doyung Yoon, Kakao
>> >  * Daewon Jeong, Kakao
>> >  * Jaesang Kim, Kakao
>> >  * Hwansung Yu, Kakao
>> >  * Min-Seok Kim, Kakao
>> >  * Chul Kang, Kakao
>> >  * Luke Han, eBay Inc.
>> >  * Alexander Bezzubov, NFLabs
>> >
>> > == Sponsors ==
>> >
>> > === Champion ===
>> > Hyunsik Choi
>> >
>> > === Nominated Mentors ===
>> >  * Andrew Purtell - Apache Member, Salesforce
>> >  * Sergio Fernández - Apache Member, Redlink
>> >  * Hyunsik Choi - Apache Member, Gruter Inc.
>> >  * Seetharam Venkatesh - IPMC, Hortonworks Inc.
>> >
>> > === Sponsoring Entity ===
>> >
>> >  * The Apache Incubator
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > For additional commands, e-mail: general-help@incubator.apache.org
>> >
>> >
>>
>
>
>
> --
> Julien


