Return-Path:
X-Original-To: apmail-incubator-general-archive@www.apache.org
Delivered-To: apmail-incubator-general-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
 by minotaur.apache.org (Postfix) with SMTP id DFEB118F11
 for ; Tue, 24 Nov 2015 09:48:34 +0000 (UTC)
Received: (qmail 48493 invoked by uid 500); 24 Nov 2015 09:48:33 -0000
Delivered-To: apmail-incubator-general-archive@incubator.apache.org
Received: (qmail 48306 invoked by uid 500); 24 Nov 2015 09:48:33 -0000
Mailing-List: contact general-help@incubator.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: general@incubator.apache.org
Delivered-To: mailing list general@incubator.apache.org
Received: (qmail 48292 invoked by uid 99); 24 Nov 2015 09:48:33 -0000
Received: from mail-relay.apache.org (HELO mail-relay.apache.org) (140.211.11.15)
 by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 24 Nov 2015 09:48:33 +0000
Received: from mail-lf0-f52.google.com (mail-lf0-f52.google.com [209.85.215.52])
 by mail-relay.apache.org (ASF Mail Server at mail-relay.apache.org) with ESMTPSA id A62C01A0015
 for ; Tue, 24 Nov 2015 09:48:32 +0000 (UTC)
Received: by lfaz4 with SMTP id z4so13465781lfa.0
 for ; Tue, 24 Nov 2015 01:48:30 -0800 (PST)
X-Gm-Message-State: ALoCoQmBj0mh1Y9kSA952oKNOOvZYmSlCW5lwNkwtWJUIxhWOl9BQrfL1Nik1PIJyUYVQHxmkcWy
X-Received: by 10.112.13.98 with SMTP id g2mr10367131lbc.18.1448358510959;
 Tue, 24 Nov 2015 01:48:30 -0800 (PST)
MIME-Version: 1.0
Received: by 10.112.101.7 with HTTP; Tue, 24 Nov 2015 01:47:51 -0800 (PST)
In-Reply-To:
References:
From: Sergio Fernández
Date: Tue, 24 Nov 2015 10:47:51 +0100
X-Gmail-Original-Message-ID:
Message-ID:
Subject: Re: [VOTE] Accept S2Graph into Apache Incubation
To: general@incubator.apache.org
Content-Type: multipart/alternative; boundary=001a11c3b8a68dbe1e0525463edc

--001a11c3b8a68dbe1e0525463edc
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding:
quoted-printable

+1 (binding)

On Tue, Nov 24, 2015 at 10:44 AM, Rob Vesse wrote:

> +1 (binding)
>
> Good luck
>
> Rob
>
> On 24/11/2015 00:53, "Hyunsik Choi" wrote:
>
> >Hello folks,
> >
> >Thanks for all the feedback on the S2Graph Proposal.
> >
> >I would like to call for a [VOTE] on S2Graph joining the ASF as an
> >incubation project.
> >
> >The vote is open for at least 72 hours:
> >
> >[ ] +1 accept S2Graph in the Incubator
> >[ ] ±0
> >[ ] -1 (please give reason)
> >
> >S2Graph provides a scalable distributed graph database engine over a
> >key/value store such as HBase. S2Graph provides a fully asynchronous
> >API to manipulate data as a property graph model and fast
> >breadth-first-search queries over the graph. S2Graph is designed for
> >OLTP-like workloads on graph data sets instead of batch processing,
> >and it also provides INSERT/UPDATE operations on them.
> >
> >The proposal is available on the wiki here:
> >https://wiki.apache.org/incubator/S2GraphProposal
> >
> >Best regards,
> >Hyunsik
> >
> >
> >
> >--------------------------------------------------------------------------
> >----------------------
> >= S2Graph Proposal =
> >
> >== Abstract ==
> >S2Graph is a distributed and scalable OLTP graph database built on
> >Apache HBase to support fast traversal of extremely large graphs.
> >
> >== Proposal ==
> >S2Graph provides a scalable distributed graph database engine over a
> >key/value store such as HBase. S2Graph provides a fully asynchronous
> >API to manipulate data as a property graph model and fast
> >breadth-first-search queries over the graph. S2Graph is designed for
> >OLTP-like workloads on graph data sets instead of batch processing.
> >Also, S2Graph provides INSERT/UPDATE operations. Its name 'S2Graph' is
> >an abbreviated word of '''S'''uper '''S'''imple '''Graph''' Database.
> >
> >Here are additional materials to introduce S2Graph.
> > * HBaseCon 2015 - http://www.slideshare.net/HBaseCon/use-cases-session-5
> > * Apache: Big Data 2015 -
> >http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
> >
> >== Background ==
> >S2Graph initially started as an internal project at Kakao.com to
> >efficiently store user relations and user activities as one large
> >graph and to provide a unified query interface to traverse the graph.
> >It was open sourced on Github about 3 months ago, in June 2015.
> >
> >Over time, S2Graph, using HBase as its storage tier, has been
> >adopted by various applications, such as messaging, social feeds,
> >and realtime recommendations at Kakao.
> >
> >Users can benefit by using S2Graph's generalized high level graph
> >abstraction API instead of querying via low-level key/value APIs, just
> >as Apache Phoenix provides a SQL layer over HBase.
> >
> >== Rationale ==
> >Graph data (highly interconnected data) is very abundant and important
> >these days. When users have a multitude of relationships, each with
> >complex properties associated with them, a graph model is more
> >intuitive and efficient than tabular formats (RDBMS).
> >
> >There are many ASF projects that provide SQL tiers, but there are no
> >ASF projects that provide a scalable graph layer on top of the
> >existing Hadoop ecosystem. When graph data grows to the trillion edge
> >scale, the process of traversing takes a long time and can be costly.
> >However, with the benefit of HBase's scalable architecture, S2Graph
> >can traverse large graphs in a breadth-first-search manner
> >efficiently.
> >
> >S2Graph also interoperates with several existing Apache projects
> >(HBase, Apache Spark) to provide means of merging real time events and
> >batch processed data using the property graph data model.
> >
> >Many developers run their own domain specific API servers to serve
> >their data products, but a graph model is general and the S2Graph API
> >fully supports traversal of the graph, so it can be used as a scalable
> >general purpose API serving layer for various domains. As long as data
> >can be modeled as a graph, users can avoid the tedious work of
> >developing customized API servers by using S2Graph.
> >
> >== Initial Goals ==
> >The initial goals will be to move the existing codebase to Apache and
> >integrate with the Apache development process. Once this is
> >accomplished, we plan for incremental development and releases that
> >follow the Apache guidelines.
> >
> >== Current Status ==
> >
> >=== Meritocracy ===
> >S2Graph has operated on meritocratic principles from the start.
> >Currently, all the discussions pertaining to S2Graph development are
> >public on Github. The current incubation proposal includes the major
> >code contributors to S2Graph. Several additional people have worked on
> >the S2Graph codebase for industry use cases and would be interested in
> >becoming committers. We are starting with a small committer group and
> >we plan to add additional committers following an open merit-based
> >decision process during the incubation phase.
> >
> >=== Community ===
> >We have already begun building a community but at this time the
> >community consists only of S2Graph developers – all Kakao employees –
> >and prospective users. S2Graph seeks to develop developer and user
> >communities during incubation.
> >
> >=== Core Developers ===
> >S2Graph is currently being designed and developed by two engineers
> >from Kakao: Doyung Yoon and Daewon Jeong.
> >
> >=== Alignment ===
> >Our proposed S2Graph effort aligns closely with Apache HBase.
> >The HBase project perimeter is denoted by a simple byte-array based
> >Create, Read, Update, Delete and Scan API with no current plans to
> >extend beyond these bounds.
> >
> >S2Graph complements this with a higher level API for a property graph
> >model.
> >
> >S2Graph was designed to offer a scalable distributed graph database
> >skin over HBase from the beginning in order to provide a property
> >graph model and breadth first search, and will continue to focus on
> >providing the graph model.
> >
> >== Known Risks ==
> >=== Orphaned Products ===
> >The core developers of the S2Graph team plan to work full time on this
> >project. There is very little risk of S2Graph getting orphaned since
> >at least one large company (Kakao) is extensively using it in their
> >production HBase clusters. For example, currently there are 20+ use
> >cases with more than 1 trillion edges and 140 million breadth first
> >search query requests per minute using S2Graph in production. We plan
> >to extend and diversify this community further through Apache.
> >
> >=== Inexperience with Open Source ===
> >The core developers are all active users and followers of open source.
> >They are already committers and contributors to the S2Graph Github
> >project. All have been involved with the source code that has been
> >released under an open source license. Although the core developers
> >do not have Apache open source experience, there are plans
> >to onboard individuals with Apache open source experience to the
> >project.
> >
> >=== Homogenous Developers ===
> >Most committers in this proposal belong to the same institution
> >(Kakao). The engagement of these committers goes well beyond the
> >necessary development to support research, and all committers work on
> >S2Graph full time. Several people from other institutions are working
> >on and are familiar with the S2Graph codebase.
> >We will work to attract them as future committers during the
> >incubation phase, following a merit-based approach.
> >
> >=== Reliance on Salaried Developers ===
> >Kakao invested in S2Graph as the distributed graph database solution
> >on top of HBase and some of its key engineers are working full time on
> >the project. We look forward to other Apache developers and
> >researchers contributing to the project. The key to addressing the
> >risk associated with relying on salaried developers from a single
> >entity is to increase the diversity of the contributors and actively
> >lobby for domain experts in the graph database space to contribute.
> >Apache S2Graph intends to do this.
> >
> >=== Relationships with Other Apache Products ===
> >S2Graph has a strong relationship with, and dependency on, Apache
> >HBase and Apache Spark. Being part of Apache’s Incubation community
> >could help foster closer collaboration with these two projects as well
> >as others.
> >
> >In terms of graph processing frameworks, S2Graph and Apache Giraph
> >look similar. However, their goals differ. Giraph aims at analytical
> >batch processing on immutable graph data sets. In contrast, S2Graph
> >is designed for OLTP-like workloads on graph data sets, and S2Graph
> >provides INSERT/UPDATE operations too.
> >
> >
> >=== An Excessive Fascination with the Apache Brand ===
> >S2Graph is proposing to enter incubation at Apache in order to help
> >efforts to diversify the committer-base, not so much to capitalize on
> >the Apache brand. The S2Graph project is in production use already
> >inside Kakao, but is not expected to be a Kakao product for external
> >customers. As such, the S2Graph project is not seeking to use the
> >Apache brand as a marketing tool.
> >
> >== Documentation ==
> >Information about S2Graph can be found at
> >https://github.com/kakao/s2graph.
> >The following links provide more information about S2Graph in open
> >source:
> > * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
> > * Codebase at Github: https://github.com/kakao/s2graph
> > * Issue Tracking: https://github.com/kakao/s2graph/issues
> > * User community: https://groups.google.com/forum/#!forum/s2graph
> >
> >== Initial Source ==
> >
> >The S2Graph codebase is currently hosted on Github:
> >https://github.com/kakao/s2graph.
> >
> >=== Source and Intellectual Property Submission Plan ===
> >
> >Currently, the S2Graph codebase is distributed under the Apache 2.0
> >License.
> >
> >== External Dependencies ==
> >
> >Beyond relying on Apache HBase, S2Graph has the following external
> >dependencies:
> > * Asynchbase (BSD)
> > * Play Framework (Apache 2.0 license)
> > * Scala (http://www.scala-lang.org/license.html)
> > * Spark (Apache 2.0 license)
> > * Kafka (Apache 2.0 license)
> >
> >== Required Resources ==
> >
> >=== Mailing list ===
> >
> >We will migrate our mailing lists to the following:
> > * users@s2graph.incubator.apache.org
> > * dev@s2graph.incubator.apache.org
> > * private@s2graph.incubator.apache.org
> > * commits@s2graph.incubator.apache.org
> >
> >=== Source control ===
> >
> >The S2Graph team would like to use Git for source code control, due to
> >our current use of Git. We request a writeable Git repo for S2Graph,
> >and mirroring to be set up to Github through INFRA.
> >
> >=== Issue Tracking ===
> >
> >S2Graph currently uses the github issue tracking system associated
> >with its github repo (https://github.com/kakao/s2graph/issues). We
> >will migrate to the Apache JIRA
> >(http://issues.apache.org/jira/browse/S2Graph).
> >
> >=== Other Resources ===
> >
> > * Jenkins/Hudson for builds and test running.
> > * Wiki for documentation purposes.
> > * Blog to improve project dissemination.
> >
> >== Initial Committers ==
> >
> > * Doyung Yoon
> > * Daewon Jeong
> > * Jaesang Kim
> > * Hwansung Yu
> > * Min-Seok Kim
> > * Chul Kang
> > * Luke Han
> > * Alexander Bezzubov
> >
> >== Affiliations ==
> >
> > * Doyung Yoon, Kakao
> > * Daewon Jeong, Kakao
> > * Jaesang Kim, Kakao
> > * Hwansung Yu, Kakao
> > * Min-Seok Kim, Kakao
> > * Chul Kang, Kakao
> > * Luke Han, Ebay Inc.
> > * Alexander Bezzubov, NFLabs
> >
> >== Sponsors ==
> >
> >=== Champion ===
> >Hyunsik Choi
> >
> >=== Nominated Mentors ===
> > * Andrew Purtell - Apache Member, Salesforce
> > * Sergio Fernández - Apache Member, Redlink
> > * Hyunsik Choi - Apache Member, Gruter Inc.
> > * Seetharam Venkatesh - IPMC, Hortonworks Inc.
> >
> >=== Sponsoring Entity ===
> >
> > * The Apache Incubator
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >For additional commands, e-mail: general-help@incubator.apache.org
> >
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>

-- 
Sergio Fernández
Partner Technology Manager
Redlink GmbH
m: +43 6602747925
e: sergio.fernandez@redlink.co
w: http://redlink.co

--001a11c3b8a68dbe1e0525463edc--