From: Andrew Purtell
Date: Mon, 9 Nov 2015 11:49:32 -0800
Subject: Re: [DISCUSS] S2Graph Incubator Proposal
To: "general@incubator.apache.org"

If you are looking for mentors let me volunteer as one.
I think S2Graph has the potential to be a good addition to the Apache
family given its relationships and dependencies with other Apache
projects from the outset.

On Mon, Nov 9, 2015 at 10:54 AM, Hyunsik Choi wrote:

> This project is looking for mentors. Can anyone help? We are also
> looking forward to any feedback.
>
> Also, I have attached the proposal here. I forgot it earlier.
>
> ----------------
>
> = S2Graph Proposal =
>
> == Abstract ==
> S2Graph is a distributed and scalable OLTP graph database built on
> HBase to support fast traversal of extremely large graphs.
>
> Here are additional materials to introduce S2Graph.
> * HBaseCon 2015 - http://www.slideshare.net/HBaseCon/use-cases-session-5
> * Apache: Big Data 2015 -
> http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
>
> == Proposal ==
> S2Graph aims to provide a scalable distributed graph database engine
> over key/value storage such as HBase. S2Graph provides a fully
> asynchronous API to manipulate data as a property graph model and fast
> breadth-first search queries on the graph.
>
> == Background ==
> S2Graph initially started as an internal project at Kakao.com to
> efficiently store user relations and user activities as one large graph
> and provide a unified query to traverse the graph. It was open sourced on
> GitHub about 3 months ago in June 2015.
>
> Over time S2Graph, together with HBase as the storage tier, has begun to
> be adopted in various applications, such as messaging, social feeds, and
> realtime recommendations at Kakao.
>
> Users can benefit from S2Graph's generalized high-level API instead of
> a low-level key/value API for graph abstraction, just as Phoenix
> provides a SQL layer over HBase.
>
> == Rationale ==
> Graph data (highly interconnected data) is very abundant and important
> these days.
> When users have a multitude of relationships, each with complex
> properties associated with them, a graph model is more intuitive and
> efficient than a tabular format (RDBMS).
> There are many ASF projects that provide a SQL layer, but there are no
> ASF projects that provide a scalable graph layer on the existing Hadoop
> ecosystem.
> When graph data grows to trillion-edge scale, traversal takes a long
> time and is costly. However, with the benefit of
> HBase's scalable architecture, S2Graph can traverse large graphs
> efficiently in a breadth-first search manner.
>
> S2Graph also interoperates with several existing Apache
> projects (HBase, Spark) to provide a way to merge real-time events and
> batch-processed data using the property graph data model.
>
> Many developers are running their own domain-specific API servers to
> serve their data products, but the graph model is general and the S2Graph
> API fully supports graph traversal, so it can be used as a scalable
> general-purpose API serving layer for various domains.
> As long as data can be modeled as a graph, users can avoid the tedious
> work of developing customized API servers by using S2Graph.
>
> == Initial Goals ==
> The initial goals will be to move the existing codebase to Apache and
> integrate with the Apache development process. Once this is
> accomplished, we plan for incremental development and releases that
> follow the Apache guidelines.
>
> == Current Status ==
>
> === Meritocracy ===
> S2Graph has operated on meritocratic principles from the get-go.
> Currently, all the discussions pertaining to S2Graph development are
> public on GitHub. The current incubation
> proposal includes the major code contributors to S2Graph. Several
> additional people have worked on the S2Graph codebase for industry use
> cases and would be interested in becoming committers.
> We are starting
> with a small committer group and we plan to add additional committers
> following an open merit-based decision process during the incubation
> phase.
>
> === Community ===
> We have already begun building a community but at this time the
> community consists only of S2Graph developers – all Kakao employees –
> and prospective users.
> S2Graph seeks to develop developer and user communities during incubation.
>
> === Core Developers ===
> S2Graph is currently being designed and developed by 2 engineers from
> Kakao: Doyung Yoon and Daewon Jeong.
>
> === Alignment ===
> Our proposed S2Graph effort aligns closely with Apache HBase. The
> HBase project perimeter is denoted by simple byte-array based
> Create, Read, Update, Delete and Scan APIs with no current plans to
> extend beyond these bounds.
>
> S2Graph complements this with a higher-level API for the property graph model.
>
> S2Graph was designed from the beginning to offer a scalable distributed
> graph database skin over HBase in order to provide a property graph model
> and breadth-first search, and it continues to focus on providing the graph
> model.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers of the S2Graph team plan to work full time on this
> project. There is very little risk of S2Graph getting orphaned since
> at least one large company (Kakao) is extensively using it in their
> production HBase clusters. For example, currently there are 20+ use
> cases with more than 1 trillion edges and 140 million breadth-first
> search query requests per minute using S2Graph in production.
> We plan to extend and diversify this community further through Apache.
>
> === Inexperience with Open Source ===
> The core developers are all active users and followers of open source.
> They are already committers and contributors to the S2Graph GitHub
> project.
> All have been involved with the source code that has been
> released under an open source license. Though the core set of
> developers do not have Apache open source experience, there are plans
> to onboard individuals with Apache open source experience on to the
> project.
>
> === Homogenous Developers ===
> Most committers in this proposal belong to the same institution
> (Kakao). The engagement of these committers goes well beyond the
> necessary development to support research, and all committers work on
> S2Graph full time.
> Several people from other institutions are working on and are familiar
> with the S2Graph codebase. We will work to attract them as future
> committers during the incubation phase, following a merit-based
> approach.
>
> === Reliance on Salaried Developers ===
> Kakao invested in S2Graph as the distributed graph database solution
> on top of HBase and some of its key engineers are working full time on
> the project.
> We look forward to contributions to the project from other Apache
> developers and researchers.
> Also key to addressing the risk associated with relying on salaried
> developers from a single entity is to increase the diversity of the
> contributors and actively lobby for domain experts in the graph
> database space to contribute. Apache S2Graph intends to do this.
>
> === Relationships with Other Apache Products ===
> S2Graph has a strong relationship and dependency with Apache Hadoop
> HBase and Spark.
> Being part of Apache’s Incubation community could help with closer
> collaboration among these two projects as well as others.
>
> In terms of graph processing frameworks, S2Graph and Apache Giraph
> look similar. However, their goals are clearly different from each
> other. Giraph aims at analytical batch processing on immutable graph
> data sets.
> In contrast, S2Graph is designed for OLTP-like workloads on
> graph data sets, and S2Graph provides INSERT/UPDATE operations too.
>
>
> === An Excessive Fascination with the Apache Brand ===
> S2Graph is proposing to enter incubation at Apache in order to help
> efforts to diversify the committer base, not so much to capitalize on
> the Apache brand. The S2Graph project is in production use already
> inside Kakao, but is not expected to be a Kakao product for external
> customers. As such, the S2Graph project is not seeking to use the
> Apache brand as a marketing tool.
>
> == Documentation ==
> Information about S2Graph can be found at
> https://github.com/kakao/s2graph. The following links provide more
> information about S2Graph in open source:
> * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
> * Codebase at GitHub: https://github.com/kakao/s2graph
> * Issue Tracking: https://github.com/kakao/s2graph/issues
> * User community: https://groups.google.com/forum/#!forum/s2graph
>
> == Initial Source ==
>
> The S2Graph codebase is currently hosted on GitHub:
> https://github.com/kakao/s2graph
>
> === Source and Intellectual Property Submission Plan ===
>
> Currently, the S2Graph codebase is distributed under the Apache 2.0
> License.
>
> == External Dependencies ==
>
> Beyond relying on Apache HBase, S2Graph has the following external
> dependencies:
> * Asynchbase (BSD license: http://www.antlr3.org/license.html)
> * MySQL (BSD license:
> https://github.com/julianhyde/sqlline/blob/master/LICENSE)
> * Play Framework (Apache 2.0 license:
> https://github.com/playframework/playframework)
> * Scala (https://github.com/scala/scala)
> * Spark
> * Kafka
>
> == Required Resources ==
>
> === Mailing list ===
>
> We will migrate our mailing lists to the following:
> * users@s2graph.incubator.apache.org
> * dev@s2graph.incubator.apache.org
> * private@s2graph.incubator.apache.org
> * commits@s2graph.incubator.apache.org
>
> === Source control ===
>
> The S2Graph team would like to use Git for source control, due to our
> current use of Git. We request a writeable Git repo for S2Graph, and
> mirroring to be set up to GitHub through INFRA.
>
> === Issue Tracking ===
>
> S2Graph currently uses the GitHub issue tracking system associated
> with its GitHub repo: https://github.com/kakao/s2graph/issues. We will
> migrate to the Apache JIRA:
> http://issues.apache.org/jira/browse/S2Graph
>
> === Other Resources ===
>
> Jenkins/Hudson for builds and test running.
> Wiki for documentation purposes
> Blog to improve project dissemination
>
> == Initial Committers ==
>
> * Doyung Yoon
> * Daewon Jeong
> * Jaesang Kim
> * Hwansung Yu
> * Min-Seok Kim
> * Chul Kang
>
> == Affiliations ==
>
> The initial committers are from one organization: Kakao.
> * Doyung Yoon, Kakao
> * Daewon Jeong, Kakao
> * Jaesang Kim, Kakao
> * Hwansung Yu, Kakao
> * Min-Seok Kim, Kakao
> * Chul Kang, Kakao
>
> == Sponsors ==
>
> === Champion ===
> Hyunsik Choi
>
> === Nominated Mentors ===
>
> === Sponsoring Entity ===
>
> * The Apache Incubator
>
> On Fri, Nov 6, 2015 at 4:05 PM, Hyunsik Choi wrote:
> > Hi Seetharam,
> >
> > Thank you for a good question. That seems to be a frequent question about
> > this project.
> >
> > Here is the answer to your question.
> > https://steamshon.gitbooks.io/s2graph-book/content/what_is_different_to_titan.html
> >
> > I hope that this link is helpful to your understanding.
> >
> > Best regards,
> > Hyunsik
> >
> >
> >
> > On Fri, Nov 6, 2015 at 3:07 PM, Seetharam Venkatesh
> > wrote:
> >> Hi Hyunsik,
> >>
> >> The proposal looks interesting, and I want to know how this is different
> >> from existing solutions in the same space such as Titan, etc.
> >>
> >> Thanks!
> >> Venkatesh
> >>
> >>
> >> On Fri, Nov 6, 2015 at 1:36 PM Hyunsik Choi wrote:
> >>
> >>> Hi folks,
> >>>
> >>> We would like to start a discussion on S2Graph as an incubation project.
> >>>
> >>> S2Graph is a distributed and scalable OLTP graph database built on
> >>> HBase. It provides interactive queries for vertex/edge/sub-graphs on
> >>> extremely large graph data sets as well as insertion and update
> >>> operations.
> >>>
> >>> S2Graph was already introduced at Apache BigData and HBaseCon this year.
> >>>
> >>> The proposal is available at:
> >>> https://wiki.apache.org/incubator/S2GraphProposal
> >>>
> >>> We are looking forward to any feedback. In addition, we are looking
> >>> for volunteers as mentors.
> >>>
> >>> Best regards,
> >>> Hyunsik
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >>> For additional commands, e-mail: general-help@incubator.apache.org
> >>>
> >>>
> >
>

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)