incubator-general mailing list archives

From Hyunsik Choi <hyun...@apache.org>
Subject Re: [DISCUSS] S2Graph Incubator Proposal
Date Mon, 09 Nov 2015 23:45:42 GMT
Thank you all, guys. I just put your names on the nominated mentor list.

@Andrew,

I agree with you. S2Graph already has good relationships with other
ASF projects, such as HBase and Spark. In addition, they plan to
extend that relationship to Apache TinkerPop (incubating), which is a
graph computing framework. I'm looking forward to seeing them combined.

@Sergio,

Thank you for attending the talk and joining the S2Graph mentors. The
speaker was Doyung Yoon, one of the S2Graph creators, who gave the talk
at the last ApacheCon.

On Mon, Nov 9, 2015 at 11:58 AM, Sergio Fernández <wikier@apache.org> wrote:
> Hi Hyunsik, I attended your talk at the last ApacheCon, and I think S2 has
> quite some potential. So if you need a mentor, count me in!
>
> On Mon, Nov 9, 2015 at 7:54 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>
>> This project is looking for mentors. Can anyone help? We are also
>> looking forward to any feedback.
>>
>> Also, I have attached the proposal here. I forgot it earlier.
>>
>> ----------------
>>
>> = S2Graph Proposal =
>>
>> == Abstract ==
>> S2Graph is a distributed and scalable OLTP graph database built on
>> HBase to support fast traversal of extremely large graphs.
>>
>> Here are additional materials to introduce S2Graph.
>>  * HBaseCon 2015 - http://www.slideshare.net/HBaseCon/use-cases-session-5
>>  * Apache: Big Data 2015 -
>> http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
>>
>> == Proposal ==
>> S2Graph aims to provide a scalable distributed graph database engine
>> over key/value storage such as HBase. S2Graph provides a fully
>> asynchronous API to manipulate data as a property graph model, and
>> fast breadth-first search queries on the graph.
>>
>> == Background ==
>> S2Graph started as an internal project at Kakao.com to efficiently
>> store user relations and user activities as one large graph and to
>> provide unified queries to traverse the graph. It was open sourced on
>> GitHub about 3 months ago, in June 2015.
>>
>> Over time S2Graph, together with HBase as its storage tier, has begun
>> to be adopted in various applications at Kakao, such as messaging,
>> social feeds, and realtime recommendations.
>>
>> Users can benefit from S2Graph's generalized high-level API for graph
>> abstraction instead of a low-level key/value API, just as Phoenix
>> provides a SQL layer over HBase.
>>
>> == Rationale ==
>> Graph data (highly interconnected data) is abundant and important
>> these days.
>> When users have a multitude of relationships, each with complex
>> properties associated with them, a graph model is more intuitive and
>> efficient than a tabular format (RDBMS).
>> There are many ASF projects that provide a SQL layer, but there is no
>> ASF project that provides a scalable graph layer on the existing
>> Hadoop ecosystem.
>> When graph data grows to trillion-edge scale, traversal takes a long
>> time and is costly. However, with the benefit of HBase's scalable
>> architecture, S2Graph can traverse large graphs efficiently in a
>> breadth-first search manner.
>>
>> S2Graph also interoperates with several existing Apache projects
>> (HBase, Spark) to provide a way to merge real-time events and
>> batch-processed data using the property graph data model.
>>
>> Many developers run their own domain-specific API servers to serve
>> their data products, but the graph model is general and the S2Graph
>> API fully supports graph traversal, so it can be used as a scalable
>> general-purpose API serving layer for various domains.
>> As long as data can be modeled as a graph, users can avoid the tedious
>> work of developing customized API servers by using S2Graph.
>>
>> == Initial Goals ==
>> The initial goals will be to move the existing codebase to Apache and
>> integrate with the Apache development process. Once this is
>> accomplished, we plan for incremental development and releases that
>> follow the Apache guidelines.
>>
>> == Current Status ==
>>
>> === Meritocracy ===
>> S2Graph has operated on meritocratic principles from the get-go.
>> Currently, all discussions pertaining to S2Graph development are
>> public on GitHub. The current incubation proposal includes the major
>> code contributors to S2Graph. Several additional people have worked on
>> the S2Graph codebase for industry use cases and would be interested in
>> becoming committers. We are starting with a small committer group and
>> plan to add committers following an open, merit-based decision process
>> during the incubation phase.
>>
>> === Community ===
>> We have already begun building a community but at this time the
>> community consists only of S2Graph developers – all Kakao employees –
>> and prospective users.
>> S2Graph seeks to develop developer and user communities during incubation.
>>
>> === Core Developers ===
>> S2Graph is currently being designed and developed by 2 engineers from
>> Kakao: Doyung Yoon and Daewon Jeong.
>>
>> === Alignment ===
>> Our proposed S2Graph effort aligns closely with Apache HBase. The
>> HBase project perimeter is defined by simple byte-array-based
>> Create, Read, Update, Delete, and Scan APIs, with no current plans to
>> extend beyond these bounds.
>>
>> S2Graph complements this with a higher level API for property graph model.
>>
>> S2Graph was designed from the beginning to offer a scalable
>> distributed graph database layer over HBase that provides the property
>> graph model and breadth-first search, and it continues to focus on
>> providing the graph model.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> The core developers of the S2Graph team plan to work full time on this
>> project. There is very little risk of S2Graph being orphaned, since
>> at least one large company (Kakao) uses it extensively in its
>> production HBase clusters. For example, there are currently 20+ use
>> cases in production, with more than 1 trillion edges and 140 million
>> breadth-first search query requests per minute.
>> We plan to extend and diversify this community further through Apache.
>>
>> === Inexperience with Open Source ===
>> The core developers are all active users and followers of open source.
>> They are already committers and contributors to the S2Graph GitHub
>> project. All have been involved with source code that has been
>> released under an open source license. Though the core developers do
>> not have Apache open source experience, there are plans to onboard
>> individuals with Apache open source experience onto the project.
>>
>> === Homogenous Developers ===
>> Most committers in this proposal belong to the same institution
>> (Kakao). The engagement of these committers goes well beyond the
>> necessary development to support research, and all committers work on
>> S2Graph full time.
>> Several people from other institutions are working on and are familiar
>> with the S2Graph codebase. We will work to attract them as future
>> committers during the incubation phase, following a merit-based
>> approach.
>>
>> === Reliance on Salaried Developers ===
>> Kakao invested in S2Graph as the distributed graph database solution
>> on top of HBase and some of its key engineers are working full time on
>> the project.
>> We look forward to other Apache developers and researchers
>> contributing to the project.
>> Key to addressing the risk of relying on salaried developers from a
>> single entity is to increase the diversity of the contributors and to
>> actively lobby domain experts in the graph database space to
>> contribute. Apache S2Graph intends to do this.
>>
>> === Relationships with Other Apache Products ===
>> S2Graph has a strong relationship with, and dependency on, Apache
>> Hadoop HBase and Spark.
>> Being part of Apache’s Incubation community could help with closer
>> collaboration among these projects as well as others.
>>
>> In terms of graph processing frameworks, S2Graph and Apache Giraph
>> look similar. However, their goals are clearly different. Giraph aims
>> at analytical batch processing on immutable graph data sets. In
>> contrast, S2Graph is designed for OLTP-like workloads on graph data
>> sets, and it provides INSERT/UPDATE operations as well.
>>
>>
>> === An Excessive Fascination with the Apache Brand ===
>> S2Graph is proposing to enter incubation at Apache in order to help
>> efforts to diversify the committer-base, not so much to capitalize on
>> the Apache brand. The S2Graph project is in production use already
>> inside Kakao, but is not expected to be a Kakao product for external
>> customers. As such, the S2Graph project is not seeking to use the
>> Apache brand as a marketing tool.
>>
>> == Documentation ==
>> Information about S2Graph can be found at
>> https://github.com/kakao/s2graph. The following links provide more
>> information about S2Graph in open source:
>>  * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
>>  * Codebase at GitHub: https://github.com/kakao/s2graph
>>  * Issue Tracking: https://github.com/kakao/s2graph/issues
>>  * User community: https://groups.google.com/forum/#!forum/s2graph
>>
>> == Initial Source ==
>>
>> The S2Graph codebase is currently hosted on GitHub:
>> https://github.com/kakao/s2graph
>>
>> === Source and Intellectual Property Submission Plan ===
>>
>> Currently, the S2Graph codebase is distributed under the Apache 2.0
>> License.
>>
>> == External Dependencies ==
>>
>> Beyond relying on Apache HBase, S2Graph has the following external
>> dependencies:
>>  * Asynchbase (BSD license: http://www.antlr3.org/license.html)
>>  * Mysql (BSD license:
>> https://github.com/julianhyde/sqlline/blob/master/LICENSE)
>>  * Play Framework (Apache 2.0 license:
>> https://github.com/playframework/playframework)
>>  * Scala (https://github.com/scala/scala)
>>  * Spark
>>  * Kafka
>>
>> == Required Resources ==
>>
>> === Mailing list ===
>>
>> We will migrate our mailing lists to the following:
>>  * users@s2graph.incubator.apache.org
>>  * dev@s2graph.incubator.apache.org
>>  * private@s2graph.incubator.apache.org
>>  * commits@s2graph.incubator.apache.org
>>
>> === Source control ===
>>
>> The S2Graph team would like to use Git for source control, since we
>> already use Git. We request a writeable Git repo for S2Graph, and
>> mirroring to GitHub to be set up through INFRA.
>>
>> === Issue Tracking ===
>>
>> S2Graph currently uses the GitHub issue tracking system associated
>> with its GitHub repo: https://github.com/kakao/s2graph/issues. We will
>> migrate to the Apache JIRA:
>> http://issues.apache.org/jira/browse/S2Graph
>>
>> === Other Resources ===
>>
>> Jenkins/Hudson for builds and test running.
>> Wiki for documentation purposes
>> Blog to improve project dissemination
>>
>> == Initial Committers ==
>>
>>  * Doyung Yoon <shom83 at gmail.com>
>>  * Daewon Jeong <blueiur at gmail.com>
>>  * Jaesang Kim <honeysleep at gmail.com>
>>  * Hwansung Yu <deejayfwan at gmail.com>
>>  * Min-Seok Kim <mskim.org at gmail.com>
>>  * Chul Kang <miralchul at gmail.com>
>>
>> == Affiliations ==
>>
>> The initial committers are from one organization: Kakao.
>>  * Doyung Yoon, Kakao
>>  * Daewon Jeong, Kakao
>>  * Jaesang Kim, Kakao
>>  * Hwansung Yu, Kakao
>>  * Min-Seok Kim, Kakao
>>  * Chul Kang, Kakao
>>
>> == Sponsors ==
>>
>> === Champion ===
>> Hyunsik Choi
>>
>> === Nominated Mentors ===
>>
>> === Sponsoring Entity ===
>>
>>  * The Apache Incubator
>>
>> On Fri, Nov 6, 2015 at 4:05 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
>> > Hi Seetharam,
>> >
>> > Thank you for a good question. That seems to be a frequent question
>> > about this project.
>> >
>> > Here is the answer to your question.
>> >
>> > https://steamshon.gitbooks.io/s2graph-book/content/what_is_different_to_titan.html
>> >
>> > I hope that this link is helpful to your understanding.
>> >
>> > Best regards,
>> > Hyunsik
>> >
>> >
>> >
>> > On Fri, Nov 6, 2015 at 3:07 PM, Seetharam Venkatesh
>> > <venkatesh@innerzeal.com> wrote:
>> >> Hi Hyunsik,
>> >>
>> >> The proposal looks interesting. I'd like to know how this differs
>> >> from existing solutions in the same space, such as Titan.
>> >>
>> >> Thanks!
>> >> Venkatesh
>> >>
>> >>
>> >> On Fri, Nov 6, 2015 at 1:36 PM Hyunsik Choi <hyunsik@apache.org> wrote:
>> >>
>> >>> Hi folks,
>> >>>
>> >>> We would like to start a discussion on S2Graph as an incubation
>> >>> project.
>> >>>
>> >>> S2Graph is a distributed and scalable OLTP graph database built on
>> >>> HBase. It provides interactive queries for vertex/edge/sub-graphs on
>> >>> extremely large graph data sets as well as insertion and update
>> >>> operations.
>> >>>
>> >>> S2Graph was already introduced in Apache BigData and HBaseCon this
>> >>> year.
>> >>>
>> >>> The proposal is available at:
>> >>> https://wiki.apache.org/incubator/S2GraphProposal
>> >>>
>> >>> We are looking forward to any feedback. In addition, we are looking
>> >>> for volunteers as mentors.
>> >>>
>> >>> Best regards,
>> >>> Hyunsik
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> >>> For additional commands, e-mail: general-help@incubator.apache.org
>> >>>
>> >>>
>>
>>
>>
>
>
> --
> Sergio Fernández
> Partner Technology Manager
> Redlink GmbH
> m: +43 6602747925
> e: sergio.fernandez@redlink.co
> w: http://redlink.co