incubator-general mailing list archives

From Seetharam Venkatesh <venkat...@innerzeal.com>
Subject Re: [DISCUSS] S2Graph Incubator Proposal
Date Tue, 10 Nov 2015 00:21:16 GMT
Hi Hyunsik,

If you are still looking for mentors, let me volunteer as one.

Thanks!

On Mon, Nov 9, 2015 at 3:45 PM Hyunsik Choi <hyunsik@apache.org> wrote:

> Thank you all, guys. I just put your names on the nominated mentor list.
>
> @Andrew,
>
> I agree with you. S2Graph already has good relationships with other
> ASF projects, such as HBase and Spark. In addition, they plan to
> expand its relationship to the Apache Incubator project TinkerPop,
> which is a graph computing framework. I'm looking forward to their
> combination.
>
> @Sergio,
>
> Thank you for attending the talk and joining the S2Graph mentors. That
> was Doyung Yoon, one of the S2Graph creators. He had a talk at the
> last ApacheCon.
>
> On Mon, Nov 9, 2015 at 11:58 AM, Sergio Fernández <wikier@apache.org>
> wrote:
> > Hi Hyunsik, I attended your talk at the last ApacheCon, and I think S2
> > has quite some potential. So if you need a mentor, count me in!
> >
> > On Mon, Nov 9, 2015 at 7:54 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
> >
> >> This project is looking for mentors. Can anyone help? We are also
> >> looking forward to any feedback.
> >>
> >> Also, I have attached the proposal here; I forgot it earlier.
> >>
> >> ----------------
> >>
> >> = S2Graph Proposal =
> >>
> >> == Abstract ==
> >> S2Graph is a distributed and scalable OLTP graph database built on
> >> HBase to support fast traversal of extremely large graphs.
> >>
> >> Here are additional materials to introduce S2Graph.
> >>  * HBaseCon 2015 -
> >> http://www.slideshare.net/HBaseCon/use-cases-session-5
> >>  * Apache: Big Data 2015 -
> >> http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
> >>
> >> == Proposal ==
> >> S2Graph aims to provide a scalable, distributed graph database engine
> >> over key/value storage such as HBase. S2Graph provides a fully
> >> asynchronous API to manipulate data as a property graph model, and
> >> fast breadth-first search queries on the graph.
> >>
> >> == Background ==
> >> S2Graph initially started as an internal project at Kakao.com to
> >> efficiently store user relations and user activities as one large
> >> graph and provide a unified query to traverse the graph. It was open
> >> sourced on GitHub about three months ago, in June 2015.
> >>
> >> Over time S2Graph, together with HBase as the storage tier, has begun
> >> to be adopted in various applications at Kakao, such as messaging,
> >> social feeds, and realtime recommendations.
> >>
> >> Users can benefit from S2Graph's generalized high-level API for graph
> >> abstraction instead of a low-level key/value API, just as Phoenix
> >> provides an SQL layer over HBase.
> >>
> >> == Rationale ==
> >> Graph data (highly interconnected data) is very abundant and important
> >> these days.
> >> When users have a multitude of relationships, each with complex
> >> properties associated with them, a graph model is more intuitive and
> >> efficient than a tabular format (RDBMS).
> >> There are many ASF projects that provide an SQL layer, but there is no
> >> ASF project that provides a scalable graph layer on the existing
> >> Hadoop ecosystem.
> >> When graph data grows to trillion-edge scale, traversal becomes slow
> >> and costly. However, with the benefit of HBase's scalable
> >> architecture, S2Graph can traverse a large graph efficiently in a
> >> breadth-first search manner.
> >>
> >> S2Graph also interoperates with several existing Apache projects
> >> (HBase, Spark) to provide a way to merge real-time events and
> >> batch-processed data using the property graph data model.
> >>
> >> Many developers are running their own domain-specific API servers to
> >> serve their data products, but the graph model is general and the
> >> S2Graph API fully supports graph traversal, so it can be used as a
> >> scalable, general-purpose API serving layer for various domains.
> >> As long as their data can be modeled as a graph, users can avoid the
> >> tedious work of developing customized API servers by using S2Graph.
> >>
> >> == Initial Goals ==
> >> The initial goals will be to move the existing codebase to Apache and
> >> integrate with the Apache development process. Once this is
> >> accomplished, we plan for incremental development and releases that
> >> follow the Apache guidelines.
> >>
> >> == Current Status ==
> >>
> >> === Meritocracy ===
> >> S2Graph has operated on meritocratic principles from the get-go.
> >> Currently, all discussions pertaining to S2Graph development are
> >> public on GitHub. The current incubation proposal includes the major
> >> code contributors to S2Graph. Several additional people have worked on
> >> the S2Graph codebase for industry use cases and would be interested in
> >> becoming committers. We are starting
> >> with a small committer group and we plan to add additional committers
> >> following an open merit-based decision process during the incubation
> >> phase.
> >>
> >> === Community ===
> >> We have already begun building a community but at this time the
> >> community consists only of S2Graph developers – all Kakao employees –
> >> and prospective users.
> >> S2Graph seeks to develop developer and user communities during
> >> incubation.
> >>
> >> === Core Developers ===
> >> S2Graph is currently being designed and developed by two engineers
> >> from Kakao: Doyung Yoon and Daewon Jeong.
> >>
> >> === Alignment ===
> >> Our proposed S2Graph effort aligns closely with Apache HBase. The
> >> HBase project perimeter is defined by simple byte-array-based
> >> Create, Read, Update, Delete and Scan APIs, with no current plans to
> >> extend beyond these bounds.
> >>
> >> S2Graph complements this with a higher-level API for the property
> >> graph model.
> >>
> >> S2Graph was designed from the beginning to offer a scalable,
> >> distributed graph database layer over HBase, in order to provide the
> >> property graph model and breadth-first search, and it continues to
> >> focus on providing the graph model.
> >>
> >> == Known Risks ==
> >> === Orphaned Products ===
> >> The core developers of the S2Graph team plan to work full time on this
> >> project. There is very little risk of S2Graph getting orphaned, since
> >> at least one large company (Kakao) is using it extensively in its
> >> production HBase clusters. For example, there are currently 20+ use
> >> cases in production using S2Graph, with more than one trillion edges
> >> and 140 million breadth-first search query requests per minute.
> >> We plan to extend and diversify this community further through Apache.
> >>
> >> === Inexperience with Open Source ===
> >> The core developers are all active users and followers of open source.
> >> They are already committers and contributors to the S2Graph Github
> >> project. All have been involved with the source code that has been
> >> released under an open source license. Though the core set of
> >> developers does not have Apache open source experience, there are
> >> plans to onboard individuals with Apache open source experience onto
> >> the project.
> >>
> >> === Homogenous Developers ===
> >> Most committers in this proposal belong to the same institution
> >> (Kakao). The engagement of these committers goes well beyond the
> >> necessary development to support research, and all committers work on
> >> S2Graph full time.
> >> Several people from other institutions are working on and are familiar
> >> with the S2Graph codebase. We will work to attract them as future
> >> committers during the incubation phase, following a merit-based
> >> approach.
> >>
> >> === Reliance on Salaried Developers ===
> >> Kakao invested in S2Graph as the distributed graph database solution
> >> on top of HBase and some of its key engineers are working full time on
> >> the project.
> >> We look forward to other Apache developers and researchers
> >> contributing to the project.
> >> Key to addressing the risk of relying on salaried developers from a
> >> single entity is to increase the diversity of the contributors and to
> >> actively lobby domain experts in the graph database space to
> >> contribute. Apache S2Graph intends to do this.
> >>
> >> === Relationships with Other Apache Products ===
> >> S2Graph has a strong relationship with, and dependency on, Apache
> >> Hadoop, HBase and Spark.
> >> Being part of Apache's Incubation community could help with closer
> >> collaboration among these projects as well as others.
> >>
> >> As graph processing frameworks, S2Graph and Apache Giraph look
> >> similar. However, their goals are clearly different. Giraph aims at
> >> analytical batch processing on immutable graph data sets. In
> >> contrast, S2Graph is designed for OLTP-like workloads on graph data
> >> sets, and S2Graph provides INSERT/UPDATE operations too.
> >>
> >>
> >> === An Excessive Fascination with the Apache Brand ===
> >> S2Graph is proposing to enter incubation at Apache in order to help
> >> efforts to diversify the committer-base, not so much to capitalize on
> >> the Apache brand. The S2Graph project is already in production use
> >> inside Kakao, but is not expected to be a Kakao product for external
> >> customers. As such, the S2Graph project is not seeking to use the
> >> Apache brand as a marketing tool.
> >>
> >> == Documentation ==
> >> Information about S2Graph can be found at
> >> https://github.com/kakao/s2graph. The following links provide more
> >> information about S2Graph in open source:
> >>  * S2Graph web site:
> >> https://steamshon.gitbooks.io/s2graph-book/content/
> >>  * Codebase at Github: https://github.com/kakao/s2graph
> >>  * Issue Tracking: https://github.com/kakao/s2graph/issues
> >>  * User community: https://groups.google.com/forum/#!forum/s2graph
> >>
> >> == Initial Source ==
> >>
> >> The S2Graph codebase is currently hosted on Github:
> >> https://github.com/kakao/s2graph
> >>
> >> === Source and Intellectual Property Submission Plan ===
> >>
> >> Currently, the S2Graph codebase is distributed under the Apache 2.0
> >> License.
> >>
> >> == External Dependencies ==
> >>
> >> Beyond relying on Apache HBase, S2Graph has the following external
> >> dependencies:
> >>  * Asynchbase (BSD license: http://www.antlr3.org/license.html)
> >>  * Mysql (BSD license:
> >> https://github.com/julianhyde/sqlline/blob/master/LICENSE)
> >>  * Play Framework (Apache 2.0 license:
> >> https://github.com/playframework/playframework)
> >>  * Scala (https://github.com/scala/scala)
> >>  * Spark
> >>  * Kafka
> >>
> >> == Required Resources ==
> >>
> >> === Mailing list ===
> >>
> >> We will migrate our mailing lists to the following:
> >>  * users@s2graph.incubator.apache.org
> >>  * dev@s2graph.incubator.apache.org
> >>  * private@s2graph.incubator.apache.org
> >>  * commits@s2graph.incubator.apache.org
> >>
> >> === Source control ===
> >>
> >> The S2Graph team would like to use Git for source control, due to our
> >> current use of Git. We request a writeable Git repo for S2Graph, and
> >> mirroring to be set up to Github through INFRA.
> >>
> >> === Issue Tracking ===
> >>
> >> S2Graph currently uses the GitHub issue tracking system associated
> >> with its GitHub repo: https://github.com/kakao/s2graph/issues. We will
> >> migrate to the Apache JIRA:
> >> http://issues.apache.org/jira/browse/S2Graph
> >>
> >> === Other Resources ===
> >>
> >> Jenkins/Hudson for builds and test running.
> >> Wiki for documentation purposes
> >> Blog to improve project dissemination
> >>
> >> == Initial Committers ==
> >>
> >>  * Doyung Yoon <shom83 at gmail.com>
> >>  * Daewon Jeong <blueiur at gmail.com>
> >>  * Jaesang Kim <honeysleep at gmail.com>
> >>  * Hwansung Yu <deejayfwan at gmail.com>
> >>  * Min-Seok Kim <mskim.org at gmail.com>
> >>  * Chul Kang <miralchul at gmail.com>
> >>
> >> == Affiliations ==
> >>
> >> The initial committers are from one organization: Kakao.
> >>  * Doyung Yoon, Kakao
> >>  * Daewon Jeong, Kakao
> >>  * Jaesang Kim, Kakao
> >>  * Hwansung Yu, Kakao
> >>  * Min-Seok Kim, Kakao
> >>  * Chul Kang, Kakao
> >>
> >> == Sponsors ==
> >>
> >> === Champion ===
> >> Hyunsik Choi
> >>
> >> === Nominated Mentors ===
> >>
> >> === Sponsoring Entity ===
> >>
> >>  * The Apache Incubator
> >>
> >> On Fri, Nov 6, 2015 at 4:05 PM, Hyunsik Choi <hyunsik@apache.org> wrote:
> >> > Hi Seetharam,
> >> >
> >> > Thank you for the good question. That seems to be a frequent
> >> > question about this project.
> >> >
> >> > Here is the answer to your question.
> >> >
> >> > https://steamshon.gitbooks.io/s2graph-book/content/what_is_different_to_titan.html
> >> >
> >> > I hope that this link is helpful to your understanding.
> >> >
> >> > Best regards,
> >> > Hyunsik
> >> >
> >> >
> >> >
> >> > On Fri, Nov 6, 2015 at 3:07 PM, Seetharam Venkatesh
> >> > <venkatesh@innerzeal.com> wrote:
> >> >> Hi Hyunsik,
> >> >>
> >> >> The proposal looks interesting, and I want to know how this is
> >> >> different from existing solutions in the same space, such as Titan.
> >> >>
> >> >> Thanks!
> >> >> Venkatesh
> >> >>
> >> >>
> >> >> On Fri, Nov 6, 2015 at 1:36 PM Hyunsik Choi <hyunsik@apache.org> wrote:
> >> >>
> >> >>> Hi folks,
> >> >>>
> >> >>> We would like to start a discussion on S2Graph as an incubation
> >> >>> project.
> >> >>>
> >> >>> S2Graph is a distributed and scalable OLTP graph database built on
> >> >>> HBase. It provides interactive queries for vertex/edge/sub-graphs on
> >> >>> extremely large graph data sets as well as insertion and update
> >> >>> operations.
> >> >>>
> >> >>> S2Graph was already introduced in Apache BigData and HBaseCon this
> >> >>> year.
> >> >>>
> >> >>> The proposal is available at :
> >> >>> https://wiki.apache.org/incubator/S2GraphProposal
> >> >>>
> >> >>> We are looking forward to any feedback. In addition, we are looking
> >> >>> for volunteers as mentors.
> >> >>>
> >> >>> Best regards,
> >> >>> Hyunsik
> >> >>>
> >> >>>
> >> >>> ---------------------------------------------------------------------
> >> >>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> >> >>> For additional commands, e-mail: general-help@incubator.apache.org
> >> >>>
> >> >>>
> >>
> >>
> >>
> >
> >
> > --
> > Sergio Fernández
> > Partner Technology Manager
> > Redlink GmbH
> > m: +43 6602747925
> > e: sergio.fernandez@redlink.co
> > w: http://redlink.co
>
>
>
