flink-dev mailing list archives

From Vasiliki Kalavri <vasilikikala...@gmail.com>
Subject Re: Apache Tinkerpop & Geode Integration?
Date Wed, 16 Dec 2015 18:54:25 GMT
Hey,

I think I might have confused you, so let me try to explain :)

First, Gremlin is a language similar to Cypher, but it is also a traversal
machine that supports distributed traversals. For distributed
traversals, Gremlin uses a "graph computer", which runs the Gremlin
traversals using the BSP model. Essentially, vertices receive traversers as
messages and execute the traverser's step as the update function (for more
info see section 5 in [1]).
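
To make the BSP idea concrete, here is a rough sketch in plain Python. It is not TinkerPop or Flink code: the function name `bsp_traverse`, the toy graph, and the choice of a single repeated "follow out-edges" step are all assumptions made up for illustration. Each superstep, every vertex that received traversers as messages applies the step and forwards the traversers to its neighbors.

```python
from collections import defaultdict


def bsp_traverse(adjacency, start_vertices, steps):
    """Simulate `steps` out-edge hops in BSP style (illustrative only).

    Returns a dict mapping each vertex to the number of traversers
    located on it after the final superstep.
    """
    # Before the first superstep, one traverser sits on each start vertex.
    inbox = defaultdict(int)
    for vertex in start_vertices:
        inbox[vertex] += 1

    for _ in range(steps):
        outbox = defaultdict(int)
        # Update function: a vertex that received traversers executes the
        # traversal step (here: follow every outgoing edge) and sends the
        # traversers on as messages.
        for vertex, count in inbox.items():
            for neighbor in adjacency.get(vertex, []):
                outbox[neighbor] += count
        # Synchronization barrier: messages become visible only in the
        # next superstep.
        inbox = outbox

    return dict(inbox)


# Hypothetical toy graph: a -> b, a -> c, b -> d, c -> d
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
result = bsp_traverse(graph, ["a"], 2)
```

With this toy graph, two hops from `a` leave two traversers on `d`, one for each path. A real graph computer would distribute the vertices across workers and support many step types; the sketch only shows the message-passing shape.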

Thus, Tinkerpop has a GiraphGraphComputer to run on top of Giraph, a
SparkGraphComputer to run on top of Spark, etc.

The Tinkerpop community has offered to work on a FlinkGraphComputer, which,
similarly to the existing graph computers, will use one of the Flink/Gelly
iteration abstractions.

Now, there are 2 questions for the Flink community:
(1): do we think this is interesting/useful and something we can help them
with?
(2): do we think it makes sense to "host" the FlinkGraphComputer on the
Flink codebase?


Neo4j/Cypher on Flink is a separate discussion in my opinion. As far as I
understand, Cypher could run on Gremlin, but there is no compiler for it
yet. I have been in discussion with people from Neo4j and we have jointly
written a description for a thesis project regarding OpenCypher on Flink.
The idea is to collaboratively supervise/help the student(s). Of course, if
anyone else is interested in this (not necessarily a student) we can always
use more help, so just let me know!

Thanks,
-Vasia.

[1]: http://arxiv.org/pdf/1508.03843v1.pdf


On 16 December 2015 at 19:21, Stephan Ewen <sewen@apache.org> wrote:

> I am not very familiar with Gremlin, but I remember a brainstorming session
> with Martin Neumann on porting Cypher (the neo4j query language) to Flink.
> We looked at Cypher queries for filtering and traversing the graph.
>
> It looked like it would work well. I remember we could even model
> recursive conditions on traversals pretty well with delta iterations.
>
> If Gremlin's use cases are anything like Cypher's, I could ping Martin and
> see if we can collect some of those ideas again.
>
> Stephan
>
>
>
> On Tue, Dec 15, 2015 at 5:35 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> wrote:
>
> > Hi Dr. Fabian,
> >
> > thanks a lot for your answer!
> >
> >
> > On 15 December 2015 at 15:42, Fabian Hueske <fhueske@gmail.com> wrote:
> >
> > > Hi Vasia,
> > >
> > > I agree, Gremlin definitely looks like an interesting API for Flink.
> > > I'm not sure how it relates to Gelly. I guess Gelly would (initially) be
> > > more tightly integrated with the DataSet API whereas Gremlin would be a
> > > connector for other languages. Any ideas on this?
> > >
> >
> > The idea is to provide a FlinkGraphComputer which will use Gelly's
> > iterations to compile the Gremlin query language to Flink.
> > In my previous email, I linked to our discussion over at the Tinkerpop
> > mailing list, where you can find more details on this. By adding the
> > FlinkGraphComputer, we basically get any graph query language that compiles
> > to the Gremlin VM for free.
> >
> >
> > >
> > > Another question would be whether the connector should go into Flink or
> > > Tinkerpop. For example, the Spark, Giraph, and Neo4J connectors are all
> > > included in Tinkerpop.
> > > This should be discussed with the Tinkerpop community.
> > >
> > >
> > I'm copying from the Tinkerpop mailing list thread (link for full thread in
> > my previous email):
> >
> >
> > *In the past, TinkerPop used to be a "dumping ground" for all
> > implementations, but we decided for TinkerPop3 that we would only have
> > "reference implementations" so users can play, system providers can learn,
> > and ultimately, system providers would provide TinkerPop support in their
> > distribution. As such, we would like to have FlinkGraphComputer distributed
> > with Flink. If that sounds like something your project would be comfortable
> > with, I think we can provide a JIRA/PR for FlinkGraphComputer (as well as
> > any necessary documentation). We can start with a JIRA ticket to get things
> > going. Thoughts?*
> >
> >
> > This is why I brought the conversation over here, so I can hear the opinions
> > of the Flink community on this :)
> >
> >
> >
> > > Best, Fabian
> > >
> >
> >
> > -Vasia.
> >
> >
> >
> > >
> > >
> > > 2015-12-14 18:33 GMT+01:00 Vasiliki Kalavri <vasilikikalavri@gmail.com>:
> > >
> > > > Ping squirrels! Any thoughts/opinions on this?
> > > >
> > > > On 9 December 2015 at 20:40, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello squirrels,
> > > > >
> > > > > I have been discussing with the Apache Tinkerpop [1] community regarding
> > > > > an integration with Flink/Gelly.
> > > > > You can read our discussion in [2].
> > > > >
> > > > > Tinkerpop has a graph traversal machine called Gremlin, which supports
> > > > > many high-level graph processing languages and runs on top of different
> > > > > systems (e.g. Giraph, Spark, Graph DBs). You can read more in this great
> > > > > blog post [3].
> > > > >
> > > > > The idea is to provide a FlinkGraphComputer implementation, which will add
> > > > > Gremlin support to Flink.
> > > > >
> > > > > I believe Tinkerpop is a great project and I would love to see an
> > > > > integration with Gelly.
> > > > > Before we move forward, I would like your input!
> > > > > To me, it seems that this addition would nicely fit in flink-contrib,
> > > > > where we also have connectors to other projects.
> > > > > If you agree, I will go ahead and open a JIRA about it.
> > > > >
> > > > > Thank you!
> > > > > -Vasia.
> > > > >
> > > > > [1]: https://tinkerpop.incubator.apache.org/
> > > > > [2]: https://mail-archives.apache.org/mod_mbox/incubator-tinkerpop-dev/201511.mbox/%3CCANva_A390L7g169r8Sn+ej1-yfKBUdLnd4Td6ATwnP0uzA--gA@mail.gmail.com%3E
> > > > > [3]: http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine
> > > > >
> > > > > On 25 November 2015 at 16:54, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi James,
> > > > >>
> > > > >> I've just subscribed to the Tinkerpop dev mailing list. Could you please
> > > > >> send a reply to the thread, so then I can reply to it?
> > > > >> I'm not sure how I can reply to the thread otherwise...
> > > > >> I also saw that there is a grafos.ml project thread. I could also
> > > > >> provide some input there :)
> > > > >>
> > > > >> Thanks!
> > > > >> -Vasia.
> > > > >>
> > > > >> On 25 November 2015 at 15:09, James Thornton <james.thornton@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Vasia -
> > > > >>>
> > > > >>> Yes, a FlinkGraphComputer should be a straight-forward first step. Also, on
> > > > >>> the Apache Tinkerpop dev mailing list, Marko thought it might be cool if
> > > > >>> there was a "Graph API" similar to the "Table API" -- hooking in Gremlin to
> > > > >>> Flink's fluent API would give Flink users a full graph query language.
> > > > >>>
> > > > >>> Stephen Mallette is a TinkerPop core contributor, and he has already
> > > > >>> started working on a FlinkGraphComputer. There is a Flink/Tinkerpop thread
> > > > >>> on the TinkerPop dev list -- it would be great to have you part of the
> > > > >>> conversation there too as we work on the integration:
> > > > >>>
> > > > >>>
> > > > >>> http://mail-archives.apache.org/mod_mbox/incubator-tinkerpop-dev/
> > > > >>>
> > > > >>> Thanks, Vasia.
> > > > >>>
> > > > >>> - James
> > > > >>>
> > > > >>>
> > > > >>> On Mon, Nov 23, 2015 at 10:28 AM, Vasiliki Kalavri <
> > > > >>> vasilikikalavri@gmail.com> wrote:
> > > > >>>
> > > > >>> > Hi James,
> > > > >>> >
> > > > >>> > thank you for your e-mail and your interest in Flink :)
> > > > >>> >
> > > > >>> > I've recently taken a _quick_ look into Apache TinkerPop and I think it'd
> > > > >>> > be very interesting to integrate with Flink/Gelly.
> > > > >>> > Are you thinking about something like a Flink GraphComputer, similar to
> > > > >>> > the Giraph and Spark GraphComputers?
> > > > >>> > I believe such an integration should be straight-forward to implement. You
> > > > >>> > can start by looking into Flink iteration operators [1] and Gelly iteration
> > > > >>> > abstractions [2].
> > > > >>> >
> > > > >>> > Regarding Apache Geode, I'm not familiar with the project, but I'll try to
> > > > >>> > take a look in the following days!
> > > > >>> >
> > > > >>> > Cheers,
> > > > >>> > -Vasia.
> > > > >>> >
> > > > >>> >
> > > > >>> > [1]: https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators
> > > > >>> > [2]: https://ci.apache.org/projects/flink/flink-docs-master/libs/gelly_guide.html#iterative-graph-processing
> > > > >>> >
> > > > >>> >
> > > > >>> > On 20 November 2015 at 08:32, James Thornton <james.thornton@gmail.com>
> > > > >>> > wrote:
> > > > >>> >
> > > > >>> > > Hi -
> > > > >>> > >
> > > > >>> > > This is James Thornton (espeed) from the Apache Tinkerpop project
> > > > >>> > > (http://tinkerpop.incubator.apache.org/).
> > > > >>> > >
> > > > >>> > > The Flink iterators should pair well with Gremlin's Graph Traversal Machine
> > > > >>> > > (http://www.datastax.com/dev/blog/the-benefits-of-the-gremlin-graph-traversal-machine)
> > > > >>> > > -- it would be good to coordinate on creating an integration.
> > > > >>> > >
> > > > >>> > > Also, Apache Geode made a splash today on HN
> > > > >>> > > (https://news.ycombinator.com/item?id=10596859) -- connecting Geode and
> > > > >>> > > Flink would be killer. Here's the Geode/Spark connector for reference:
> > > > >>> > >
> > > > >>> > > https://github.com/apache/incubator-geode/tree/develop/gemfire-spark-connector
> > > > >>> > >
> > > > >>> > > - James
> > > > >>> > >
