flink-dev mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Queries regarding RDFs with Flink
Date Sat, 21 Mar 2015 18:48:31 GMT
Hi Flavio!

Initially, I see two ways of doing this:

1) Do a series of joins. You start with your subject and join two or three
times, each join using the condition "objects-from-triplets == subject" to
make one hop. You can filter the triplets by verb beforehand if you are only
interested in a specific relationship.

2) If you want to recursively explode the subgraph (something like all
reachable subjects) or do a rather long series of hops, then you should be
able to model this nicely as a delta iteration, or as a vertex-centric
graph computation. For that, you can use either "Gelly" (the graph library)
or the standalone Spargel operator (Giraph-like).
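
As a rough illustration of the two options, here is a minimal sketch on the
sample triples quoted further down in this thread. It is plain Python and
does not call any Flink, Gelly or Spargel API; the names expand_path,
reachable and subgraph are made up for this example. In Flink, each hop
would be a join on a DataSet, and the loop in reachable would roughly
correspond to a delta iteration, with "reached" as the solution set and
"workset" as the workset.

```python
# Illustrative only: plain Python, no Flink/Gelly API calls.
# The triples are the (subject, verb, object) rows of TABLE A and TABLE B
# from the quoted message.
TRIPLES = [
    ("http://test/John", "type", "Person"),
    ("http://test/John", "name", "John"),
    ("http://test/John", "knows", "http://test/Mary"),
    ("http://test/John", "knows", "http://test/Jerry"),
    ("http://test/Jerry", "type", "Person"),
    ("http://test/Jerry", "name", "Jerry"),
    ("http://test/Jerry", "knows", "http://test/Frank"),
    ("http://test/Mary", "type", "Person"),
    ("http://test/Mary", "name", "Mary"),
    ("http://test/Frank", "type", "Person"),
    ("http://test/Frank", "name", "Frank"),
    ("http://test/Frank", "marriedWith", "http://test/Mary"),
]


def expand_path(root, path, triples):
    """Option 1: one join per hop along a fixed verb path.

    Returns the set of subjects visited, including the root.
    """
    visited = {root}
    frontier = {root}
    for verb in path:
        # Filter the triples by verb first, then join frontier == subject
        # and keep the objects as the next frontier.
        links = [(s, o) for (s, p, o) in triples if p == verb]
        frontier = {o for (s, o) in links if s in frontier}
        visited |= frontier
    return visited


def reachable(root, link_verbs, triples):
    """Option 2: iterate until no new subjects are discovered."""
    links = [(s, o) for (s, p, o) in triples if p in link_verbs]
    reached = {root}   # everything found so far
    workset = {root}   # subjects discovered in the previous round
    while workset:
        neighbours = {o for (s, o) in links if s in workset}
        workset = neighbours - reached  # only the new ones go on
        reached |= workset
    return reached


def subgraph(subjects, triples):
    """Collect the 'star' (all triples) of every subject in the set."""
    return [t for t in triples if t[0] in subjects]
```

For example, expand_path("http://test/Jerry", ["knows", "marriedWith"],
TRIPLES) visits Jerry, Frank and Mary, which are the subjects of tree 2 in
the quoted message, and subgraph(...) on that set returns their triples.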

Does that help with your questions?

Greetings,
Stephan


On Thu, Mar 19, 2015 at 2:57 PM, Flavio Pompermaier <pompermaier@okkam.it>
wrote:

> Hi to all,
> I'm back to this task again :)
>
> Summarizing again: I have a source dataset that contains RDF "stars"
> (SubjectURI, RdfType and a list of RDF triples belonging to this subject,
> a.k.a. the "star" schema),
> and I have to extract some sub-graphs for some RDF types of interest.
> As described in the previous email I'd like to expand some root node (if
> its type is of interest) and explode some of its path(s).
> For example, if I'm interested in the expansion of RDF type Person (as in
> the example), I might want to create a mini-graph with all of its triples
> plus those obtained by exploding the path(s)
> knows.marriedWith and knows.knows.knows.
> At the moment I do it with point lookups (gets) from HBase, but I haven't
> understood whether this could be done more efficiently with other
> strategies in Flink.
> @Vasiliki: you said that I might need "something like a BFS from each
> vertex". Do you have an example that could fit my use case? Is it possible
> to filter for only those vertices I'm interested in?
>
> Thanks in advance,
> Flavio
>
>
> On Tue, Mar 3, 2015 at 8:32 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> wrote:
>
> > Hi Flavio,
> >
> > if you want to use Gelly to model your data as a graph, you can load your
> > Tuple3s as Edges.
> > This will result in "http://test/John", "Person", "Frank", etc. being
> > vertices and "type", "name", "knows" being edge values.
> > In the first case, you can use filterOnEdges() to get the subgraph with
> > the relation edges.
> >
> > Once you have the graph, you could probably use a vertex-centric
> > iteration to generate the trees.
> > It seems to me that you need something like a BFS from each vertex. Keep
> > in mind that this can be a very costly operation in terms of memory and
> > communication for large graphs.
> >
> > Let me know if you have any questions!
> >
> > Cheers,
> > V.
> >
> > On 3 March 2015 at 09:13, Flavio Pompermaier <pompermaier@okkam.it>
> > wrote:
> >
> > > I have a nice case of RDF manipulation :)
> > > Let's say I have the following RDF triples (Tuple3) in two files or
> > > tables:
> > >
> > > TABLE A:
> > > http://test/John, type, Person
> > > http://test/John, name, John
> > > http://test/John, knows, http://test/Mary
> > > http://test/John, knows, http://test/Jerry
> > > http://test/Jerry, type, Person
> > > http://test/Jerry, name, Jerry
> > > http://test/Jerry, knows, http://test/Frank
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > TABLE B:
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > >
> > > What is the best way to build up Person-rooted trees with all of each
> > > node's data properties and some expanded paths like
> > > 'Person.knows.marriedWith'?
> > > Is it better to use the Graph/Gelly APIs, Flink joins, multiple point
> > > lookups (gets) from a key/value store, or something else?
> > >
> > > The expected 4 trees should be:
> > >
> > > tree 1 (root is John) ------------------
> > > http://test/John, type, Person
> > > http://test/John, name, John
> > > http://test/John, knows, http://test/Mary
> > > http://test/John, knows, http://test/Jerry
> > > http://test/Jerry, type, Person
> > > http://test/Jerry, name, Jerry
> > > http://test/Jerry, knows, http://test/Frank
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > >
> > > tree 2 (root is Jerry) ------------------
> > > http://test/Jerry, type, Person
> > > http://test/Jerry, name, Jerry
> > > http://test/Jerry, knows, http://test/Frank
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > tree 3 (root is Mary) ------------------
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > tree 4 (root is Frank) ------------------
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > Thanks in advance,
> > > Flavio
> > >
> > > On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen <sewen@apache.org> wrote:
> > >
> > > > Hey Santosh!
> > > >
> > > > RDF processing often involves either joins, or graph-query like
> > > > operations (transitive). Flink is fairly good at both types of
> > > > operations.
> > > >
> > > > I would look into the graph examples and the graph API for a start:
> > > >
> > > >  - Graph examples:
> > > >    https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
> > > >  - Graph API:
> > > >    https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph
> > > >
> > > > If you have a more specific question, I can give you better pointers
> > > > ;-)
> > > >
> > > > Stephan
> > > >
> > > >
> > > > On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru <sanit4u@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > > How can Flink be useful for processing the data into RDF and
> > > > > > building the ontology?
> > > > >
> > > > > Regards,
> > > > > Santosh
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > View this message in context:
> > > > >
> > > >
> > >
> >
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html
> > > > > Sent from the Apache Flink (Incubator) Mailing List archive.
> mailing
> > > list
> > > > > archive at Nabble.com.
> > > > >
> > > >
> > >
> >
>
