flink-dev mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: Queries regarding RDFs with Flink
Date Thu, 19 Mar 2015 13:57:42 GMT
Hi to all,
I'm back to this task again :)

Summarizing again: I have a source dataset that contains RDF "stars"
(SubjectURI, RdfType and a list of RDF triples belonging to this subject,
a.k.a. the "star schema")
and I have to extract some sub-graphs for some RDF types of interest.
As described in the previous email, I'd like to expand some root nodes (if
their type is of interest) and follow some of their paths.
For example, if I'm interested in expanding the RDF type Person (as in
the example), I might want to create a mini-graph with all of its triples
plus those obtained by following the paths
knows.marriedWith and knows.knows.knows.
At the moment I do it with point gets from HBase, but I still don't
understand whether this could be done more efficiently with other strategies in
Flink.
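
To make the path expansion concrete, here is a minimal plain-Python sketch
(no Flink or HBase involved). It assumes the triples fit in memory and uses
the sample data from the earlier email; the helper names index_by_subject,
expand_path and mini_graph are hypothetical, made up for this illustration.

```python
from collections import defaultdict

# Sample triples copied from the example earlier in this thread.
TRIPLES = [
    ("http://test/John", "type", "Person"),
    ("http://test/John", "name", "John"),
    ("http://test/John", "knows", "http://test/Mary"),
    ("http://test/John", "knows", "http://test/Jerry"),
    ("http://test/Jerry", "type", "Person"),
    ("http://test/Jerry", "name", "Jerry"),
    ("http://test/Jerry", "knows", "http://test/Frank"),
    ("http://test/Mary", "type", "Person"),
    ("http://test/Mary", "name", "Mary"),
    ("http://test/Frank", "type", "Person"),
    ("http://test/Frank", "name", "Frank"),
    ("http://test/Frank", "marriedWith", "http://test/Mary"),
]


def index_by_subject(triples):
    """Group triples by subject: one entry per RDF 'star' (hypothetical helper)."""
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)
    return dict(by_subject)


def expand_path(root, predicates, by_subject):
    """Follow the predicates one hop at a time; return every subject visited."""
    frontier = {root}
    visited = {root}
    for predicate in predicates:
        next_frontier = set()
        for subject in frontier:
            for _, p, o in by_subject.get(subject, []):
                # Only objects that are themselves subjects can be expanded.
                if p == predicate and o in by_subject:
                    next_frontier.add(o)
        visited |= next_frontier
        frontier = next_frontier
    return visited


def mini_graph(root, paths, by_subject):
    """Collect the triples of the root and of every node reached via the paths."""
    nodes = {root}
    for path in paths:
        nodes |= expand_path(root, path.split("."), by_subject)
    return [t for node in sorted(nodes) for t in by_subject[node]]


if __name__ == "__main__":
    stars = index_by_subject(TRIPLES)
    paths = ["knows.marriedWith", "knows.knows.knows"]
    for triple in mini_graph("http://test/Jerry", paths, stars):
        print(triple)
```

With these two paths, Jerry's mini-graph contains the stars of Jerry, Frank
and Mary, while Frank's contains only his own star, because both paths start
with knows and Frank has no knows triple.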
@Vasiliki: you said that I might need "something like a BFS from each
vertex". Do you have an example that could fit my use case? Is it possible
to restrict it to only the vertices I'm interested in?
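
Just to pin down what I mean by a filtered "BFS from each vertex", here is a
rough single-machine sketch in plain Python. It is not Gelly code and not a
vertex-centric iteration: it only shows the intended result, under the
assumption that an edge is any triple whose object is itself a subject in the
dataset. The function name bfs_trees and the max_depth parameter are
hypothetical.

```python
from collections import deque

# Sample triples copied from the example earlier in this thread.
TRIPLES = [
    ("http://test/John", "type", "Person"),
    ("http://test/John", "name", "John"),
    ("http://test/John", "knows", "http://test/Mary"),
    ("http://test/John", "knows", "http://test/Jerry"),
    ("http://test/Jerry", "type", "Person"),
    ("http://test/Jerry", "name", "Jerry"),
    ("http://test/Jerry", "knows", "http://test/Frank"),
    ("http://test/Mary", "type", "Person"),
    ("http://test/Mary", "name", "Mary"),
    ("http://test/Frank", "type", "Person"),
    ("http://test/Frank", "name", "Frank"),
    ("http://test/Frank", "marriedWith", "http://test/Mary"),
]


def bfs_trees(triples, root_type, max_depth=None):
    """Run one BFS per subject of type `root_type`; return {root: triples}."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    # Filter step: only subjects with the type of interest become roots.
    roots = [s for s, p, o in triples if p == "type" and o == root_type]

    trees = {}
    for root in roots:
        seen = {root}
        order = [root]  # subjects in the order the BFS discovers them
        queue = deque([(root, 0)])
        while queue:
            node, depth = queue.popleft()
            if max_depth is not None and depth >= max_depth:
                continue  # do not expand beyond the requested depth
            for _, _, obj in by_subject[node]:
                # Literals such as "John" or "Person" are not subjects: skip.
                if obj in by_subject and obj not in seen:
                    seen.add(obj)
                    order.append(obj)
                    queue.append((obj, depth + 1))
        trees[root] = [t for n in order for t in by_subject[n]]
    return trees


if __name__ == "__main__":
    for root, tree in bfs_trees(TRIPLES, "Person").items():
        print(root, len(tree))
```

On the sample data this yields the same triple sets as the four trees listed
in my earlier email (the order of triples inside a tree may differ). Every
root repeats work that other roots have already done, which is where the
memory and communication cost comes from on large graphs.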

Thanks in advance,
Flavio


On Tue, Mar 3, 2015 at 8:32 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com>
wrote:

> Hi Flavio,
>
> if you want to use Gelly to model your data as a graph, you can load your
> Tuple3s as Edges.
> This will result in "http://test/John", "Person", "Frank", etc to be
> vertices and "type", "name", "knows" to be edge values.
> In the first case, you can use filterOnEdges() to get the subgraph with the
> relation edges.
>
> Once you have the graph, you could probably use a vertex-centric iteration
> to generate the trees.
> It seems to me that you need something like a BFS from each vertex. Keep in
> mind that this can be a very costly operation in terms of memory and
> communication for large graphs.
>
> Let me know if you have any questions!
>
> Cheers,
> V.
>
> On 3 March 2015 at 09:13, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>
> > I have a nice case of RDF manipulation :)
> > Let's say I have the following RDF triples (Tuple3) in two files or
> > tables:
> >
> > TABLE A:
> > http://test/John, type, Person
> > http://test/John, name, John
> > http://test/John, knows, http://test/Mary
> > http://test/John, knows, http://test/Jerry
> > http://test/Jerry, type, Person
> > http://test/Jerry, name, Jerry
> > http://test/Jerry, knows, http://test/Frank
> > http://test/Mary, type, Person
> > http://test/Mary, name, Mary
> >
> > TABLE B:
> > http://test/Frank, type, Person
> > http://test/Frank, name, Frank
> > http://test/Frank, marriedWith, http://test/Mary
> >
> > What is the best way to build up Person-rooted trees with every node's data
> > properties and some expanded paths like 'Person.knows.marriedWith'?
> > Is it better to use the Graph/Gelly APIs, Flink joins, multiple point gets
> > from a key/value store, or something else?
> >
> > The expected 4 trees should be:
> >
> > tree 1 (root is John) ------------------
> > http://test/John, type, Person
> > http://test/John, name, John
> > http://test/John, knows, http://test/Mary
> > http://test/John, knows, http://test/Jerry
> > http://test/Jerry, type, Person
> > http://test/Jerry, name, Jerry
> > http://test/Jerry, knows, http://test/Frank
> > http://test/Mary, type, Person
> > http://test/Mary, name, Mary
> > http://test/Frank, type, Person
> > http://test/Frank, name, Frank
> > http://test/Frank, marriedWith, http://test/Mary
> >
> > tree 2 (root is Jerry) ------------------
> > http://test/Jerry, type, Person
> > http://test/Jerry, name, Jerry
> > http://test/Jerry, knows, http://test/Frank
> > http://test/Frank, type, Person
> > http://test/Frank, name, Frank
> > http://test/Frank, marriedWith, http://test/Mary
> > http://test/Mary, type, Person
> > http://test/Mary, name, Mary
> >
> > tree 3 (root is Mary) ------------------
> > http://test/Mary, type, Person
> > http://test/Mary, name, Mary
> >
> > tree 4 (root is Frank) ------------------
> > http://test/Frank, type, Person
> > http://test/Frank, name, Frank
> > http://test/Frank, marriedWith, http://test/Mary
> > http://test/Mary, type, Person
> > http://test/Mary, name, Mary
> >
> > Thanks in advance,
> > Flavio
> >
> > On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen <sewen@apache.org> wrote:
> >
> > > Hey Santosh!
> > >
> > > RDF processing often involves either joins or graph-query-like
> > > operations (transitive). Flink is fairly good at both types of operations.
> > >
> > > I would look into the graph examples and the graph API for a start:
> > >
> > >  - Graph examples:
> > >    https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
> > >  - Graph API:
> > >    https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph
> > >
> > > If you have a more specific question, I can give you better pointers ;-)
> > >
> > > Stephan
> > >
> > >
> > > On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru <sanit4u@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > how can flink be useful for processing the data to RDFs and build the
> > > > ontology?
> > > >
> > > > Regards,
> > > > Santosh
> > > >
> > > > --
> > > > View this message in context:
> > > > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html
> > > > Sent from the Apache Flink (Incubator) Mailing List archive at Nabble.com.
