Date: Sat, 21 Mar 2015 19:48:31 +0100
Subject: Re: Queries regarding RDFs with Flink
From: Stephan Ewen
To: "dev@flink.apache.org"

Hi Flavio!

I initially see two ways of doing this:

1) Do a series of joins. You start with your subject and join two or three
times using the "objects-from-triplets == subject" condition, where each join
makes one hop. You can filter the triplets by verb beforehand if you are only
interested in a particular relationship.

2) If you want to recursively explode the subgraph (something like all
reachable subjects) or do a rather long series of hops, then you should be
able to model this nicely as a delta iteration, or as a vertex-centric graph
computation. For that, you can use either "Gelly" (the graph library) or the
standalone Spargel operator (Giraph-like).

Does that help with your questions?

Greetings,
Stephan

On Thu, Mar 19, 2015 at 2:57 PM, Flavio Pompermaier wrote:

> Hi to all,
> I'm back to this task again :)
>
> Summarizing again: I have some source dataset that contains RDF "stars"
> (SubjectURI, RdfType and a list of RDF triples belonging to this subject ->
> the "a.k.a." star schema)
> and I have to extract some sub-graphs for some RDF types of interest.
> As described in the previous email I'd like to expand some root node (if
> its type is of interest) and explode some of its path(s).
> For example, if I'm interested in the expansion of RDF type Person (as in
> the example), I might want to create a mini-graph with all of its triples
> plus those obtained by exploding the path(s)
> knows.marriedWith and knows.knows.knows.
> At the moment I do it with point gets from HBase, but I didn't
> understand whether this could be done more efficiently with other
> strategies in Flink.
> @Vasiliki: you said that I could need "something like a BFS from each
> vertex". Do you have an example that could fit my use case? Is it possible
> to filter out those vertices I'm interested in?
>
> Thanks in advance,
> Flavio
>
> On Tue, Mar 3, 2015 at 8:32 PM, Vasiliki Kalavri <vasilikikalavri@gmail.com>
> wrote:
>
> > Hi Flavio,
> >
> > if you want to use Gelly to model your data as a graph, you can load your
> > Tuple3s as Edges.
> > This will make "http://test/John", "Person", "Frank", etc. vertices and
> > "type", "name", "knows" edge values.
> > In the first case, you can use filterOnEdges() to get the subgraph with
> > the relation edges.
> >
> > Once you have the graph, you could probably use a vertex-centric
> > iteration to generate the trees.
> > It seems to me that you need something like a BFS from each vertex. Keep
> > in mind that this can be a very costly operation in terms of memory and
> > communication for large graphs.
> >
> > Let me know if you have any questions!
> >
> > Cheers,
> > V.
> >
> > On 3 March 2015 at 09:13, Flavio Pompermaier wrote:
> >
> > > I have a nice case of RDF manipulation :)
> > > Let's say I have the following RDF triples (Tuple3) in two files or
> > > tables:
> > >
> > > TABLE A:
> > > http://test/John, type, Person
> > > http://test/John, name, John
> > > http://test/John, knows, http://test/Mary
> > > http://test/John, knows, http://test/Jerry
> > > http://test/Jerry, type, Person
> > > http://test/Jerry, name, Jerry
> > > http://test/Jerry, knows, http://test/Frank
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > TABLE B:
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > >
> > > What is the best way to build up Person-rooted trees with all of each
> > > node's data properties and some expanded path like
> > > 'Person.knows.marriedWith'?
> > > Is it better to use the Graph/Gelly APIs, Flink joins, multiple point
> > > gets from a key/value store, or something else?
> > >
> > > The expected 4 trees should be:
> > >
> > > tree 1 (root is John)
> > > ------------------
> > > http://test/John, type, Person
> > > http://test/John, name, John
> > > http://test/John, knows, http://test/Mary
> > > http://test/John, knows, http://test/Jerry
> > > http://test/Jerry, type, Person
> > > http://test/Jerry, name, Jerry
> > > http://test/Jerry, knows, http://test/Frank
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > >
> > > tree 2 (root is Jerry)
> > > ------------------
> > > http://test/Jerry, type, Person
> > > http://test/Jerry, name, Jerry
> > > http://test/Jerry, knows, http://test/Frank
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > tree 3 (root is Mary)
> > > ------------------
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > tree 4 (root is Frank)
> > > ------------------
> > > http://test/Frank, type, Person
> > > http://test/Frank, name, Frank
> > > http://test/Frank, marriedWith, http://test/Mary
> > > http://test/Mary, type, Person
> > > http://test/Mary, name, Mary
> > >
> > > Thanks in advance,
> > > Flavio
> > >
> > > On Mon, Mar 2, 2015 at 5:04 PM, Stephan Ewen wrote:
> > >
> > > > Hey Santosh!
> > > >
> > > > RDF processing often involves either joins, or graph-query like
> > > > operations (transitive). Flink is fairly good at both types of
> > > > operations.
> > > >
> > > > I would look into the graph examples and the graph API for a start:
> > > >
> > > > - Graph examples:
> > > >   https://github.com/apache/flink/tree/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/graph
> > > > - Graph API:
> > > >   https://github.com/apache/flink/tree/master/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph
> > > >
> > > > If you have a more specific question, I can give you better pointers
> > > > ;-)
> > > >
> > > > Stephan
> > > >
> > > > On Fri, Feb 27, 2015 at 4:48 PM, santosh_rajaguru wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > how can flink be useful for processing the data to RDFs and build
> > > > > the ontology?
> > > > >
> > > > > Regards,
> > > > > Santosh
> > > > >
> > > > > --
> > > > > View this message in context:
> > > > > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Queries-regarding-RDFs-with-Flink-tp4130.html
> > > > > Sent from the Apache Flink (Incubator) Mailing List archive. mailing
> > > > > list archive at Nabble.com.
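
The two strategies in Stephan's reply can be sketched outside Flink. What
follows is a minimal plain-Python illustration, not Flink or Gelly code: the
helper names hop, expand_path, reachable and tree are hypothetical, and the
triples are the sample data from Flavio's message. expand_path does one
"object == subject" join per verb in a path (option 1); reachable repeats the
join until no new subject shows up, which is roughly the job a delta
iteration's workset would do (option 2).

```python
# Sample triples from the thread (tables A and B merged).
TRIPLES = [
    ("http://test/John", "type", "Person"),
    ("http://test/John", "name", "John"),
    ("http://test/John", "knows", "http://test/Mary"),
    ("http://test/John", "knows", "http://test/Jerry"),
    ("http://test/Jerry", "type", "Person"),
    ("http://test/Jerry", "name", "Jerry"),
    ("http://test/Jerry", "knows", "http://test/Frank"),
    ("http://test/Mary", "type", "Person"),
    ("http://test/Mary", "name", "Mary"),
    ("http://test/Frank", "type", "Person"),
    ("http://test/Frank", "name", "Frank"),
    ("http://test/Frank", "marriedWith", "http://test/Mary"),
]


def hop(frontier, triples, verb):
    """One join step: objects of triples whose subject is in the frontier
    and whose verb matches (hypothetical helper, not a Flink call)."""
    return {o for (s, v, o) in triples if s in frontier and v == verb}


def expand_path(root, triples, path):
    """Option 1: follow a fixed verb path such as knows.marriedWith,
    one join per hop, and return every subject touched on the way."""
    visited = {root}
    frontier = {root}
    for verb in path:
        frontier = hop(frontier, triples, verb)
        visited |= frontier
    return visited


def reachable(root, triples, verbs):
    """Option 2: keep joining until the workset of newly found subjects
    is empty, i.e. collect everything reachable over the given verbs."""
    visited = {root}
    workset = {root}
    while workset:
        found = set()
        for verb in verbs:
            found |= hop(workset, triples, verb)
        workset = found - visited  # only subjects not seen before
        visited |= workset
    return visited


def tree(subjects, triples):
    """All triples whose subject belongs to the expanded set."""
    return [t for t in triples if t[0] in subjects]


if __name__ == "__main__":
    jerry = expand_path("http://test/Jerry", TRIPLES, ["knows", "marriedWith"])
    for triple in tree(jerry, TRIPLES):
        print(triple)
```

On the sample data, reachable with the verbs knows and marriedWith visits the
same subjects as the four expected trees listed in the thread. In a Flink job
the set lookup inside hop would instead be a join between the current frontier
and the triples, as Stephan describes.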