giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Date Tue, 10 Apr 2012 22:18:36 GMT
I think the issue might be that Hadoop only logs INFO and above messages 
by default.  Can you retry with INFO level logging?
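
To illustrate the threshold behaviour described above, here is a minimal sketch. It uses the JDK's java.util.logging purely as a stand-in (this is an assumption for the sake of a self-contained example; it is not the log4j setup that Hadoop itself uses), with Level.FINE playing the role of DEBUG. The class name, logger name and messages are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class LogLevelDemo {
    /** Collects every record that passes the logger's level threshold. */
    static final class CapturingHandler extends Handler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            messages.add(record.getLevel() + " " + record.getMessage());
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    /** Logs one FINE (debug-like) and one INFO message; returns what got through. */
    static List<String> logAt(Level threshold) {
        // Hypothetical logger name; one logger per threshold keeps runs independent.
        Logger logger = Logger.getLogger("demo.TurtleVertexReader." + threshold);
        logger.setUseParentHandlers(false); // keep output out of the console handler
        logger.setLevel(threshold);
        CapturingHandler handler = new CapturingHandler();
        logger.addHandler(handler);
        logger.fine("parsing line");   // dropped when the threshold is INFO
        logger.info("vertex created"); // passes at INFO and at FINE
        logger.removeHandler(handler);
        return handler.messages;
    }

    public static void main(String[] args) {
        System.out.println(logAt(Level.INFO)); // [INFO vertex created]
        System.out.println(logAt(Level.FINE)); // [FINE parsing line, INFO vertex created]
    }
}
```

With an INFO threshold the debug-like message never reaches the handler, which is consistent with seeing no DEBUG output from the reader and input format classes.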

Avery

On 4/10/12 12:17 PM, Paolo Castagna wrote:
> Hi,
> I am still learning Giraph, so, please, be patient with me and forgive my
> trivial questions.
>
> As a simple initial use case, I want to compute the shortest paths from a single
> source in a social graph in RDF format using the FOAF [1] vocabulary.
> This example also will hopefully inform GIRAPH-170 [2] and related issues, such
> as: GIRAPH-141 [3].
>
> Here is an example in Turtle [4] format of a tiny graph using FOAF:
> ----
> @prefix : <http://example.org/> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> :alice
>      a           foaf:Person ;
>      foaf:name   "Alice" ;
>      foaf:mbox   <mailto:alice@example.org> ;
>      foaf:knows  :bob ;
>      foaf:knows  :charlie ;
>      foaf:knows  :snoopy ;
>      .
>
> :bob
>      foaf:name   "Bob" ;
>      foaf:knows  :charlie ;
>      .
>
> :charlie
>      foaf:name   "Charlie" ;
>      foaf:knows  :alice ;
>      .
> ----
> This is nice, human friendly (RDF without angle brackets!), but not easily
> splittable to be processed with MapReduce (or Giraph).
>
> Here is the same graph in N-Triples [5] format:
> ----
> <http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@example.org> .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/snoopy> .
> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie" .
> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
> <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
> <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
> ----
> This is more verbose and ugly, but splittable.
>
> The graph I am interested in is the one formed by foaf:knows links between
> people. Please note that the --knows--> relationship is directed; it is not
> symmetric as on centralized social networking websites such as Facebook or
> LinkedIn. Alice can claim to know Bob without Bob knowing it, and the claim
> might even be false:
>
> alice --knows-->  bob
> alice --knows-->  charlie
> alice --knows-->  snoopy
> bob --knows-->  charlie
> charlie --knows-->  alice
>
> As a first step, I wrote a MapReduce job [6] to transform the RDF graph above
> into a sort of adjacency list using Turtle syntax. Here is the output (three
> lines):
> ----
> <http://example.org/alice>  <http://xmlns.com/foaf/0.1/mbox>
> <mailto:alice@example.org>;<http://xmlns.com/foaf/0.1/name>  "Alice";
> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://xmlns.com/foaf/0.1/Person>;<http://xmlns.com/foaf/0.1/knows>
> <http://example.org/charlie>,<http://example.org/bob>,
> <http://example.org/snoopy>; .<http://example.org/charlie>
> <http://xmlns.com/foaf/0.1/knows>  <http://example.org/alice>.
>
> <http://example.org/bob>  <http://xmlns.com/foaf/0.1/name>  "Bob";
> <http://xmlns.com/foaf/0.1/knows>  <http://example.org/charlie>; .
> <http://example.org/alice>  <http://xmlns.com/foaf/0.1/knows>
> <http://example.org/bob>.
>
> <http://example.org/charlie>  <http://xmlns.com/foaf/0.1/name>  "Charlie";
> <http://xmlns.com/foaf/0.1/knows>  <http://example.org/alice>; .
> <http://example.org/bob>  <http://xmlns.com/foaf/0.1/knows>
> <http://example.org/charlie>.<http://example.org/alice>
> <http://xmlns.com/foaf/0.1/knows>  <http://example.org/charlie>.
> ----
> This is still legal Turtle, and it is also splittable. Each line has all the
> RDF statements (i.e. edges) for a person, including incoming edges.
>
> I wrote a TurtleVertexReader [7] which extends TextVertexReader<NodeWritable,
> Text, NodeWritable, Text>  and a TurtleVertexInputFormat [8] which extends
> TextVertexInputFormat<NodeWritable, Text, NodeWritable, Text>.
> I wrote (copying from the example SimpleShortestPathsVertex) a
> FoafShortestPathsVertex [9] which extends EdgeListVertex<NodeWritable,
> IntWritable, NodeWritable, IntWritable>  and I am running it locally using these
> arguments: -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false
> -DoverwriteOutput=true src/test/resources/data3.ttl target/foaf
> http://example.org/alice 1
>
> TurtleVertexReader, TurtleVertexInputFormat and FoafShortestPathsVertex are
> still a work in progress and I am sure there are plenty of stupid errors.
> However, I do not understand why when I run FoafShortestPathsVertex with the
> DEBUG level, I see debug statements from FoafShortestPathsVertex:
> 19:34:44 DEBUG FoafShortestPathsVertex   :: main({-Dgiraph.maxWorkers=1,
> -Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true,
> src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->  null
> 19:34:44 DEBUG FoafShortestPathsVertex   :: setConf(Configuration:
> core-default.xml, core-site.xml)
> 19:34:44 DEBUG FoafShortestPathsVertex   :: run({src/test/resources/data3.ttl,
> target/foaf, http://example.org/alice, 1})
> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->  Configuration:
> core-default.xml, core-site.xml
> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->  Configuration:
> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
> giraph-site.xml
>
> But, I do not see anything else, no log statement from TurtleVertexReader or
> TurtleVertexInputFormat. Why? What am I doing wrong?
> Is it because I am running it locally?
>
> Thanks,
> Paolo
>
>   [1] http://en.wikipedia.org/wiki/FOAF_%28software%29
>   [2] https://issues.apache.org/jira/browse/GIRAPH-170
>   [3] https://issues.apache.org/jira/browse/GIRAPH-141
>   [4] http://en.wikipedia.org/wiki/Turtle_%28syntax%29
>   [5] http://en.wikipedia.org/wiki/N-Triples
>   [6]
> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/mapreduce/Rdf2AdjacencyListDriver.java
>   [7]
> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/TurtleVertexReader.java
>   [8]
> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/TurtleVertexInputFormat.java
>   [9]
> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/FoafShortestPathsVertex.java
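
For reference, the result one would expect for the five foaf:knows edges quoted above can be checked with a minimal sketch like the following. This is plain Java, not Giraph code and not the FoafShortestPathsVertex linked in [9]: it is a breadth-first search that assumes every knows edge has weight 1 and uses short names (alice, bob, ...) instead of full URIs. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FoafShortestPathsSketch {
    /** Breadth-first search over directed edges; returns hop counts from source. */
    static Map<String, Integer> shortestHops(Map<String, List<String>> knows, String source) {
        Map<String, Integer> dist = new LinkedHashMap<>(); // keeps discovery order
        Deque<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String person = queue.remove();
            // Follow outgoing edges only: knows is directed.
            for (String friend : knows.getOrDefault(person, List.of())) {
                if (!dist.containsKey(friend)) {
                    dist.put(friend, dist.get(person) + 1);
                    queue.add(friend);
                }
            }
        }
        return dist; // people unreachable from source are absent
    }

    /** The five edges listed in the quoted message. */
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("alice", List.of("bob", "charlie", "snoopy"));
        knows.put("bob", List.of("charlie"));
        knows.put("charlie", List.of("alice"));
        return knows;
    }

    public static void main(String[] args) {
        // prints {alice=0, bob=1, charlie=1, snoopy=1}
        System.out.println(shortestHops(exampleGraph(), "alice"));
    }
}
```

Because the edges are directed, the answer depends on the source: starting from bob instead, snoopy is three hops away (bob, charlie, alice, snoopy).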

