incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Date Wed, 11 Apr 2012 17:33:50 GMT
It shouldn't be; your code looks very similar to the unit tests (e.g.
TestManualCheckpoint.java).  So, are you trying to run your test with the
local Hadoop (as the unit tests do)?  Or are you using an actual
Hadoop setup?

Avery

On 4/10/12 11:41 PM, Paolo Castagna wrote:
> I am using hadoop-core-1.0.1.jar ... could that be a problem?
>
> Paolo
>
> Paolo Castagna wrote:
>> Hi Avery,
>> nope, no luck.
>>
>> I have changed all my log.debug(...) into log.info(...). Same behavior.
>>
>> I have a log4j.properties [1] file in my classpath and it has:
>> log4j.logger.org.apache.jena.grande=DEBUG
>> log4j.logger.org.apache.jena.grande.giraph=DEBUG
>> I also tried to change that to:
>> log4j.logger.org.apache.jena.grande=INFO
>> log4j.logger.org.apache.jena.grande.giraph=INFO
>> No luck.
>>
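
One thing that could be checked at this point is which log4j.properties file
the JVM actually picks up, since log4j 1.x looks for its default configuration
on the classpath and a copy found earlier on the classpath could take
precedence over the one in src/test/resources. The sketch below is only an
illustration: the class name WhichLog4j and the helper locate are made up for
this example, and it relies on nothing beyond the standard
ClassLoader.getResource call.

```java
import java.net.URL;

class WhichLog4j {
    // Hypothetical helper: returns the URL of the first matching resource
    // on the classpath, or null if the resource cannot be found.
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the system class loader if no context loader is set.
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    public static void main(String[] args) {
        URL url = locate("log4j.properties");
        if (url == null) {
            System.out.println("log4j.properties not found on the classpath");
        } else {
            System.out.println("log4j.properties found at " + url);
        }
    }
}
```

If the printed location is not the expected file, the logger levels set there
would have no effect, whatever level the log statements use.
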
>> My Giraph job has:
>> GiraphJob job = new GiraphJob(getConf(), getClass().getName());
>> job.setVertexClass(getClass());
>> job.setVertexInputFormatClass(TurtleVertexInputFormat.class);
>> job.setVertexOutputFormatClass(TurtleVertexOutputFormat.class);
>>
>> But if I run in debug mode with a breakpoint in the TurtleVertexInputFormat
>> constructor, it is never instantiated. How can that be?
>>
>> So perhaps the problem is not the logging, it is the fact that
>> my GiraphJob is not using TurtleVertexInputFormat.class and
>> TurtleVertexOutputFormat.class, but I don't see what I am doing
>> wrong. :-/
>>
>> Thanks,
>> Paolo
>>
>>   [1]
>> https://github.com/castagna/jena-grande/blob/master/src/test/resources/log4j.properties
>>
>> Avery Ching wrote:
>>> I think the issue might be that Hadoop only logs INFO and above messages
>>> by default.  Can you retry with INFO level logging?
>>>
>>> Avery
>>>
>>> On 4/10/12 12:17 PM, Paolo Castagna wrote:
>>>> Hi,
>>>> I am still learning Giraph, so, please, be patient with me and forgive my
>>>> trivial questions.
>>>>
>>>> As a simple initial use case, I want to compute the shortest paths from a
>>>> single source in a social graph in RDF format using the FOAF [1] vocabulary.
>>>> This example will hopefully also inform GIRAPH-170 [2] and related issues,
>>>> such as GIRAPH-141 [3].
>>>>
>>>> Here is an example in Turtle [4] format of a tiny graph using FOAF:
>>>> ----
>>>> @prefix :     <http://example.org/> .
>>>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>>>>
>>>> :alice
>>>>       a           foaf:Person ;
>>>>       foaf:name   "Alice" ;
>>>>       foaf:mbox   <mailto:alice@example.org> ;
>>>>       foaf:knows  :bob ;
>>>>       foaf:knows  :charlie ;
>>>>       foaf:knows  :snoopy ;
>>>>       .
>>>>
>>>> :bob
>>>>       foaf:name   "Bob" ;
>>>>       foaf:knows  :charlie ;
>>>>       .
>>>>
>>>> :charlie
>>>>       foaf:name   "Charlie" ;
>>>>       foaf:knows  :alice ;
>>>>       .
>>>> ----
>>>> This is nice, human friendly (RDF without angle brackets!), but not
>>>> easily
>>>> splittable to be processed with MapReduce (or Giraph).
>>>>
>>>> Here is the same graph in N-Triples [5] format:
>>>> ----
>>>> <http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@example.org> .
>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/snoopy> .
>>>> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie" .
>>>> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
>>>> ----
>>>> This is more verbose and ugly, but splittable.
>>>>
>>>> The graph I am interested in is the graph formed by the foaf:knows
>>>> relationships/links between people. Please note that the --knows-->
>>>> relationship has a direction: it isn't symmetric as in centralized
>>>> social networking websites such as Facebook or LinkedIn. Alice can claim
>>>> to know Bob without Bob knowing it, and the claim might even be false:
>>>>
>>>> alice --knows-->   bob
>>>> alice --knows-->   charlie
>>>> alice --knows-->   snoopy
>>>> bob --knows-->   charlie
>>>> charlie --knows-->   alice
>>>>
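
As a reference point for what the Giraph job should eventually produce, the
five directed edges above are small enough to solve with a plain breadth-first
search. This is only a sketch under stated assumptions: every foaf:knows edge
is given a weight of 1 (the thread does not say which weights are intended),
the class name FoafBfs and the short vertex names are made up for the example,
and it is not the FoafShortestPathsVertex code linked in [9].

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

class FoafBfs {
    // Directed foaf:knows edges from the example graph (short names only).
    static final Map<String, List<String>> KNOWS = new LinkedHashMap<>();
    static {
        KNOWS.put("alice", Arrays.asList("bob", "charlie", "snoopy"));
        KNOWS.put("bob", Arrays.asList("charlie"));
        KNOWS.put("charlie", Arrays.asList("alice"));
        KNOWS.put("snoopy", Collections.<String>emptyList());
    }

    // Breadth-first search: with unit edge weights (an assumption), the
    // first time a vertex is reached gives its shortest distance.
    static Map<String, Integer> shortestPaths(Map<String, List<String>> graph, String source) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            List<String> neighbours = graph.get(current);
            if (neighbours == null) {
                continue; // vertex with no outgoing edges listed
            }
            for (String next : neighbours) {
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return distance; // unreachable vertices are simply absent
    }

    public static void main(String[] args) {
        // prints {alice=0, bob=1, charlie=1, snoopy=1}
        System.out.println(shortestPaths(KNOWS, "alice"));
    }
}
```

Because the edges are directed, the result depends on the source: starting
from bob instead, alice is two hops away (via charlie) and snoopy three.
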
>>>> As a first step, I wrote a MapReduce job [6] to transform the RDF graph
>>>> above into a sort of adjacency list using Turtle syntax. Here is the
>>>> output (three lines):
>>>> ----
>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@example.org>; <http://xmlns.com/foaf/0.1/name> "Alice"; <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>, <http://example.org/bob>, <http://example.org/snoopy>; . <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.
>>>>
>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>; . <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob>.
>>>>
>>>> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie"; <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>; . <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>. <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>.
>>>> ----
>>>> This is legal Turtle, but it is also splittable. Each line has all the RDF
>>>> statements (i.e. edges) for a person, including the incoming edges.
>>>>
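
To make the grouping step concrete, here is a minimal single-machine sketch
that collects the outgoing foaf:knows edges per subject from N-Triples lines.
It rests on several assumptions: it is not the Rdf2AdjacencyListDriver job
linked in [6], it keeps only foaf:knows triples (dropping the name, mbox and
type properties and the incoming edges that the real output carries), and it
uses a simple regular expression rather than a real N-Triples parser, so it
only handles triples whose three terms are all IRIs. The class name
KnowsAdjacency and its members are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KnowsAdjacency {
    static final String KNOWS = "http://xmlns.com/foaf/0.1/knows";

    // Matches "<subject> <predicate> <object> ." where all three terms are
    // IRIs; lines with literal objects such as "Alice" do not match.
    static final Pattern IRI_TRIPLE =
        Pattern.compile("\\s*<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*");

    // A few lines taken from the N-Triples example in this thread.
    static final List<String> SAMPLE = Arrays.asList(
        "<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> \"Alice\" .",
        "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .",
        "<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .",
        "<http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> ."
    );

    // Groups the objects of foaf:knows triples by subject, in input order.
    static Map<String, List<String>> outgoing(List<String> lines) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = IRI_TRIPLE.matcher(line);
            if (m.matches() && KNOWS.equals(m.group(2))) {
                List<String> targets = adjacency.get(m.group(1));
                if (targets == null) {
                    targets = new ArrayList<>();
                    adjacency.put(m.group(1), targets);
                }
                targets.add(m.group(3));
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // Print one subject per line followed by the people it claims to know.
        for (Map.Entry<String, List<String>> e : outgoing(SAMPLE).entrySet()) {
            System.out.println(e.getKey() + " knows " + e.getValue());
        }
    }
}
```

Keeping all of a person's edges on one line is what lets a line-oriented
input format hand each vertex to a reader in a single record.
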
>>>> I wrote a TurtleVertexReader [7] which extends
>>>> TextVertexReader<NodeWritable,
>>>> Text, NodeWritable, Text>   and a TurtleVertexInputFormat [8] which
>>>> extends
>>>> TextVertexInputFormat<NodeWritable, Text, NodeWritable, Text>.
>>>> I wrote (copying from the example SimpleShortestPathsVertex) a
>>>> FoafShortestPathsVertex [9] which extends EdgeListVertex<NodeWritable,
>>>> IntWritable, NodeWritable, IntWritable>   and I am running it locally
>>>> using these
>>>> arguments: -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false
>>>> -DoverwriteOutput=true src/test/resources/data3.ttl target/foaf
>>>> http://example.org/alice 1
>>>>
>>>> TurtleVertexReader, TurtleVertexInputFormat and FoafShortestPathsVertex
>>>> are still a work in progress and I am sure there are plenty of stupid
>>>> errors. However, I do not understand why, when I run
>>>> FoafShortestPathsVertex at the DEBUG level, I see debug statements from
>>>> FoafShortestPathsVertex:
>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: main({-Dgiraph.maxWorkers=1,
>>>> -Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true,
>>>> src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->   null
>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: setConf(Configuration:
>>>> core-default.xml, core-site.xml)
>>>> 19:34:44 DEBUG FoafShortestPathsVertex   ::
>>>> run({src/test/resources/data3.ttl,
>>>> target/foaf, http://example.org/alice, 1})
>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->   Configuration:
>>>> core-default.xml, core-site.xml
>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() -->   Configuration:
>>>> core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml,
>>>> giraph-site.xml
>>>>
>>>> But, I do not see anything else, no log statement from
>>>> TurtleVertexReader or
>>>> TurtleVertexInputFormat. Why? What am I doing wrong?
>>>> Is it because I am running it locally?
>>>>
>>>> Thanks,
>>>> Paolo
>>>>
>>>>    [1] http://en.wikipedia.org/wiki/FOAF_%28software%29
>>>>    [2] https://issues.apache.org/jira/browse/GIRAPH-170
>>>>    [3] https://issues.apache.org/jira/browse/GIRAPH-141
>>>>    [4] http://en.wikipedia.org/wiki/Turtle_%28syntax%29
>>>>    [5] http://en.wikipedia.org/wiki/N-Triples
>>>>    [6]
>>>> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/mapreduce/Rdf2AdjacencyListDriver.java
>>>>
>>>>    [7]
>>>> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/TurtleVertexReader.java
>>>>
>>>>    [8]
>>>> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/TurtleVertexInputFormat.java
>>>>
>>>>    [9]
>>>> https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/FoafShortestPathsVertex.java
>>>>

