incubator-giraph-user mailing list archives

From Paolo Castagna <castagna.li...@googlemail.com>
Subject Re: A simple use case: shortest paths on a FOAF (i.e. Friend of a Friend) graph
Date Wed, 11 Apr 2012 19:09:02 GMT
Avery Ching wrote:
> It shouldn't be; your code looks very similar to the unit tests (e.g.
> TestManualCheckpoint.java).  So, you're trying to run your test with the
> local Hadoop (similar to the unit tests)?  Or are you using an actual
> Hadoop setup?

Hi Avery,
while I am learning and writing my first examples, I am running with a
local Hadoop (as the unit tests do). This way, I can easily run and
debug the code from the IDE.

Tomorrow, I'll look at the unit tests again trying to see if I can spot what
I am doing wrong.

Thanks,
Paolo

> 
> Avery
> 
> On 4/10/12 11:41 PM, Paolo Castagna wrote:
>> I am using hadoop-core-1.0.1.jar ... could that be a problem?
>>
>> Paolo
>>
>> Paolo Castagna wrote:
>>> Hi Avery,
>>> nope, no luck.
>>>
>>> I have changed all my log.debug(...) into log.info(...). Same behavior.
>>>
>>> I have a log4j.properties [1] file in my classpath and it has:
>>> log4j.logger.org.apache.jena.grande=DEBUG
>>> log4j.logger.org.apache.jena.grande.giraph=DEBUG
>>> I also tried to change that to:
>>> log4j.logger.org.apache.jena.grande=INFO
>>> log4j.logger.org.apache.jena.grande.giraph=INFO
>>> No luck.
>>>
>>> My Giraph job has:
>>> GiraphJob job = new GiraphJob(getConf(), getClass().getName());
>>> job.setVertexClass(getClass());
>>> job.setVertexInputFormatClass(TurtleVertexInputFormat.class);
>>> job.setVertexOutputFormatClass(TurtleVertexOutputFormat.class);
>>>
>>> But if I run in debug with a breakpoint in the
>>> TurtleVertexInputFormat constructor, the class is never
>>> instantiated. How can that be?
>>>
>>> So perhaps the problem is not the logging, it is the fact that
>>> my GiraphJob is not using TurtleVertexInputFormat.class and
>>> TurtleVertexOutputFormat.class, but I don't see what I am doing
>>> wrong. :-/
>>>
>>> Thanks,
>>> Paolo
>>>
>>>   [1] https://github.com/castagna/jena-grande/blob/master/src/test/resources/log4j.properties
>>>
>>>
>>> Avery Ching wrote:
>>>> I think the issue might be that Hadoop only logs messages at INFO
>>>> level and above by default.  Can you retry with INFO level logging?
>>>>
>>>> Avery
>>>>
>>>> On 4/10/12 12:17 PM, Paolo Castagna wrote:
>>>>> Hi,
>>>>> I am still learning Giraph, so please be patient with me and
>>>>> forgive my trivial questions.
>>>>>
>>>>> As a simple initial use case, I want to compute the shortest paths
>>>>> from a single source in a social graph in RDF format using the
>>>>> FOAF [1] vocabulary. This example will hopefully also inform
>>>>> GIRAPH-170 [2] and related issues, such as GIRAPH-141 [3].
>>>>>
>>>>> Here is an example in Turtle [4] format of a tiny graph using FOAF:
>>>>> ----
>>>>> @prefix : <http://example.org/> .
>>>>> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>>>>>
>>>>> :alice
>>>>>       a           foaf:Person ;
>>>>>       foaf:name   "Alice" ;
>>>>>       foaf:mbox   <mailto:alice@example.org> ;
>>>>>       foaf:knows  :bob ;
>>>>>       foaf:knows  :charlie ;
>>>>>       foaf:knows  :snoopy ;
>>>>>       .
>>>>>
>>>>> :bob
>>>>>       foaf:name   "Bob" ;
>>>>>       foaf:knows  :charlie ;
>>>>>       .
>>>>>
>>>>> :charlie
>>>>>       foaf:name   "Charlie" ;
>>>>>       foaf:knows  :alice ;
>>>>>       .
>>>>> ----
>>>>> This is nice and human friendly (RDF without angle brackets!), but
>>>>> not easily splittable to be processed with MapReduce (or Giraph).
>>>>>
>>>>> Here is the same graph in N-Triples [5] format:
>>>>> ----
>>>>> <http://example.org/alice> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox> <mailto:alice@example.org> .
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/snoopy> .
>>>>> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie" .
>>>>> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice> .
>>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob" .
>>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie> .
>>>>> ----
>>>>> This is more verbose and ugly, but splittable.
>>>>>
>>>>> The graph I am interested in is the graph represented by foaf:knows
>>>>> relationships/links between people (please note that the --knows-->
>>>>> relationship here has a direction: it is not symmetric, as it is in
>>>>> centralized social networking websites such as Facebook or LinkedIn.
>>>>> Alice can claim to know Bob without Bob knowing it, and the claim
>>>>> might even be false):
>>>>>
>>>>> alice --knows-->   bob
>>>>> alice --knows-->   charlie
>>>>> alice --knows-->   snoopy
>>>>> bob --knows-->   charlie
>>>>> charlie --knows-->   alice
>>>>>
>>>>> As a first step, I wrote a MapReduce job [6] to transform the RDF
>>>>> graph above into a sort of adjacency list using Turtle syntax. Here
>>>>> is the output (three lines, one per person; each is wrapped here):
>>>>> ----
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/mbox>
>>>>> <mailto:alice@example.org>; <http://xmlns.com/foaf/0.1/name> "Alice";
>>>>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>>>> <http://xmlns.com/foaf/0.1/Person>; <http://xmlns.com/foaf/0.1/knows>
>>>>> <http://example.org/charlie>, <http://example.org/bob>,
>>>>> <http://example.org/snoopy>; . <http://example.org/charlie>
>>>>> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>.
>>>>>
>>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/name> "Bob";
>>>>> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>; .
>>>>> <http://example.org/alice> <http://xmlns.com/foaf/0.1/knows>
>>>>> <http://example.org/bob>.
>>>>>
>>>>> <http://example.org/charlie> <http://xmlns.com/foaf/0.1/name> "Charlie";
>>>>> <http://xmlns.com/foaf/0.1/knows> <http://example.org/alice>; .
>>>>> <http://example.org/bob> <http://xmlns.com/foaf/0.1/knows>
>>>>> <http://example.org/charlie>. <http://example.org/alice>
>>>>> <http://xmlns.com/foaf/0.1/knows> <http://example.org/charlie>.
>>>>> ----
>>>>> This is legal Turtle, but it is also splittable. Each line has all
>>>>> the RDF statements (i.e. edges) for a person, including the
>>>>> incoming edges.
>>>>>
>>>>> I wrote a TurtleVertexReader [7] which extends
>>>>> TextVertexReader<NodeWritable, Text, NodeWritable, Text> and a
>>>>> TurtleVertexInputFormat [8] which extends
>>>>> TextVertexInputFormat<NodeWritable, Text, NodeWritable, Text>.
>>>>> I wrote (copying from the example SimpleShortestPathsVertex) a
>>>>> FoafShortestPathsVertex [9] which extends
>>>>> EdgeListVertex<NodeWritable, IntWritable, NodeWritable, IntWritable>
>>>>> and I am running it locally using these arguments:
>>>>> -Dgiraph.maxWorkers=1 -Dgiraph.SplitMasterWorker=false
>>>>> -DoverwriteOutput=true src/test/resources/data3.ttl target/foaf
>>>>> http://example.org/alice 1
>>>>>
>>>>> TurtleVertexReader, TurtleVertexInputFormat and
>>>>> FoafShortestPathsVertex are still a work in progress and I am sure
>>>>> there are plenty of stupid errors. However, I do not understand
>>>>> why, when I run FoafShortestPathsVertex at the DEBUG level, I see
>>>>> debug statements from FoafShortestPathsVertex:
>>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: main({-Dgiraph.maxWorkers=1, -Dgiraph.SplitMasterWorker=false, -DoverwriteOutput=true, src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
>>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> null
>>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: setConf(Configuration: core-default.xml, core-site.xml)
>>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: run({src/test/resources/data3.ttl, target/foaf, http://example.org/alice, 1})
>>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> Configuration: core-default.xml, core-site.xml
>>>>> 19:34:44 DEBUG FoafShortestPathsVertex   :: getConf() --> Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, giraph-site.xml
>>>>>
>>>>> But I do not see anything else: no log statement from
>>>>> TurtleVertexReader or TurtleVertexInputFormat. Why? What am I doing
>>>>> wrong? Is it because I am running it locally?
>>>>>
>>>>> Thanks,
>>>>> Paolo
>>>>>
>>>>>    [1] http://en.wikipedia.org/wiki/FOAF_%28software%29
>>>>>    [2] https://issues.apache.org/jira/browse/GIRAPH-170
>>>>>    [3] https://issues.apache.org/jira/browse/GIRAPH-141
>>>>>    [4] http://en.wikipedia.org/wiki/Turtle_%28syntax%29
>>>>>    [5] http://en.wikipedia.org/wiki/N-Triples
>>>>>    [6] https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/mapreduce/Rdf2AdjacencyListDriver.java
>>>>>    [7] https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/TurtleVertexReader.java
>>>>>    [8] https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/TurtleVertexInputFormat.java
>>>>>    [9] https://github.com/castagna/jena-grande/blob/a650758a56cfe0680320445434e6d6adf2d7e544/src/main/java/org/apache/jena/grande/giraph/FoafShortestPathsVertex.java
> 
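
For reference, the result expected on the tiny foaf:knows graph quoted
above can be checked without Giraph or Hadoop. The sketch below is plain
Java and uses a breadth-first search, not the vertex-centric message
passing of SimpleShortestPathsVertex. It assumes every foaf:knows edge
has weight 1, and the class and method names are hypothetical (they are
not part of jena-grande or Giraph).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Plain-Java sketch with no Giraph or Hadoop dependency; all names are hypothetical.
final class FoafShortestPathsSketch {

    // The five directed foaf:knows edges from the example, as an adjacency list.
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("alice", List.of("bob", "charlie", "snoopy"));
        graph.put("bob", List.of("charlie"));
        graph.put("charlie", List.of("alice"));
        graph.put("snoopy", List.of()); // snoopy has no outgoing edges
        return graph;
    }

    // Breadth-first search; with unit edge weights (an assumption) the hop
    // count at first visit is the shortest-path distance from the source.
    static Map<String, Integer> shortestPaths(Map<String, List<String>> graph, String source) {
        Map<String, Integer> dist = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        dist.put(source, 0);
        queue.add(source);
        while (!queue.isEmpty()) {
            String v = queue.remove();
            for (String w : graph.getOrDefault(v, List.of())) {
                if (!dist.containsKey(w)) { // first visit is the shortest one
                    dist.put(w, dist.get(v) + 1);
                    queue.add(w);
                }
            }
        }
        return dist; // vertices unreachable from the source are absent
    }

    public static void main(String[] args) {
        System.out.println(shortestPaths(exampleGraph(), "alice"));
    }
}
```

With alice as the source, bob, charlie and snoopy are all at distance 1.
With bob as the source, charlie is at distance 1, alice at 2 and snoopy
at 3, because the edges are directed.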

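
The grouping step described in the quoted message (collect the
statements for each person so that one line holds one vertex) can also
be sketched in plain Java, without MapReduce. This is not the
Rdf2AdjacencyListDriver from [6]: it keeps only outgoing foaf:knows
edges whose object is an IRI, ignores incoming edges and literals, and
uses a regular expression instead of a real N-Triples parser. The class
and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch; names are hypothetical and not taken from jena-grande.
final class KnowsAdjacencySketch {

    static final String KNOWS = "http://xmlns.com/foaf/0.1/knows";

    // Matches only triples whose subject, predicate and object are all IRIs,
    // so lines with literal objects (e.g. foaf:name "Alice") never match.
    private static final Pattern IRI_TRIPLE =
        Pattern.compile("^\\s*<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

    // Groups the objects of foaf:knows triples by subject, in input order.
    static Map<String, List<String>> adjacency(List<String> ntriples) {
        Map<String, List<String>> out = new TreeMap<>();
        for (String line : ntriples) {
            Matcher m = IRI_TRIPLE.matcher(line);
            if (m.matches() && KNOWS.equals(m.group(2))) {
                out.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(3));
            }
        }
        return out;
    }
}
```

A subject with no outgoing foaf:knows edge (snoopy in the example) gets
no entry in the map, so a caller that needs every vertex has to add the
missing ones itself.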