giraph-dev mailing list archives

From Claudio Martella <claudio.marte...@gmail.com>
Subject Re: GIRAPH-825 and GIRAPH-840
Date Sat, 15 Feb 2014 19:32:12 GMT
Sebastian, I had a look at your VertexInputFormat. I think there might be a
bug. Why are you caching/reusing the ID? This way every vertex parsed by
the VertexReader shares the same ID object, and hence ends up with the same ID.
I think this is broken. You should instantiate a new ID object in
preprocessLine.
Can you try that?
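
To illustrate the concern, here is a minimal, self-contained sketch of the aliasing problem. It does not use any Giraph or Hadoop classes: MutableId, ParsedVertex and both parse methods are hypothetical stand-ins, and the sketch assumes that a parsed vertex keeps a reference to the ID object it is handed rather than copying it.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedIdDemo {

    /** Hypothetical stand-in for a mutable Writable-style ID (a long wrapper). */
    public static final class MutableId {
        private long value;
        public void set(long value) { this.value = value; }
        public long get() { return value; }
    }

    /** Hypothetical stand-in for a vertex that keeps a reference to its ID. */
    public static final class ParsedVertex {
        private final MutableId id;
        public ParsedVertex(MutableId id) { this.id = id; }
        public long idValue() { return id.get(); }
    }

    /** Reuses one cached ID object for every line, so all vertices alias it. */
    public static List<ParsedVertex> parseReusingId(long[] rawIds) {
        MutableId cached = new MutableId();
        List<ParsedVertex> vertices = new ArrayList<>();
        for (long raw : rawIds) {
            cached.set(raw); // overwrites the ID seen by earlier vertices
            vertices.add(new ParsedVertex(cached));
        }
        return vertices;
    }

    /** Allocates a new ID object per line, so each vertex keeps its own ID. */
    public static List<ParsedVertex> parseWithFreshId(long[] rawIds) {
        List<ParsedVertex> vertices = new ArrayList<>();
        for (long raw : rawIds) {
            MutableId fresh = new MutableId();
            fresh.set(raw);
            vertices.add(new ParsedVertex(fresh));
        }
        return vertices;
    }

    public static void main(String[] args) {
        long[] rawIds = {1L, 2L, 3L};
        for (ParsedVertex v : parseReusingId(rawIds)) {
            System.out.println("reused id: " + v.idValue());
        }
        for (ParsedVertex v : parseWithFreshId(rawIds)) {
            System.out.println("fresh id:  " + v.idValue());
        }
    }
}
```

With the cached object, all three vertices report ID 3 (the last value written); with a fresh object per line they report 1, 2 and 3. Whether the real reader behaves this way depends on whether the vertex copies the ID it receives, which is the assumption worth checking.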


On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter <ssc@apache.org> wrote:

> Hi Armando,
>
> I uploaded my test code to github at:
>
> https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
>
> I'm working on an algorithm to estimate the neighborhood function of the
> graph (similar to [1]). I'm running this on the transposed adjacency matrix
> of a snapshot of the twitter follower graph [2]. For this graph out-of-core
> is not necessary, but I would like to run my algorithm on another larger
> graph that doesn't fit into the aggregated main memory of the cluster
> anymore.
>
> I think for testing purposes, you can run it on any large graph in
> adjacency form.
>
> Our cluster consists of 25 machines, each with 32 GB RAM, 8 cores and 4 disks.
> I use the following options to run the algorithm:
>
> hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
>
> org.apache.giraph.examples.hyperball.HyperBall
>
> --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat
>
> --vertexInputPath hdfs:///ssc/twitter-negative/
>
> --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>
> --outputPath hdfs:///ssc/tmp-123/
>
> --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner
>
> --outEdges org.apache.giraph.edge.LongNullArrayEdges
>
> --workers 24
>
> --customArguments
>
> giraph.oneToAllMsgSending=true,
> giraph.isStaticGraph=true,
> giraph.numComputeThreads=15,
> giraph.numInputThreads=15,
> giraph.numOutputThreads=15,
> giraph.maxNumberOfSupersteps=30,
> giraph.useOutOfCoreGraph=true,
> giraph.maxPartitionsInMemory=20
>
> Best,
> Sebastian
>
> [1] http://arxiv.org/abs/1308.2144
> [2] http://konect.uni-koblenz.de/networks/twitter_mpi
>
>
> On 02/12/2014 04:21 PM, Armando Miraglia wrote:
>
>>
>> Hi Sebastian,
>>
>> On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
>>
>>> No. Should I have done that?
>>>
>>
>> could you please provide me with the test you ran, together with
>> the variables that you set for the computation? This would
>> help me a lot.
>>
>> Cheers,
>> Armando
>>
>>
>


-- 
   Claudio Martella
