giraph-dev mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: GIRAPH-825 and GIRAPH-840
Date Sat, 15 Feb 2014 19:42:33 GMT
I copied the caching from o.a.g.io.formats.IntIntNullTextInputFormat and 
it worked well during my tests (it did not happen that all vertices had 
the same id).
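
For reference, the aliasing hazard Claudio describes in the quoted message below can be sketched without any Giraph or Hadoop classes. This is only an illustration under assumptions: `MutableId` is a hypothetical stand-in for a mutable Writable id, and both parse methods are made up for this sketch. It shows what happens if the consumer stores the id reference it is handed. Whether Giraph copies or serializes the id before the reader reuses it is not established in this thread, which may be why the cached variant worked in my tests.

```java
import java.util.ArrayList;
import java.util.List;

public class IdReuseSketch {

    /** Hypothetical stand-in for a mutable Writable id (not a Hadoop class). */
    static final class MutableId {
        private long value;
        MutableId(long value) { this.value = value; }
        void set(long value) { this.value = value; }
        long get() { return value; }
    }

    /** Reuses one cached id object per reader and stores the reference each time. */
    static List<MutableId> parseWithSharedId(long[] rawIds) {
        MutableId cached = new MutableId(0L);
        List<MutableId> stored = new ArrayList<>();
        for (long raw : rawIds) {
            cached.set(raw);
            stored.add(cached); // every entry aliases the same object
        }
        return stored;
    }

    /** Allocates a new id object for every parsed line. */
    static List<MutableId> parseWithFreshId(long[] rawIds) {
        List<MutableId> stored = new ArrayList<>();
        for (long raw : rawIds) {
            stored.add(new MutableId(raw)); // each entry owns its value
        }
        return stored;
    }

    /** Copies the current values out of the ids for printing or comparison. */
    static List<Long> values(List<MutableId> ids) {
        List<Long> out = new ArrayList<>();
        for (MutableId id : ids) {
            out.add(id.get());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] raw = {1L, 2L, 3L};
        System.out.println(values(parseWithSharedId(raw))); // [3, 3, 3]
        System.out.println(values(parseWithFreshId(raw)));  // [1, 2, 3]
    }
}
```

If the caller copies the value out before the next line is parsed, the shared variant is harmless and saves allocations; if it keeps the reference, all stored ids end up equal to the last one parsed.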

I'm happy to remove this and rerun the tests. It's strange that 
out-of-core works with PageRank on a generated graph, but not with 
Hyperball on the twitter graph. The generated graph has a uniform degree 
distribution, while the twitter graph's degree distribution is heavily 
skewed. Could that influence the behavior of out-of-core (ooc)?

Best,
Sebastian


On 02/15/2014 08:32 PM, Claudio Martella wrote:
> Sebastian, I had a look at your vertexinputformat. I think there might be a
> bug. Why are you caching/reusing the id? This way every vertex parsed by
> the vertexreader will share the same ID object, and hence have the same ID.
> I think this is broken. You should instantiate a new ID object in
> preprocessLine.
> Can you try like that?
>
>
> On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter <ssc@apache.org> wrote:
>
>> Hi Armando,
>>
>> I uploaded my test code to github at:
>>
>> https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
>>
>> I'm working on an algorithm to estimate the neighborhood function of the
>> graph (similar to [1]). I'm running this on the transposed adjacency matrix
>> of a snapshot of the twitter follower graph [2]. For this graph out-of-core
>> is not necessary, but I would like to run my algorithm on another larger
>> graph that doesn't fit into the aggregated main memory of the cluster
>> anymore.
>>
>> I think for testing purposes, you can run it on any large graph in
>> adjacency form.
>>
>> Our cluster consists of 25 machines with 32GB ram, 8 cores and 4 disks per
>> machine. I use the following options to run the algorithm:
>>
>> hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>> org.apache.giraph.GiraphRunner
>>
>> org.apache.giraph.examples.hyperball.HyperBall
>>
>> --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat
>>
>> --vertexInputPath hdfs:///ssc/twitter-negative/
>>
>> --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>
>> --outputPath hdfs:///ssc/tmp-123/
>>
>> --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner
>>
>> --outEdges org.apache.giraph.edge.LongNullArrayEdges
>>
>> --workers 24
>>
>> --customArguments
>>
>> giraph.oneToAllMsgSending=true,
>> giraph.isStaticGraph=true,
>> giraph.numComputeThreads=15,
>> giraph.numInputThreads=15,
>> giraph.numOutputThreads=15,
>> giraph.maxNumberOfSupersteps=30,
>> giraph.useOutOfCoreGraph=true,
>> giraph.maxPartitionsInMemory=20
>>
>> Best,
>> Sebastian
>>
>> [1] http://arxiv.org/abs/1308.2144
>> [2] http://konect.uni-koblenz.de/networks/twitter_mpi
>>
>>
>> On 02/12/2014 04:21 PM, Armando Miraglia wrote:
>>
>>>
>>> Hi Sebastian,
>>>
>>> On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
>>>
>>>> No. Should I have done that?
>>>>
>>>
>>> Could you please provide me with the test you ran, together with
>>> the variables that you set for the computation? This would
>>> help me a lot.
>>>
>>> Cheers,
>>> Armando
>>>
>>>
>>
>
>

