giraph-dev mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: GIRAPH-825 and GIRAPH-840
Date Sat, 15 Feb 2014 20:17:25 GMT
I ran the job without the caching trick in the inputformat and still ran
into the freeze.
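
For readers following the thread, the "caching trick" questioned further down can be illustrated with a minimal, Giraph-free sketch. The class `ReusedIdDemo` and its `MutableId` holder are hypothetical stand-ins for a vertex reader and a Writable-style id. They are not taken from the linked branch or from the Giraph API. The sketch only shows the aliasing hazard that arises if a reused id object is retained by reference. Whether this applies to the actual reader depends on whether the id is copied before the next line is parsed, which this thread does not settle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class ReusedIdDemo {

  /** Hypothetical mutable id holder, standing in for a Writable-style id. */
  static final class MutableId {
    private int value;

    MutableId(int value) {
      this.value = value;
    }

    void set(int value) {
      this.value = value;
    }

    int get() {
      return value;
    }
  }

  /** Reuses one id object for every parsed line and keeps references to it. */
  static List<MutableId> parseReusing(String[] lines) {
    MutableId cached = new MutableId(0);
    List<MutableId> ids = new ArrayList<>();
    for (String line : lines) {
      cached.set(Integer.parseInt(line.trim()));
      ids.add(cached); // every entry aliases the same object
    }
    return ids;
  }

  /** Allocates a new id object per parsed line. */
  static List<MutableId> parseFresh(String[] lines) {
    List<MutableId> ids = new ArrayList<>();
    for (String line : lines) {
      ids.add(new MutableId(Integer.parseInt(line.trim())));
    }
    return ids;
  }

  /** Formats the ids as a bracketed, comma-separated list. */
  static String render(List<MutableId> ids) {
    return ids.stream()
        .map(id -> String.valueOf(id.get()))
        .collect(Collectors.joining(", ", "[", "]"));
  }

  public static void main(String[] args) {
    String[] lines = {"1", "2", "3"};
    System.out.println("reused: " + render(parseReusing(lines))); // reused: [3, 3, 3]
    System.out.println("fresh:  " + render(parseFresh(lines)));   // fresh:  [1, 2, 3]
  }
}
```

If the consumer copies or serializes the id before the next line is read, reuse is harmless and saves allocations. The hazard only appears when references are kept.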


On 02/15/2014 08:52 PM, Claudio Martella wrote:
> I don't know, maybe I'm missing something, or there's a bug there as well.
> I do agree that this is spooky. Armando has also tested it with the
> WattsStrogatzInputformat, which creates another type of graph. From what I
> understand, this should not happen due to the topology. I think we should
> just try to replicate this behavior, hopefully without a very large graph
> that makes debugging difficult.
>
>
> On Sat, Feb 15, 2014 at 8:42 PM, Sebastian Schelter <ssc@apache.org> wrote:
>
>> I copied the caching from o.a.g.io.formats.IntIntNullTextInputFormat and
>> it worked well during my tests (it did not happen that all vertices had the
>> same id).
>>
>> I'm happy to remove this and rerun the tests. It's strange that
>> out-of-core works with PageRank on a generated graph, but not with
>> Hyperball on the twitter graph. The generated graph has a uniform degree
>> distribution, while the twitter graph's degree distribution is heavily
>> skewed. Can that have an influence on the behavior of ooc?
>>
>> Best,
>> Sebastian
>>
>>
>>
>> On 02/15/2014 08:32 PM, Claudio Martella wrote:
>>
>>> Sebastian, I had a look at your vertexinputformat. I think there might be
>>> a bug. Why are you caching/reusing the id? This way every vertex parsed by
>>> the vertexreader will share the same ID object, and hence have the same
>>> ID. I think this is broken. You should instantiate a new ID object in the
>>> preprocessLine.
>>> Can you try like that?
>>>
>>>
>>> On Thu, Feb 13, 2014 at 9:50 PM, Sebastian Schelter <ssc@apache.org> wrote:
>>>
>>>> Hi Armando,
>>>>
>>>> I uploaded my test code to github at:
>>>>
>>>> https://github.com/sscdotopen/giraph/tree/hyperball64-ooc
>>>>
>>>> I'm working on an algorithm to estimate the neighborhood function of the
>>>> graph (similar to [1]). I'm running this on the transposed adjacency
>>>> matrix of a snapshot of the twitter follower graph [2]. For this graph
>>>> out-of-core is not necessary, but I would like to run my algorithm on
>>>> another larger graph that doesn't fit into the aggregated main memory of
>>>> the cluster anymore.
>>>>
>>>> I think for testing purposes, you can run it on any large graph in
>>>> adjacency form.
>>>>
>>>> Our cluster consists of 25 machines with 32GB ram, 8 cores and 4 disks
>>>> per machine. I use the following options to run the algorithm:
>>>>
>>>> hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>>>> org.apache.giraph.GiraphRunner
>>>> org.apache.giraph.examples.hyperball.HyperBall
>>>> --vertexInputFormat org.apache.giraph.examples.hyperball.HyperBallTextInputFormat
>>>> --vertexInputPath hdfs:///ssc/twitter-negative/
>>>> --vertexOutputFormat org.apache.giraph.io.formats.IdWithValueTextOutputFormat
>>>> --outputPath hdfs:///ssc/tmp-123/
>>>> --combiner org.apache.giraph.comm.messages.HyperLogLogCombiner
>>>> --outEdges org.apache.giraph.edge.LongNullArrayEdges
>>>> --workers 24
>>>> --customArguments
>>>> giraph.oneToAllMsgSending=true,
>>>> giraph.isStaticGraph=true,
>>>> giraph.numComputeThreads=15,
>>>> giraph.numInputThreads=15,
>>>> giraph.numOutputThreads=15,
>>>> giraph.maxNumberOfSupersteps=30,
>>>> giraph.useOutOfCoreGraph=true,
>>>> giraph.maxPartitionsInMemory=20
>>>>
>>>> Best,
>>>> Sebastian
>>>>
>>>> [1] http://arxiv.org/abs/1308.2144
>>>> [2] http://konect.uni-koblenz.de/networks/twitter_mpi
>>>>
>>>>
>>>> On 02/12/2014 04:21 PM, Armando Miraglia wrote:
>>>>
>>>>
>>>>> Hi Sebastian,
>>>>>
>>>>> On Wed, Feb 12, 2014 at 02:59:20PM +0100, Sebastian Schelter wrote:
>>>>>
>>>>>> No. Should I have done that?
>>>>>>
>>>>>>
>>>>> Could you please provide me with the test you ran, together with
>>>>> the variables that you set for the computation? This would
>>>>> help me a lot.
>>>>>
>>>>> Cheers,
>>>>> Armando
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>

