giraph-user mailing list archives

From Maja Kabiljo
Subject Re: What if the resulting graph is larger than the memory?
Date Fri, 17 May 2013 17:00:01 GMT
Hi JU,

One thing you can try is the out-of-core graph (the giraph.useOutOfCoreGraph option).

I don't know what your exact use case is: is the graph itself huge, or is it the data
that your application calculates? In the second case, there is the 'giraph.doOutputDuringComputation'
option you might want to try out. When it is turned on, writeVertex is called in each superstep
immediately after compute is called for that vertex. This means that you can
store the data you want to write in the vertex, write it, and clear the data before going to the next
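
To make the write-then-clear pattern described above concrete, here is a minimal stand-alone sketch in plain Java. It does not use the Giraph API: SketchVertex, writeAndClear and run are hypothetical names invented for illustration, and a StringBuilder stands in for the HDFS output. It only shows the idea that each vertex holds its results just long enough to write them.

```java
import java.util.ArrayList;
import java.util.List;

public class StreamingOutputSketch {

    /** Hypothetical stand-in for a vertex; NOT a Giraph class. */
    static class SketchVertex {
        final long id;
        // Edges produced in the current superstep, held only until written.
        final List<Long> pendingTargets = new ArrayList<>();

        SketchVertex(long id) {
            this.id = id;
        }

        // Stand-in for compute(): produce this superstep's result edges.
        void compute(int superstep) {
            pendingTargets.add(id + superstep + 1);
        }

        // Stand-in for writeVertex(): emit the pending edges, then drop them.
        void writeAndClear(StringBuilder out) {
            for (long target : pendingTargets) {
                out.append(id).append('\t').append(target).append('\n');
            }
            pendingTargets.clear();
        }
    }

    // Runs the superstep loop and returns everything that was "written".
    static String run(int numVertices, int supersteps) {
        StringBuilder out = new StringBuilder(); // stands in for the HDFS output
        List<SketchVertex> vertices = new ArrayList<>();
        for (long id = 0; id < numVertices; id++) {
            vertices.add(new SketchVertex(id));
        }
        for (int s = 0; s < supersteps; s++) {
            for (SketchVertex v : vertices) {
                v.compute(s);
                v.writeAndClear(out); // written right after compute, then cleared
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(run(2, 2));
    }
}
```

In this sketch no vertex ever holds more than one superstep's worth of edges, so memory use depends on what a single compute call produces rather than on the size of the whole result graph.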


From: Han JU
Date: Friday, May 17, 2013 8:38 AM
Subject: What if the resulting graph is larger than the memory?


It's me again.
After a day's work I've coded a Giraph solution for my problem at hand. I gave it a run on
a medium dataset and it's notably faster than other approaches.

However, the goal is to process larger inputs. For example, I have a larger dataset for which the result
graph is about 400GB when represented in edge format as a text file. I think the edges
that the algorithm creates all reside in the cluster's memory. Does that mean that for this big
dataset I need a cluster with ~400GB of main memory? Is there any way
to output "on the go", so that I don't need to construct the whole graph and an edge is
written to HDFS immediately instead of being created in main memory and then written out?

JU Han

Software Engineer Intern @ KXEN Inc.
UTC   -  Université de Technologie de Compiègne
     GI06 - Fouille de Données et Décisionnel

+33 0619608888
