giraph-user mailing list archives

From Jonathan Bishop <jbishop....@gmail.com>
Subject Re: What if the resulting graph is larger than the memory?
Date Tue, 21 May 2013 16:56:52 GMT
Sounds like you could make use of HBase to store your results. The vertices
could simply be row keys into HBase...


On Tue, May 21, 2013 at 2:52 AM, Han JU <ju.han.felix@gmail.com> wrote:

> Hi Maja,
>
> The input graph in my problem is not big; it is the computed result that is
> very big.
> What does out-of-core graph actually mean? Where can I find some examples
> of it, and of output during computation?
>
> Thanks.
>
>
>
> 2013/5/17 Maja Kabiljo <majakabiljo@fb.com>
>
>>  Hi JU,
>>
>>  One thing you can try is to use out-of-core graph
>> (giraph.useOutOfCoreGraph option).
>>
>>  I don't know what your exact use case is: is it the graph itself that
>> is huge, or the data that your application calculates? In the
>> second case, there is a 'giraph.doOutputDuringComputation' option you might
>> want to try out. When it is turned on, during each superstep writeVertex
>> is called immediately after compute is called for that vertex. This
>> means that you can store the data you want to write in the vertex, write it,
>> and clear the data before going on to the next vertex.
>>
>>  Maja
>>
>>   From: Han JU <ju.han.felix@gmail.com>
>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
>> Date: Friday, May 17, 2013 8:38 AM
>> To: "user@giraph.apache.org" <user@giraph.apache.org>
>> Subject: What if the resulting graph is larger than the memory?
>>
>>   Hi,
>>
>>  It's me again.
>> After a day's work I've coded a Giraph solution for my problem at hand. I
>> gave it a run on a medium dataset and it's notably faster than other
>> approaches.
>>
>>  However, the goal is to process larger inputs. For example, I have a larger
>> dataset whose result graph is about 400GB when represented in edge
>> format as a text file. I think the edges that the algorithm creates
>> all reside in the cluster's memory. Does that mean that for this big dataset
>> I need a cluster with ~400GB of main memory? Is there any
>> way to output "on the go", so that I don't need to
>> construct the whole graph, and an edge is written to HDFS immediately instead
>> of being created in main memory and written out later?
>>
>>  Thanks!
>> --
>> *JU Han*
>>
>>    Software Engineer Intern @ KXEN Inc.
>>   UTC   -  Université de Technologie de Compiègne
>>    *     **GI06 - Fouille de Données et Décisionnel*
>>
>>  +33 0619608888
>>
>
>
>
> --
> *JU Han*
>
> Software Engineer Intern @ KXEN Inc.
> UTC   -  Université de Technologie de Compiègne
> *     **GI06 - Fouille de Données et Décisionnel*
>
> +33 0619608888
>
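
The write-then-clear pattern that Maja describes in the quoted message can be
illustrated with a small plain-Java simulation. This is only a sketch and does
not use the Giraph API: SketchVertex, compute, writeVertex and runSuperstep are
hypothetical stand-ins chosen for illustration, and the sink is an arbitrary
callback rather than a real HDFS writer. It shows why the peak amount of
buffered output is bounded by one vertex's results when each vertex is written
and cleared right after it is computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Plain-Java simulation of "write each vertex right after compute, then clear". */
class OutputDuringComputationSketch {

    /** Hypothetical stand-in for a vertex; this is NOT a Giraph class. */
    static final class SketchVertex {
        final long id;
        // Result edges waiting to be written out.
        final List<String> pendingEdges = new ArrayList<>();

        SketchVertex(long id) {
            this.id = id;
        }
    }

    /** Stand-in for compute(): generates this vertex's result edges. */
    static void compute(SketchVertex v, int edgesPerVertex) {
        for (int i = 1; i <= edgesPerVertex; i++) {
            v.pendingEdges.add(v.id + "\t" + (v.id + i));
        }
    }

    /** Stand-in for writeVertex(): emits the buffered edges, then frees them. */
    static void writeVertex(SketchVertex v, Consumer<String> sink) {
        for (String edge : v.pendingEdges) {
            sink.accept(edge);
        }
        v.pendingEdges.clear();
    }

    /** Runs one simulated superstep; returns the largest per-vertex buffer seen. */
    static int runSuperstep(List<SketchVertex> vertices, int edgesPerVertex,
                            Consumer<String> sink) {
        int peak = 0;
        for (SketchVertex v : vertices) {
            compute(v, edgesPerVertex);
            peak = Math.max(peak, v.pendingEdges.size());
            // Written immediately after compute, before the next vertex runs.
            writeVertex(v, sink);
        }
        return peak;
    }

    public static void main(String[] args) {
        List<SketchVertex> vertices = new ArrayList<>();
        for (long id = 0; id < 1000; id++) {
            vertices.add(new SketchVertex(id));
        }
        long[] written = {0};
        int peak = runSuperstep(vertices, 50, line -> written[0]++);
        System.out.println("edges written: " + written[0]
                + ", peak buffered: " + peak);
    }
}
```

In this sketch every buffer is empty again before the next vertex is processed,
so memory use depends on the largest single vertex's output rather than on the
total result size. Whether a real job behaves the same way depends on the
output format and on what else the application keeps in its vertices.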
