giraph-user mailing list archives

From Sebastian Schelter <ssc.o...@googlemail.com>
Subject Re: What if the resulting graph is larger than the memory?
Date Tue, 21 May 2013 10:15:41 GMT
Ah, I see. I have worked on similar things in recommender systems. There,
the problem is generally that the size of the result is quadratic in the
number of interactions per item. If you have some top sellers in your
data, those probably account for most of the large result. It helps a lot
to throw out the few most popular items (if your application allows that).
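
To make the quadratic growth concrete, here is a minimal Java sketch. It is not Giraph code and not taken from your job; the class `PairBlowup`, its methods, and the interaction counts are made up for illustration. It assumes that an item with n interactions contributes up to n(n-1)/2 unordered pairs, and compares the total before and after excluding the most popular item.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class PairBlowup {

    // Upper bound on unordered pairs generated by n interactions.
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    // Names of the k items with the most interactions.
    static Set<String> topK(Map<String, Long> counts, int k) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
            .limit(k)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());
    }

    // Sum of pair counts over all items that are not excluded.
    static long totalPairs(Map<String, Long> counts, Set<String> excluded) {
        long total = 0;
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            if (!excluded.contains(e.getKey())) {
                total += pairCount(e.getValue());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up interaction counts: one top seller and two ordinary items.
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("topseller", 1000L);
        counts.put("itemB", 10L);
        counts.put("itemC", 5L);

        System.out.println("all items:       " + totalPairs(counts, Set.of()));
        System.out.println("without top one: " + totalPairs(counts, topK(counts, 1)));
    }
}
```

With these made-up numbers, the single top seller accounts for 499500 of the 499555 pairs; without it only 55 remain.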

Best,
Sebastian


On 21.05.2013 12:10, Han JU wrote:
> Hi Sebastian,
> 
> It's something like frequent item pairs out of transaction data.
> I need all pairs with a fairly low support (say 2), so the result
> could be very big.
> 
> 
> 
> 2013/5/21 Sebastian Schelter <ssc.open@googlemail.com>
> 
>> Hello Han,
>>
>> out of curiosity, what do you compute that has such a big result?
>>
>> Best,
>> Sebastian
>>
>> On 21.05.2013 11:52, Han JU wrote:
>>> Hi Maja,
>>>
>>> The input graph of my problem is not big, but the calculation result is
>>> very big.
>>> Also, what does out-of-core graph mean? Where can I find some examples of
>>> this and of output during computation?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> 2013/5/17 Maja Kabiljo <majakabiljo@fb.com>
>>>
>>>>  Hi JU,
>>>>
>>>>  One thing you can try is to use out-of-core graph
>>>> (giraph.useOutOfCoreGraph option).
>>>>
>>>>  I don't know what your exact use case is – is it the graph that is
>>>> huge, or the data that you calculate in your application? In the
>>>> second case, there is the 'giraph.doOutputDuringComputation' option you
>>>> might want to try out. When it is turned on, during each superstep
>>>> writeVertex will be called immediately after compute is called for that
>>>> vertex. This means that you can store the data you want to write in the
>>>> vertex, write it, and clear the data before going to the next vertex.
>>>>
>>>>  Maja
>>>>
>>>>   From: Han JU <ju.han.felix@gmail.com>
>>>> Reply-To: "user@giraph.apache.org" <user@giraph.apache.org>
>>>> Date: Friday, May 17, 2013 8:38 AM
>>>> To: "user@giraph.apache.org" <user@giraph.apache.org>
>>>> Subject: What if the resulting graph is larger than the memory?
>>>>
>>>>   Hi,
>>>>
>>>>  It's me again.
>>>> After a day's work I've coded a Giraph solution for my problem at hand.
>>>> I gave it a run on a medium dataset and it's notably faster than other
>>>> approaches.
>>>>
>>>>  However, the goal is to process larger inputs. For example, I have a
>>>> larger dataset for which the result graph is about 400GB when
>>>> represented in edge format as a text file. And I think the edges that
>>>> the algorithm creates all reside in the cluster's memory. So does it
>>>> mean that for this big dataset I need a cluster with ~400GB of main
>>>> memory to run? Is there any possibility to output "on the go", meaning
>>>> that I don't need to construct the whole graph, and an edge is written
>>>> to HDFS immediately instead of being created in main memory and then
>>>> output?
>>>>
>>>>  Thanks!
>>>> --
>>>> *JU Han*
>>>>
>>>>    Software Engineer Intern @ KXEN Inc.
>>>>   UTC   -  Université de Technologie de Compiègne
>>>>    *     **GI06 - Fouille de Données et Décisionnel*
>>>>
>>>>  +33 0619608888
>>>>
>>>
>>>
>>>
>>
>>
> 
> 

