giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: clustering coefficient (counting triangles) in giraph.
Date Wed, 07 May 2014 09:55:56 GMT
Hi,

You may want to have a look at MR algorithms such as this one (which should
be easy to adapt):
http://theory.stanford.edu/~sergei/papers/www11-triangles.pdf

Cheers,

--
Gianmarco


On 6 May 2014 15:50, Claudio Martella <claudio.martella@gmail.com> wrote:

> It would indeed be interesting to evaluate. The naive CC algorithm does
> have a huge impact on memory, and i have the feeling that it's often a
> choice of feasibility more than performance, meaning that often the naive
> approach might not be a solution at all due to the huge memory footprint.
> Given that Pregel/Giraph is network-bound, my bet is that one more
> iteration should be cheaper than lots of large messages.
>
>
> On Tue, May 6, 2014 at 3:05 PM, Dionysis Logothetis <dlogothetis@gmail.com
> > wrote:
>
>> Hi guys,
>>
>> One of the ways we've thought about improving the clustering
>> implementation in the Okapi library (although we haven't implemented it
>> yet) is the following. In the naive implementation EVERY vertex sends its
>> friend list to ALL its neighbours. But, because in a triangle it's enough
>> for just 1 of the vertices to detect the triangle, you can have a vertex
>> send its friend list only to neighbors that have lower id. This way, the
>> node with the lower id in the triangle can detect the triangle and simply
>> notify its neighbors that they belong to one (every vertex needs to know).
>>
>> Now, I think this works only for undirected graphs and it adds an extra
>> iteration with respect to the naive implementation, but can probably reduce
>> the amount of traffic and memory used significantly.
>>
>> By taking a quick look at the algorithm by Ediger et al that Arun
>> implemented, it uses a similar approach, that is, selectively propagating
>> vertex ids. That one introduces more iterations, but probably reduces the
>> amount of traffic even more which can be very beneficial. It could be
>> interesting to compare these approaches.
>>
>>
>>
>>
>>
>> On Tue, May 6, 2014 at 12:47 PM, Lukas Nalezenec <
>> lukas.nalezenec@firma.seznam.cz> wrote:
>>
>>>  Hi,
>>>
>>> It has been some time and I don't remember all my comments.  I think you can
replace LongArrayListWritables with class that will sort numbers inside, make a diff compression
and write variable length numbers.
>>> This might cut memory usage by 80~90% (depending on graph properties).
>>> You could also reduce a number of memory allocations and use primitive collections.
>>>
>>> Regards
>>> Lukas
>>>
>>>
>>>
>>> On 6.5.2014 09:08, Arun Kumar wrote:
>>>
>>> Hi
>>>
>>>  For Okapi ML library it is mentioned that some tuning should be done
>>> .Can you clarify that what modification we have to do?
>>>
>>>  Regards
>>> Arun
>>>
>>>
>>> On Tue, Mar 18, 2014 at 2:30 PM, Lukas Nalezenec <
>>> lukas.nalezenec@firma.seznam.cz> wrote:
>>>
>>>>  Hi,
>>>> Check Okapi ML library from Grafos:
>>>> http://grafos.ml/okapi.html#collaborative-als
>>>> It needs some tuning but it will work.
>>>>
>>>> Regards
>>>> Lukas
>>>>
>>>>
>>>>
>>>> On 17.3.2014 20:17, Suijian Zhou wrote:
>>>>
>>>>  Hi, Experts,
>>>>   Does anybody know if there are examples of implementation in giraph
>>>> for clustering coefficient (counting triangles)? Thanks!
>>>>
>>>>    Best Regards,
>>>>    Suijian
>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
> --
>    Claudio Martella
>
>

Mime
View raw message