giraph-dev mailing list archives

From "Claudio Martella (JIRA)" <>
Subject [jira] [Commented] (GIRAPH-494) Edge should be an interface
Date Fri, 01 Feb 2013 06:59:12 GMT


Claudio Martella commented on GIRAPH-494:

Quite frankly, the memory impact of this patch can be estimated without benchmarks: it is one
reference per edge, there's no magic involved. Comparisons between Giraph and other systems
show that we use, and waste, far too much memory. I recently ran PageRankBenchmark on 64 workers
with a 7GB heap each, on a graph with 65M vertices and 100 edges per vertex, and it went OOM.
This is quite incredible. Other systems (Signal/Collect) run PageRank on that graph with fewer
machines and less memory, within 60 seconds.
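To put "one reference per edge" into numbers, here is a rough estimate for the run above. It is a sketch under assumptions that are not stated in the thread: edges are split evenly across the 64 workers, a reference costs 4 bytes with compressed oops or 8 bytes without them on a 64-bit JVM, and object headers and alignment padding are ignored. The class and method names are hypothetical.

```java
// Back-of-the-envelope cost of adding one object reference to every edge.
// Assumptions (not from the thread): even edge distribution across workers,
// 4-byte references with compressed oops, 8-byte references without them.
final class EdgeReferenceCost {
    private EdgeReferenceCost() {}

    /** Extra bytes each worker spends if every edge carries one more reference. */
    static long extraBytesPerWorker(long vertices, long edgesPerVertex,
                                    int workers, int referenceBytes) {
        long totalEdges = vertices * edgesPerVertex;   // 6.5B edges for the run above
        long edgesPerWorker = totalEdges / workers;    // even split assumed
        return edgesPerWorker * referenceBytes;
    }

    public static void main(String[] args) {
        // Figures taken from the PageRankBenchmark run described in the comment.
        long vertices = 65_000_000L;
        long edgesPerVertex = 100L;
        int workers = 64;
        for (int refBytes : new int[] {4, 8}) {
            long bytes = extraBytesPerWorker(vertices, edgesPerVertex, workers, refBytes);
            System.out.printf("%d-byte references: %d bytes (~%.0f MiB) per worker%n",
                    refBytes, bytes, bytes / (1024.0 * 1024.0));
        }
    }
}
```

For 65M vertices with 100 edges each, that is about 101.6M edges per worker, so roughly 0.4 GB (4-byte references) to 0.8 GB (8-byte references) of each 7GB heap goes to a single extra reference per edge. Plugging in 1B edges on one worker (vertices = 1_000_000_000, edgesPerVertex = 1, workers = 1) gives 4 to 8 GB, which is the "a pointer per edge adds up" point from the issue description.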

Memory consumption should be our top priority. Plus, I strongly believe that most algorithms
out there live happily without an edge value, and we should not penalize them.

I agree with you that the API is not there yet: it is not coherent, and there is no bigger
picture. But we have not released 0.2 yet, and this is the moment to break the API. This
does not mean, of course, that we should keep breaking it regardless.
> Edge should be an interface
> ---------------------------
>                 Key: GIRAPH-494
>                 URL:
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe
>         Attachments: GIRAPH-494.patch
> In terms of architecture and for flexibility, I think our Edge class should be an interface
> instead of a concrete class. In this diff I change it to an interface and add a sub-interface
> called MutableEdge. The existing Edge class is now called DefaultEdge. Note that only one
> class in our codebase actually needs a MutableEdge: RepresentativeVertex. Everything else
> works perfectly fine using the immutable Edge interface.
> One nice thing this allowed me to do is to create an EdgeNoValue, which we can use for
> algorithms whose edges have no value at all. Currently the same functionality is achieved
> by using NullWritable; however, using EdgeNoValue means not storing a reference to the single
> NullWritable instance in every single edge. Working on a job that reads 1B+ edges per worker,
> a pointer per edge adds up.
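The hierarchy described in the issue can be sketched roughly as follows. This is a simplified illustration, not the code from GIRAPH-494.patch: plain type parameters replace the Hadoop WritableComparable/Writable bounds, NoValue is a hypothetical stand-in for the NullWritable singleton, and EdgeDemo/instanceFieldCount are hypothetical helpers, so the example compiles without Hadoop on the classpath.

```java
import java.lang.reflect.Modifier;
import java.util.Arrays;

/** Hypothetical stand-in for NullWritable: one shared "no value" instance. */
enum NoValue { INSTANCE }

/** Read-only edge: enough for every caller except RepresentativeVertex. */
interface Edge<I, E> {
    I getTargetVertexId();
    E getValue();
}

/** Adds setters for the one caller that updates edges in place. */
interface MutableEdge<I, E> extends Edge<I, E> {
    void setTargetVertexId(I targetVertexId);
    void setValue(E value);
}

/** The old concrete Edge class under its new name: stores a target id and a value. */
final class DefaultEdge<I, E> implements MutableEdge<I, E> {
    private I targetVertexId;
    private E value; // even when this always points at a shared singleton, every edge pays for it

    DefaultEdge(I targetVertexId, E value) {
        this.targetVertexId = targetVertexId;
        this.value = value;
    }

    @Override public I getTargetVertexId() { return targetVertexId; }
    @Override public E getValue() { return value; }
    @Override public void setTargetVertexId(I targetVertexId) { this.targetVertexId = targetVertexId; }
    @Override public void setValue(E value) { this.value = value; }
}

/** Edge for value-less algorithms: no value field; getValue() returns the shared singleton. */
final class EdgeNoValue<I> implements Edge<I, NoValue> {
    private final I targetVertexId;

    EdgeNoValue(I targetVertexId) {
        this.targetVertexId = targetVertexId;
    }

    @Override public I getTargetVertexId() { return targetVertexId; }
    @Override public NoValue getValue() { return NoValue.INSTANCE; }
}

class EdgeDemo {
    /** Counts non-static fields, i.e. the slots each edge instance carries. */
    static long instanceFieldCount(Class<?> type) {
        return Arrays.stream(type.getDeclaredFields())
                .filter(field -> !Modifier.isStatic(field.getModifiers()))
                .count();
    }

    public static void main(String[] args) {
        MutableEdge<Long, Double> weighted = new DefaultEdge<>(7L, 0.5);
        weighted.setValue(0.25);                 // only MutableEdge exposes setters
        Edge<Long, Double> readOnly = weighted;  // most code sees only the immutable view
        Edge<Long, NoValue> unweighted = new EdgeNoValue<>(7L);

        System.out.println(readOnly.getTargetVertexId() + " -> " + readOnly.getValue());
        System.out.println(unweighted.getTargetVertexId() + " -> " + unweighted.getValue());
        System.out.println("DefaultEdge fields: " + instanceFieldCount(DefaultEdge.class));
        System.out.println("EdgeNoValue fields: " + instanceFieldCount(EdgeNoValue.class));
    }
}
```

The field count drops from two to one per edge object. How many bytes that actually saves per object also depends on the JVM's header size and 8-byte alignment padding, so it is worth measuring on the target JVM rather than assuming a fixed figure.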

This message is automatically generated by JIRA.
