giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Can Giraph handle graphs with very large number of edges per vertex?
Date Tue, 11 Sep 2012 07:27:17 GMT
Hi Jeyendran, nice to meet you.

Answers inline.

On 9/10/12 11:23 PM, Jeyendran Balakrishnan wrote:
> I am trying to understand what kind of data Giraph holds in memory per
> worker.
> My questions in descending order of importance:
> 1. Does Giraph hold in memory exactly one vertex of data at a time, or does
> it need to hold all the vertexes assigned to that worker?
All vertices assigned to that worker.

> 2. Can Giraph handle vertexes with a million edges per vertex?
It depends on how much memory you have.  I would recommend writing a
custom vertex implementation with a very memory-efficient edge store
for better scalability (e.g. see IntIntNullIntVertex).
>     If not, at what order of magnitude does it break down? - 1000 edges, 10K
> edges, 100K edges?...
>    (Of course, I understand that this depends upon the -Xmx value, so let's
> say we fix a value of -Xmx8g).
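Since a worker holds all of its vertices, the breaking point is set by total edges per worker times bytes per edge rather than by the per-vertex degree alone. Here is a back-of-the-envelope sketch for a fixed -Xmx8g. It assumes edge targets are kept in one primitive int[] per vertex (the style of compact store that IntIntNullIntVertex is meant to illustrate, as far as I recall). The class name, method names, the 16-byte array header, and the per-edge cost passed to boxedBytes are all illustrative assumptions, not measurements of Giraph.

```java
class EdgeMemoryEstimate {
    /** Bytes for edge targets stored in one primitive int[] per vertex. */
    static long primitiveBytes(long vertices, long edgesPerVertex) {
        long arrayHeader = 16; // assumed int[] header size on a 64-bit JVM
        return vertices * (arrayHeader + 4L * edgesPerVertex);
    }

    /** Bytes when every edge is its own object, at an assumed per-edge cost. */
    static long boxedBytes(long vertices, long edgesPerVertex, long assumedBytesPerEdge) {
        return vertices * edgesPerVertex * assumedBytesPerEdge;
    }

    /** Upper bound on vertices of a given degree that fit in a heap budget with int[] storage. */
    static long maxVertices(long heapBytes, long edgesPerVertex) {
        return heapBytes / primitiveBytes(1, edgesPerVertex);
    }

    public static void main(String[] args) {
        // -Xmx8g, optimistically ignoring messages, vertex values and JVM overhead
        long heap = 8L * 1024 * 1024 * 1024;
        for (long degree : new long[] {1_000, 10_000, 100_000, 1_000_000}) {
            System.out.println(degree + " edges/vertex: at most "
                + maxVertices(heap, degree) + " vertices per worker");
        }
    }
}
```

The real limit will be lower, because messages and vertex values share the same heap, but the arithmetic shows why a compact edge store matters far more than the degree of any single vertex.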
> 3. Are there any limitations on the kind of objects that can be used as
> vertices?
>     Specifically, does Giraph assume that vertices are lightweight (eg,
> integer vertex ID + simple Java primitive vertex values + collection of
> out-edges),
>     or can Giraph support heavyweight vertices (hold complex nested Java
> objects in a vertex)?
The limitations are that the vertex implementation must be Writable, the
vertex index must be WritableComparable, and the edge and message types
must be Writable.
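
To make the Writable requirement concrete, here is a minimal sketch of a nested value type that serializes itself. So that it compiles without Hadoop on the classpath, it declares a local stand-in interface with the same two methods as org.apache.hadoop.io.Writable; in a real job you would implement the Hadoop interface instead. ProfileValue, its fields, and roundTrip are hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Local stand-in mirroring the two methods of org.apache.hadoop.io.Writable. */
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical "heavyweight" value: a label plus a nested list of scores. */
class ProfileValue implements Writable {
    String label = "";
    List<Double> scores = new ArrayList<>();

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(label);
        out.writeInt(scores.size()); // length prefix so readFields knows how many to read
        for (double s : scores) {
            out.writeDouble(s);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        label = in.readUTF();
        int n = in.readInt();
        scores = new ArrayList<>(n); // replace old state, since instances may be reused
        for (int i = 0; i < n; i++) {
            scores.add(in.readDouble());
        }
    }

    /** Serializes a value to bytes and reads it back into a fresh instance. */
    static ProfileValue roundTrip(ProfileValue v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            ProfileValue copy = new ProfileValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nesting is fine as long as write and readFields agree on field order; the practical cost of a heavier type is the extra memory and serialization time per vertex.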

> 4. More generally, what data is stored in memory, and what, if any, is
> offloaded/spilled to disk?
Messages and vertices can be spilled to disk, but you must enable this.
> Would appreciate any light the experts can throw on this.
>
> On this note, I would like to mention that the presentations posted on the
> Wiki explain what Giraph can do, and how to use it from a coding
> perspective, but there are no explanations of the design approach used, the
> rationale behind the choices, and the software architecture. I feel that new
> users can really benefit from a design and architecture document, along the
> lines of Hadoop and Lucene. For folks who are considering whether or not to
> use Giraph, this can be a big help. The only alternative today is to read
> the source code, the burden of which might in itself be reason for folks not
> to consider using Giraph.
> My 2c  :-)

Agreed that documentation is lacking =).  That being said, the
presentations explain most of the design approach and the reasons behind
it.  For a more detailed look, I would refer you to the Pregel paper, or
feel free to ask if you have any specific questions.
>
> Thanks a lot,
No problem!
> Jeyendran
>
>

