I would say 8 GB per worker is a great amount to play with: you will most definitely be able to run very large, interesting graphs in memory, depending on how many workers (with 8 GB each) you have. Running 3-4 workers per machine is not a bad thing if you are provisioned for it, and neither is having lots of machines. This is a distributed batch processing framework, so more is better ;)

As for vertices with a million edges: sure, but it depends on how many of them there are and on your compute resources. Again, I can't go into much detail, but Giraph has been extensively tested using real-world, large, interesting, useful graph data. This includes large social graphs that have supernodes. So if that's what you're supplying, and you have the gear to run your data, you've picked the right tool. You can spill to disk, run in memory, or spread the load and scale to many, many workers (Mapper tasks) hosted on many nodes, and Giraph will behave well if you have the compute resources to scale to fit your volume of data.

On Tue, Sep 11, 2012 at 12:27 AM, Avery Ching wrote:
Hi Jeyendran, nice to meet you.

On 9/10/12 11:23 PM, Jeyendran Balakrishnan wrote:
I am trying to understand what kind of data Giraph holds in memory per
worker.
My questions in descending order of importance:
1. Does Giraph hold in memory exactly one vertex of data at a time, or does
it need to hold all the vertexes assigned to that worker?
All vertices assigned to that worker.

2. Can Giraph handle vertexes with a million edges per vertex?
Depends on how much memory you have. I would recommend making a custom vertex implementation that has a very efficient store for better scalability (e.g. see IntIntNullIntVertex).
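
To make the "very efficient store" point concrete, here is a minimal sketch of a compact out-edge store that keeps target vertex IDs in a single primitive int[]. CompactIntEdges is a hypothetical name used only for illustration; it is not a Giraph class and is not claimed to match how IntIntNullIntVertex is written. It assumes int vertex IDs and no edge values. With this layout a million out-edges cost about 4 MB of payload, whereas a collection of boxed Integer objects typically costs several times that on common 64-bit JVMs.

```java
import java.util.Arrays;

/** Hypothetical compact out-edge store (illustration only, not a Giraph class). */
final class CompactIntEdges {
    private int[] targets = new int[4];
    private int size = 0;

    /** Appends an out-edge, doubling the backing array when it is full. */
    void add(int targetId) {
        if (size == targets.length) {
            targets = Arrays.copyOf(targets, size * 2);
        }
        targets[size++] = targetId;
    }

    int size() {
        return size;
    }

    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("index " + i);
        }
        return targets[i];
    }

    /** Payload bytes only: 4 bytes per int, ignoring the array header and unused capacity. */
    long payloadBytes() {
        return 4L * size;
    }
}

class CompactIntEdgesDemo {
    public static void main(String[] args) {
        // One supernode with a million out-edges.
        CompactIntEdges edges = new CompactIntEdges();
        for (int i = 0; i < 1_000_000; i++) {
            edges.add(i);
        }
        System.out.println(edges.size() + " edges, about " + edges.payloadBytes() + " payload bytes");
    }
}
```

Under -Xmx8g a single such vertex is cheap; what matters is the total across all vertices assigned to the worker, plus their messages.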

    If not, at what order of magnitude does it break down? - 1000 edges, 10K
edges, 100K edges?...
   (Of course, I understand that this depends upon the -Xmx value, so let's
say we fix a value of -Xmx8g).
3. Are there any limitations on the kind of objects that can be used as
vertices?
    Specifically, does Giraph assume that vertices are lightweight (eg,
integer vertex ID + simple Java primitive vertex values + collection of
out-edges),
    or can Giraph support heavyweight vertices (hold complex nested Java
objects in a vertex)?
The limitations are that the vertex implementation must be Writable, the vertex index must be WritableComparable, and the edge type and message type must be Writable.
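
As a rough illustration of what "must be Writable" means for a heavyweight value, here is a minimal sketch of a nested value that serializes itself field by field. To keep it compilable without Hadoop on the classpath, it declares a local stand-in interface with the same two methods as org.apache.hadoop.io.Writable; in a real job you would implement the Hadoop interface instead. ProfileValue and its fields are hypothetical names chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Local stand-in with the same two methods as org.apache.hadoop.io.Writable. */
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

/** Hypothetical "heavyweight" vertex value: a label plus a nested array of scores. */
final class ProfileValue implements Writable {
    String label = "";
    double[] scores = new double[0];

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(label);
        out.writeInt(scores.length); // length prefix so readFields knows how much to read
        for (double s : scores) {
            out.writeDouble(s);
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        label = in.readUTF();
        scores = new double[in.readInt()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = in.readDouble();
        }
    }

    /** Serializes a value to bytes and reads it back into a fresh instance. */
    static ProfileValue roundTrip(ProfileValue v) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            v.write(new DataOutputStream(bytes));
            ProfileValue copy = new ProfileValue();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Any complex nested object works as long as write and readFields mirror each other exactly; the cost is that every extra field is held in memory for every vertex on the worker.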

4. More generally, what data is stored in memory, and what, if any, is
Messages and vertices can be spilled to disk, but you must enable this.
Would appreciate any light the experts can throw on this.

On this note, I would like to mention that the presentations posted on the
Wiki explain what Giraph can do, and how to use it from a coding
perspective, but there are no explanations of the design approach used, the
rationale behind the choices, and the software architecture. I feel that new
users can really benefit from a design and architecture document, along the
lines of Hadoop and Lucene. For folks who are considering whether or not to
use Giraph, this can be a big help. The only alternative today is to read
the source code, the burden of which might in itself be reason for folks not
to consider using Giraph.
My 2c :-)

Agreed that documentation is lacking =). That being said, the presentations explain most of the design approach and reasons. I would refer to the Pregel paper for a more detailed look, or ask if you have any specific questions.

Thanks a lot,
No problem!
Jeyendran
