incubator-giraph-dev mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
Date Wed, 16 Nov 2011 22:37:53 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151587#comment-13151587 ]

jiraposter@reviews.apache.org commented on GIRAPH-91:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2868/
-----------------------------------------------------------

Review request for giraph.


Summary
-------

These general changes should support larger heap sizes (i.e. >20G).

- Added new EdgeListVertex that stores its edges in a compact pair of lists instead of Vertex's
HashMap.

- Added unit tests (TestEdgeListVertex) to test EdgeListVertex.

- Augmented PageRankBenchmark to choose between EdgeListVertex and Vertex (to try it out).

- Added failure cleanup so that a failed worker quickly alerts the master that it is dead
by deleting its ephemeral health znode.  This allows us to set higher ZooKeeper timeouts to
deal with GC pauses and the like.  In a quick test on 3 nodes, I saw the failure detected in
43 seconds instead of 1 min 52 sec.

- Added a context.progress() call during flushing so that jobs are not killed by a timeout
when a flush takes a long time (GC or lots of messages).
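
To illustrate the first item, here is a minimal sketch of storing edges in a pair of
parallel lists rather than a HashMap.  It is not the EdgeListVertex code from the diff:
the class name PairedListEdges, its method names, and the long/float id and value types
are all assumptions made for the example.  The idea is that position i in one list holds
the destination vertex id and position i in the other holds that edge's value, so there
is no hash table and no per-entry map node.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only; not the EdgeListVertex class from the patch.
 * Edges are kept in two parallel lists instead of a HashMap.
 */
class PairedListEdges {
    // destIds.get(i) is the target of the edge whose value is values.get(i).
    private final List<Long> destIds = new ArrayList<>();
    private final List<Float> values = new ArrayList<>();

    /** Adds an edge; returns false if an edge to destId already exists. */
    boolean addEdge(long destId, float value) {
        if (indexOf(destId) >= 0) {
            return false;
        }
        destIds.add(destId);
        values.add(value);
        return true;
    }

    /** Returns the edge value for destId, or null if there is no such edge. */
    Float getEdgeValue(long destId) {
        int i = indexOf(destId);
        return i < 0 ? null : values.get(i);
    }

    /** Removes the edge to destId from both lists; returns false if absent. */
    boolean removeEdge(long destId) {
        int i = indexOf(destId);
        if (i < 0) {
            return false;
        }
        destIds.remove(i); // i is an int, so this removes by index
        values.remove(i);
        return true;
    }

    int getNumEdges() {
        return destIds.size();
    }

    // Linear scan: lookups cost O(degree) instead of the O(1) of a HashMap.
    private int indexOf(long destId) {
        for (int i = 0; i < destIds.size(); i++) {
            if (destIds.get(i) == destId) {
                return i;
            }
        }
        return -1;
    }
}
```

The trade-off in a layout like this is that lookups and removals become linear in the
number of edges of the vertex, in exchange for a smaller per-edge footprint.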


This addresses bug GIRAPH-91.
    https://issues.apache.org/jira/browse/GIRAPH-91


Diffs
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/EdgeListVertex.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
1202898 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/graph/TestEdgeListVertex.java
PRE-CREATION 

Diff: https://reviews.apache.org/r/2868/diff


Testing
-------

Local unit tests, PageRankBenchmark on multiple machines with >20GB heaps.


Thanks,

Avery


                
> Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
> -----------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-91
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-91
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>
> Current vertex implementation uses a HashMap for storing the edges, which is quite memory
> heavy for large graphs.  The default settings in Giraph need to be improved for large graphs
> and heaps of >20G.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
