giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low
Date Mon, 16 Jul 2012 18:00:46 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415456#comment-13415456 ]

Eli Reisman commented on GIRAPH-249:
------------------------------------

The job died again, this time on a slightly larger run that is still well within the
config/data profile for trunk + 256, 246, 250. It died in INPUT_SUPERSTEP with the same
errors as before.

I am really impressed and intrigued by the code in this patch, though. Some form of it could
be an important life-preserver for a worker that is about to bring down a whole job with it.
Get this thing to target INPUT_SUPERSTEP and make larger graph.splitmb reads possible, to
capitalize on HDFS block sizes, and I will be the biggest fan of this one! The metrics on the
jobs I ran with the patch still indicate that INPUT_SUPERSTEP is the chief culprit for memory
problems during a run.


                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping the whole
> graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of memory, while
> gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate issue, although
> the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job (albeit slowly)
> instead of failing when the graph is too big, while still encouraging memory optimizations
> and high-memory clusters; or restructuring Giraph to be as efficient as possible in disk mode,
> making it almost a standard way of operating.
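
For readers new to the idea, the "fall back to disk" behaviour described in the issue can be
sketched roughly as follows. This is only an illustration under stated assumptions, not the
code in the attached patch: the class name SpillingPartitionStore, the representation of a
partition as a list of vertex ids, and the fixed in-memory partition limit are all
hypothetical. A real implementation would more likely react to actual heap pressure (for
example via Runtime.getRuntime().freeMemory()) than to a simple count.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: keeps at most maxInMemory partitions on the heap and
// spills the least recently used ones to temporary files.
class SpillingPartitionStore {
    private final int maxInMemory;
    private final Path spillDir;
    // Access-ordered map: iteration starts at the least recently used entry.
    private final LinkedHashMap<Integer, ArrayList<Long>> inMemory =
        new LinkedHashMap<>(16, 0.75f, true);
    private final Set<Integer> onDisk = new HashSet<>();

    SpillingPartitionStore(int maxInMemory) {
        if (maxInMemory < 1) {
            throw new IllegalArgumentException("maxInMemory must be >= 1");
        }
        this.maxInMemory = maxInMemory;
        this.spillDir = createSpillDir();
    }

    private static Path createSpillDir() {
        try {
            return Files.createTempDirectory("partitions");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Adds or replaces a partition, then spills older ones if over the limit.
    void put(int id, ArrayList<Long> vertexIds) {
        if (onDisk.remove(id)) {
            deleteFile(id); // a stale on-disk copy is no longer valid
        }
        inMemory.put(id, vertexIds);
        spillIfNeeded();
    }

    // Returns the partition, loading it back from disk if it was spilled.
    ArrayList<Long> get(int id) {
        ArrayList<Long> partition = inMemory.get(id);
        if (partition == null && onDisk.contains(id)) {
            partition = readFromDisk(id);
            onDisk.remove(id);
            deleteFile(id);
            inMemory.put(id, partition);
            spillIfNeeded();
        }
        return partition;
    }

    int inMemoryCount() { return inMemory.size(); }

    int onDiskCount() { return onDisk.size(); }

    private void spillIfNeeded() {
        while (inMemory.size() > maxInMemory) {
            Iterator<Map.Entry<Integer, ArrayList<Long>>> it =
                inMemory.entrySet().iterator();
            Map.Entry<Integer, ArrayList<Long>> eldest = it.next();
            int id = eldest.getKey();
            writeToDisk(id, eldest.getValue());
            it.remove();
            onDisk.add(id);
        }
    }

    private Path fileFor(int id) {
        return spillDir.resolve("partition-" + id + ".bin");
    }

    private void writeToDisk(int id, ArrayList<Long> partition) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(fileFor(id)))) {
            out.writeObject(partition);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @SuppressWarnings("unchecked")
    private ArrayList<Long> readFromDisk(int id) {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(fileFor(id)))) {
            return (ArrayList<Long>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    private void deleteFile(int id) {
        try {
            Files.deleteIfExists(fileFor(id));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a limit of two, putting three partitions leaves two in memory and one on disk, and a
later get() on the spilled partition swaps it back in at the cost of spilling another. That
trade of speed for survival is the "gracefully degrading performance" the description asks for.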

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
