giraph-dev mailing list archives

From "Alessandro Presta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low
Date Mon, 16 Jul 2012 19:08:34 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13415530#comment-13415530 ]

Alessandro Presta commented on GIRAPH-249:
------------------------------------------

Eli, thanks a lot for running these benchmarks. This is really, really helpful.

Two notes:
- You can try setting "giraph.minFreeMemoryRatio" to something lower than the default.
The default of 0.1 is probably too conservative.
- I would compare against trunk to better evaluate the impact of this patch.
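
To make the first note concrete, here is a minimal sketch of what a free-memory-ratio threshold check could look like. It assumes the option means "treat memory as low when the free fraction of the heap drops below this ratio"; that reading, the class name FreeMemoryCheck, and both helper methods are hypothetical and are not taken from the patch.

```java
public class FreeMemoryCheck {
    // Hypothetical helper: memory counts as "low" when the free fraction
    // of the heap falls below the configured ratio (assumed semantics).
    static boolean isMemoryLow(long freeBytes, long maxBytes, double minFreeMemoryRatio) {
        return (double) freeBytes / maxBytes < minFreeMemoryRatio;
    }

    // Fraction of the maximum heap that the JVM could still allocate.
    static double currentFreeRatio() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        return (double) (rt.maxMemory() - used) / rt.maxMemory();
    }

    public static void main(String[] args) {
        // Ratio to test against; 0.1 is the default mentioned above.
        double ratio = args.length > 0 ? Double.parseDouble(args[0]) : 0.1;
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        long free = rt.maxMemory() - used;
        System.out.println("free ratio: " + currentFreeRatio());
        System.out.println("memory low at " + ratio + ": "
            + isMemoryLow(free, rt.maxMemory(), ratio));
    }
}
```

Under this reading, lowering the ratio from 0.1 to, say, 0.05 would delay the point at which memory is considered low, so more of the graph would stay in RAM before anything is moved out-of-core.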

So far I have only been able to run a few benchmarks on a single machine, by limiting the
memory available to each MapReduce task.
They seem to confirm what you've been saying: when the job fails, it fails at the input superstep
(GraphMapper#setup()), before it can even make use of the WorkerPartitionMap.

I agree, the next step is making this work from the input superstep.
                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
> GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping the whole
> graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of memory, while
> gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate issue, although
> the interplay between the two is crucial.
> We should also discuss what are our primary goals here: completing a job (albeit slowly)
> instead of failing when the graph is too big, while still encouraging memory optimizations
> and high-memory clusters; or restructuring Giraph to be as efficient as possible in disk mode,
> making it almost a standard way of operating.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
