giraph-dev mailing list archives

From "Alessandro Presta (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low
Date Wed, 15 Aug 2012 19:23:37 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435437#comment-13435437 ]

Alessandro Presta commented on GIRAPH-249:
------------------------------------------

Thanks Claudio, good observation.
You got me curious, so I quickly ran a shortest-paths benchmark.

500k vertices, 100 edges/vertex, 10 workers

This is with trunk:

{code}
hadoop jar giraph-trunk.jar org.apache.giraph.benchmark.ShortestPathsBenchmark \
  -Dgiraph.useNetty=true -v -V 500000 -e 100 -w 10
{code}

{code}
Superstep 3 (milliseconds)	5,394
Superstep 4 (milliseconds)	5,650
Superstep 23 (milliseconds)	1,100
Superstep 20 (milliseconds)	1,108
Superstep 31 (milliseconds)	1,192
Setup (milliseconds)	3,297
Shutdown (milliseconds)	2,309
Superstep 27 (milliseconds)	1,135
Superstep 7 (milliseconds)	4,476
Superstep 9 (milliseconds)	3,400
Superstep 8 (milliseconds)	4,043
Superstep 16 (milliseconds)	1,184
Superstep 14 (milliseconds)	2,251
Superstep 6 (milliseconds)	5,150
Superstep 24 (milliseconds)	1,167
Superstep 18 (milliseconds)	1,176
Superstep 5 (milliseconds)	5,483
Superstep 1 (milliseconds)	1,125
Superstep 21 (milliseconds)	1,192
Total (milliseconds)	85,757
Superstep 15 (milliseconds)	1,375
Superstep 22 (milliseconds)	1,159
Vertex input superstep (milliseconds)	11,644
Superstep 25 (milliseconds)	1,058
Superstep 17 (milliseconds)	1,075
Superstep 26 (milliseconds)	1,051
Superstep 12 (milliseconds)	2,342
Superstep 10 (milliseconds)	3,192
Superstep 19 (milliseconds)	1,092
Superstep 11 (milliseconds)	2,533
Superstep 30 (milliseconds)	1,126
Superstep 0 (milliseconds)	821
Superstep 28 (milliseconds)	1,184
Superstep 29 (milliseconds)	1,116
Superstep 2 (milliseconds)	1,165
Superstep 13 (milliseconds)	1,983
{code}

And with the out-of-core graph enabled (giraph.maxPartitionsInMemory=5):

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.ShortestPathsBenchmark \
  -Dgiraph.useNetty=true -Dgiraph.useOutOfCoreGraph=true -Dgiraph.maxPartitionsInMemory=5 \
  -v -V 500000 -e 100 -w 10
{code}

{code}
Superstep 3 (milliseconds)	27,407
Superstep 4 (milliseconds)	26,620
Superstep 23 (milliseconds)	20,906
Superstep 20 (milliseconds)	21,324
Superstep 31 (milliseconds)	21,055
Setup (milliseconds)	2,639
Superstep 7 (milliseconds)	25,819
Superstep 27 (milliseconds)	20,790
Shutdown (milliseconds)	175
Superstep 16 (milliseconds)	21,434
Superstep 8 (milliseconds)	24,434
Superstep 9 (milliseconds)	24,183
Superstep 14 (milliseconds)	22,401
Superstep 6 (milliseconds)	25,948
Superstep 24 (milliseconds)	20,968
Superstep 18 (milliseconds)	21,179
Superstep 5 (milliseconds)	27,134
Superstep 1 (milliseconds)	20,315
Superstep 21 (milliseconds)	21,442
Total (milliseconds)	729,459
Superstep 15 (milliseconds)	22,198
Superstep 22 (milliseconds)	20,875
Vertex input superstep (milliseconds)	19,595
Superstep 25 (milliseconds)	20,829
Superstep 17 (milliseconds)	21,617
Superstep 12 (milliseconds)	22,548
Superstep 26 (milliseconds)	20,763
Superstep 19 (milliseconds)	21,302
Superstep 10 (milliseconds)	23,823
Superstep 11 (milliseconds)	22,908
Superstep 0 (milliseconds)	10,836
Superstep 30 (milliseconds)	21,014
Superstep 28 (milliseconds)	21,109
Superstep 29 (milliseconds)	21,158
Superstep 13 (milliseconds)	21,974
Superstep 2 (milliseconds)	20,726
{code}

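So yeah, the slowdown is more like an order of magnitude: total time goes from 85,757 ms to 729,459 ms, roughly 8.5x. Of course this is nothing scientific (a single run of each configuration).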
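
For a rough sense of where the time goes, the per-phase ratios can be computed from the two counter dumps above. This is a minimal Python sketch, not part of the benchmark itself: the dictionary and function names are made up for illustration, and only a handful of counters are copied over.

```python
# Selected counters copied from the two runs above, in milliseconds.
# "trunk_ms" is the in-memory run, "out_of_core_ms" the run with
# giraph.useOutOfCoreGraph=true and giraph.maxPartitionsInMemory=5.
trunk_ms = {
    "Vertex input superstep": 11_644,
    "Superstep 0": 821,
    "Superstep 1": 1_125,
    "Superstep 3": 5_394,
    "Total": 85_757,
}
out_of_core_ms = {
    "Vertex input superstep": 19_595,
    "Superstep 0": 10_836,
    "Superstep 1": 20_315,
    "Superstep 3": 27_407,
    "Total": 729_459,
}


def slowdown(baseline_ms, candidate_ms):
    """Return how many times slower the candidate run is than the baseline."""
    return candidate_ms / baseline_ms


# Print one ratio per counter, rounded to one decimal place.
for name, baseline in trunk_ms.items():
    print(f"{name}: {slowdown(baseline, out_of_core_ms[name]):.1f}x")
```

On these numbers the ratio is far from uniform: vertex input is under 2x slower, while a light superstep such as superstep 1 is roughly 18x slower. The out-of-core supersteps all sit above 20 seconds regardless of how much work the in-memory run did in the same superstep.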
                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch,
>                      GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping the whole
> graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of memory, while
> gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate issue, although
> the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job (albeit slowly)
> instead of failing when the graph is too big, while still encouraging memory optimizations
> and high-memory clusters; or restructuring Giraph to be as efficient as possible in disk mode,
> making it almost a standard way of operating.
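
As a rough illustration of the "fall back to disk" idea for graph storage described above, here is a minimal Python sketch of a store that keeps a bounded number of partitions in memory and spills the least recently used ones to disk. Everything in it is an assumption made for illustration: the class name, the LRU policy, and the pickle-based spill files are hypothetical and are not taken from the GIRAPH-249 patch or from Giraph's actual partition store.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingPartitionStore:
    """Hypothetical store: at most max_in_memory partitions stay in RAM."""

    def __init__(self, max_in_memory, spill_dir=None):
        if max_in_memory < 1:
            raise ValueError("max_in_memory must be at least 1")
        self.max_in_memory = max_in_memory
        # Assumption: spill files go to a fresh temporary directory.
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="partitions-")
        self._in_memory = OrderedDict()  # id -> partition, least recently used first
        self._on_disk = set()            # ids of partitions currently spilled

    def _path(self, partition_id):
        return os.path.join(self.spill_dir, f"partition-{partition_id}.pkl")

    def _evict_if_needed(self):
        # Spill least recently used partitions until we are within the limit.
        while len(self._in_memory) > self.max_in_memory:
            victim_id, victim = self._in_memory.popitem(last=False)
            with open(self._path(victim_id), "wb") as f:
                pickle.dump(victim, f)
            self._on_disk.add(victim_id)

    def put(self, partition_id, partition):
        self._in_memory[partition_id] = partition
        self._in_memory.move_to_end(partition_id)  # mark as most recently used
        self._on_disk.discard(partition_id)        # any old spill file is now stale
        self._evict_if_needed()

    def get(self, partition_id):
        if partition_id in self._in_memory:
            self._in_memory.move_to_end(partition_id)
            return self._in_memory[partition_id]
        if partition_id not in self._on_disk:
            raise KeyError(partition_id)
        # Load the spilled partition back, possibly evicting another one.
        with open(self._path(partition_id), "rb") as f:
            partition = pickle.load(f)
        self._on_disk.discard(partition_id)
        self._in_memory[partition_id] = partition
        self._evict_if_needed()
        return partition


# Usage: ten partitions, five of which fit in memory at a time.
store = SpillingPartitionStore(max_in_memory=5)
for pid in range(10):
    store.put(pid, {"vertices": list(range(pid * 3, pid * 3 + 3))})
first = store.get(0)  # partition 0 was spilled; loading it evicts partition 5
```

One property of this particular sketch: when every partition is visited in order on each pass and there are more partitions than fit in memory, an LRU policy reloads every spilled partition on every pass. Whether that applies to the patch benchmarked above is not something the numbers alone can show.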
