hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-642) Make GraphRunner disk based
Date Fri, 19 Oct 2012 17:18:12 GMT

    [ https://issues.apache.org/jira/browse/HAMA-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480144#comment-13480144 ]

Thomas Jungblut commented on HAMA-642:

Yet another benchmark:

// writing 1gb of random data, 512k buffer size

Written 1024mb in 24612ms! That is 41,61mb/s!
Read 1024mb in 8701ms! That is 117,69mb/s!
Written 1024mb in 6625ms! That is 154,57mb/s!
Read 1024mb in 9424ms! That is 108,66mb/s!
Written 1024mb in 6674ms! That is 153,43mb/s!
Read 1024mb in 9479ms! That is 108,03mb/s!
Written 1024mb in 6775ms! That is 151,14mb/s!
Read 1024mb in 9294ms! That is 110,18mb/s!

// writing 512mb of random data, 512k buffer size

Written 512mb in 12325ms! That is 41,54mb/s!
Read 512mb in 6758ms! That is 75,76mb/s!
Written 512mb in 3346ms! That is 153,02mb/s!
Read 512mb in 4521ms! That is 113,25mb/s!
Written 512mb in 3287ms! That is 155,77mb/s!
Read 512mb in 4538ms! That is 112,83mb/s!
Written 512mb in 3293ms! That is 155,48mb/s!
Read 512mb in 4522ms! That is 113,22mb/s!


As you can see, the JIT is still warming up in the first iteration (only about 41 MB/s written).
In the last two iterations the disk is nicely saturated; throughput is only about 5 MB/s below
that of a C program. I'm really happy with the performance now.

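
The benchmark code itself is not part of this comment. As a rough, hypothetical reconstruction (the names `DiskThroughputBench`, `write`, `read` and `mbPerSecond` are assumptions), a sequential write/read test with a 512k buffer that reports throughput as size in MB divided by elapsed seconds (e.g. 1024mb / 24.612s ≈ 41.61mb/s) might look like this in Java:

```java
import java.io.*;
import java.nio.file.*;
import java.util.Random;

/** Hypothetical sequential-I/O benchmark; not the code that produced the numbers above. */
public class DiskThroughputBench {
    static final int BUFFER_SIZE = 512 * 1024; // 512k buffer, as in the benchmark comments
    static final int MB = 1024 * 1024;

    /** MB/s = size in MB / elapsed seconds. */
    public static double mbPerSecond(long sizeMb, long elapsedMs) {
        return sizeMb / (elapsedMs / 1000.0);
    }

    /** Writes sizeMb of random bytes through a 512 KB buffer; returns elapsed ms. */
    static long write(Path file, int sizeMb) throws IOException {
        byte[] chunk = new byte[MB];
        new Random(42).nextBytes(chunk); // fill once so the RNG is not part of the timing
        long start = System.nanoTime();
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file), BUFFER_SIZE)) {
            for (int i = 0; i < sizeMb; i++) out.write(chunk); // same random 1 MB chunk, sizeMb times
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Reads the whole file back through a 512 KB buffer; returns elapsed ms. */
    static long read(Path file) throws IOException {
        byte[] buf = new byte[BUFFER_SIZE];
        long start = System.nanoTime();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file), BUFFER_SIZE)) {
            while (in.read(buf) != -1) { /* discard the data, only timing matters */ }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        int sizeMb = args.length > 0 ? Integer.parseInt(args[0]) : 1024;
        Path file = Files.createTempFile("bench", ".bin");
        try {
            // Several rounds: the first one also pays for JIT warm-up.
            for (int round = 0; round < 4; round++) {
                long w = write(file, sizeMb);
                System.out.printf("Written %dmb in %dms! That is %.2fmb/s!%n", sizeMb, w, mbPerSecond(sizeMb, w));
                long r = read(file);
                System.out.printf("Read %dmb in %dms! That is %.2fmb/s!%n", sizeMb, r, mbPerSecond(sizeMb, r));
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The sketch repeats the write/read round four times because the first round also includes JIT warm-up. It neither forces data to disk nor drops the OS page cache, so its numbers, especially for reads, may reflect cached I/O rather than raw disk throughput.
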
> Make GraphRunner disk based
> ---------------------------
>                 Key: HAMA-642
>                 URL: https://issues.apache.org/jira/browse/HAMA-642
>             Project: Hama
>          Issue Type: Improvement
>          Components: graph
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Thomas Jungblut
>         Attachments: HAMA-642_unix_1.patch, HAMA-642_unix_2.patch, HAMA-642_unix_3.patch,
>                      HAMA-642_unix_4.patch, HAMA-scale_1.patch, HAMA-scale_2.patch, HAMA-scale_3.patch, HAMA-scale_4.patch
> To improve scalability, we can make the graph runner disk based.
> This basically means:
> - We have just a single Vertex instance that gets refilled.
> - We directly write vertices to disk after partitioning
> - In every superstep we iterate over the vertices on disk, fill the vertex instance and
>   call the user's compute function.
> Problems:
> - State other than the vertex value can't be stored easily.
> - How do we deal with random access after messages have arrived?
> So I think we should make the graph runner more hybrid, similar to the queues we have
> implemented for messaging, so that the GraphRunner can be configured to run completely on
> disk, in cached mode, or in in-memory mode.
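
The design in the quoted description (vertices written to disk after partitioning, one Vertex instance refilled for every record, the user's compute function called per vertex, and a configurable choice between disk and memory) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hama's GraphRunner: `DiskGraphSketch`, `ReusableVertex`, `VertexStore`, `Mode` and `demo` are hypothetical names, a vertex is reduced to a `long` id and a `double` value, and the cached mode is left out because the issue does not say how it would behave.

```java
import java.io.*;
import java.nio.file.*;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of HAMA-642: stream vertices through one reused instance. */
public class DiskGraphSketch {
    /** The single mutable vertex that is refilled for every record. */
    static final class ReusableVertex {
        long id;
        double value; // only id and value survive a superstep in this sketch
    }
    /** Storage behind the runner; the implementation is chosen by Mode. */
    interface VertexStore extends Closeable {
        void add(long id, double value) throws IOException;
        /** Refills v for each vertex, calls compute, and persists the updated value. */
        void forEach(ReusableVertex v, Consumer<ReusableVertex> compute) throws IOException;
    }
    public enum Mode { IN_MEMORY, DISK } // a CACHED mode is not sketched here

    static final class InMemoryVertexStore implements VertexStore {
        private final List<Long> ids = new ArrayList<>();
        private final List<Double> values = new ArrayList<>();
        public void add(long id, double value) { ids.add(id); values.add(value); }
        public void forEach(ReusableVertex v, Consumer<ReusableVertex> compute) {
            for (int i = 0; i < ids.size(); i++) {
                v.id = ids.get(i);
                v.value = values.get(i);
                compute.accept(v);
                values.set(i, v.value);
            }
        }
        public void close() {}
    }
    static final class DiskVertexStore implements VertexStore {
        private static final int BUFFER_SIZE = 512 * 1024;
        private final Path dir;
        private Path current;
        private DataOutputStream pending; // open only while vertices are being added
        DiskVertexStore(Path dir) throws IOException {
            this.dir = dir;
            this.current = Files.createTempFile(dir, "vertices", ".bin");
            this.pending = open(current);
        }
        private static DataOutputStream open(Path p) throws IOException {
            return new DataOutputStream(new BufferedOutputStream(Files.newOutputStream(p), BUFFER_SIZE));
        }
        /** Appends a partitioned vertex; only valid before the first superstep. */
        public void add(long id, double value) throws IOException {
            pending.writeLong(id);
            pending.writeDouble(value);
        }
        public void forEach(ReusableVertex v, Consumer<ReusableVertex> compute) throws IOException {
            if (pending != null) { pending.close(); pending = null; }
            Path next = Files.createTempFile(dir, "vertices", ".bin");
            try (DataInputStream in = new DataInputStream(
                     new BufferedInputStream(Files.newInputStream(current), BUFFER_SIZE));
                 DataOutputStream out = open(next)) {
                while (true) {
                    try { v.id = in.readLong(); } catch (EOFException end) { break; }
                    v.value = in.readDouble();
                    compute.accept(v);   // the user's compute function
                    out.writeLong(v.id); // write the updated record to the next file
                    out.writeDouble(v.value);
                }
            }
            Files.delete(current);
            current = next;
        }
        public void close() throws IOException {
            if (pending != null) pending.close();
            Files.deleteIfExists(current);
        }
    }
    /** Loads n vertices (value = id), runs supersteps that double each value, returns the values. */
    public static List<Double> demo(Mode mode, int n, int supersteps) {
        try {
            Path dir = Files.createTempDirectory("hama642");
            List<Double> result = new ArrayList<>();
            try (VertexStore store = mode == Mode.DISK ? new DiskVertexStore(dir) : new InMemoryVertexStore()) {
                for (long id = 0; id < n; id++) store.add(id, id);
                ReusableVertex v = new ReusableVertex(); // one instance for the whole run
                for (int s = 0; s < supersteps; s++) store.forEach(v, x -> x.value *= 2);
                store.forEach(v, x -> result.add(x.value));
            }
            Files.delete(dir);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
    public static void main(String[] args) {
        Mode mode = args.length > 0 ? Mode.valueOf(args[0]) : Mode.DISK;
        System.out.println(mode + ": " + demo(mode, 5, 3));
    }
}
```

Because only the id and value are written back after each compute call, any other field placed on the vertex would be lost between supersteps, which mirrors the first problem listed in the issue. The sketch leaves the second problem (random access once messages have arrived) open.
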

