giraph-dev mailing list archives

From "Avery Ching (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-374) Multithreading in input split loading and compute
Date Wed, 17 Oct 2012 03:48:02 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-374:
-------------------------------

    Attachment: GIRAPH-374.2.patch

Addressed Maja's helpful comments.

https://reviews.apache.org/r/7613/
                
> Multithreading in input split loading and compute
> -------------------------------------------------
>
>                 Key: GIRAPH-374
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-374
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-374.2.patch
>
>
> Cleaned up the WorkerClient hierarchy
> - WorkerClientRequestProcessor is a request cache for every thread (input split loading / compute)
> - With RPC gone, got rid of ugly WorkerClientServer and NettyWorkerClientServer
> SendPartitionCache
> Made GraphState immutable for multi-threading
> Added multithreading for loading the input splits
> Added multithreading for compute
> Added thread-level debugging as an option
> Added additional testing on the number of vertices, edges
> Optimization on HashWorkerPartitioner to use CopyOnWriteArrayList instead of a synchronized list (the synchronized list is a bottleneck)
> Added multithreaded TestPageRank test case
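
The CopyOnWriteArrayList item above can be illustrated with a minimal sketch. It is not Giraph code: the class `PartitionLookupSketch`, the methods `ownerFor` and `countByOwner`, and the worker names are hypothetical, and the modulo-based lookup is only an assumption about how a hash partitioner might map a vertex to an owner. What it shows is the general pattern: many threads read a rarely changing list without locking, and each thread accumulates into its own local map (loosely analogous to a per-thread request cache) that is merged at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Giraph.
final class PartitionLookupSketch {
    // Read-mostly list: reads take no lock, writes copy the backing array.
    private final List<String> owners;

    PartitionLookupSketch(List<String> initialOwners) {
        if (initialOwners.isEmpty()) {
            throw new IllegalArgumentException("need at least one owner");
        }
        this.owners = new CopyOnWriteArrayList<>(initialOwners);
    }

    // Assumed hash-style lookup: vertex id modulo the number of owners.
    // The list is not modified while lookups run in this sketch.
    String ownerFor(long vertexId) {
        int index = (int) Math.floorMod(vertexId, (long) owners.size());
        return owners.get(index);
    }

    // Each thread counts lookups in its own map, so no shared mutable
    // state is touched inside the loop; results are merged afterwards.
    Map<String, Integer> countByOwner(int threads, long verticesPerThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final long start = t * verticesPerThread;
                Callable<Map<String, Integer>> task = () -> {
                    Map<String, Integer> local = new HashMap<>();
                    for (long id = start; id < start + verticesPerThread; id++) {
                        local.merge(ownerFor(id), 1, Integer::sum);
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> f : futures) {
                f.get().forEach((owner, n) -> total.merge(owner, n, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        PartitionLookupSketch sketch =
            new PartitionLookupSketch(List.of("w0", "w1", "w2", "w3"));
        System.out.println(sketch.countByOwner(4, 1000));
    }
}
```

A `Collections.synchronizedList` would make every `get` contend for one monitor; the copy-on-write variant trades more expensive writes for uncontended reads, which suits a list that changes rarely and is read on every lookup.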
> I ran the PageRankBenchmark on 20 workers with 10M vertices and 1B edges.  All supersteps
> take about the same time, so I compared only superstep 0 from each test.  Compute performance
> gains are quite nice (even one thread is a little faster than trunk).  Actual gains will
> depend heavily on the number of cores you have and the available parallelism of the application.
> {code}
> Trunk
> # threads  compute time (secs)   total time (secs)
> 1          89                    97.543
> Multithreading
> 1          86.70094              92.477
> 2          50.41521              57.850
> 4          38.07716              50.246
> 8          38.63188              45.940
> 16         22.999943             48.607
> 24         23.649189             45.112
> 32         21.412325             44.201
> {code}
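
To read the table above as scaling numbers, one option is to compute speedup and per-thread efficiency relative to the one-thread multithreaded run. The sketch below does only that arithmetic on the compute-time column as posted; the class and method names are hypothetical, and the derived ratios are not figures reported in the issue.

```java
// Hypothetical helper; the input values are copied from the posted table.
final class SpeedupSketch {
    // Speedup = baseline time / measured time.
    static double speedup(double baselineSecs, double measuredSecs) {
        return baselineSecs / measuredSecs;
    }

    // Parallel efficiency = speedup / number of threads.
    static double efficiency(double baselineSecs, double measuredSecs, int threads) {
        return speedup(baselineSecs, measuredSecs) / threads;
    }

    public static void main(String[] args) {
        int[] threads = {1, 2, 4, 8, 16, 24, 32};
        double[] computeSecs = {
            86.70094, 50.41521, 38.07716, 38.63188,
            22.999943, 23.649189, 21.412325
        };
        double baseline = computeSecs[0]; // one-thread multithreaded run
        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%2d threads: speedup %.2f, efficiency %.2f%n",
                threads[i],
                speedup(baseline, computeSecs[i]),
                efficiency(baseline, computeSecs[i], threads[i]));
        }
    }
}
```

Efficiency well below 1 at higher thread counts is consistent with the remark that gains depend on the number of cores and on how much parallelism the application exposes.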
> We also saw similar gains in input split loading on an internal app. Future work
> could further improve the scalability of multithreading.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
