giraph-dev mailing list archives

From "Nitay Joffe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (GIRAPH-683) Jython for Computation
Date Mon, 10 Jun 2013 21:52:20 GMT

     [ https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nitay Joffe updated GIRAPH-683:
-------------------------------

    Description: 
Support for writing Computation code in Python. We add Jython bindings so that the Python
computation code can communicate back with the Java Giraph classes.

To make this work I had to change a few parts of Giraph:
1) The Jython computation is not known until we read the script and create a Computation object
for it at runtime. This has to be done on each worker separately after the job has launched.
Because of this, there is no Computation class set at the beginning. I suspect other scripting
languages will have a similar issue. To fix this I created a ComputationFactory interface which
is responsible for creating the Computation, with a default that just grabs the class from
the Configuration and creates it.
2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a lot of repetitive
code around these things so centralizing it all in one place made things a lot cleaner.
3) I added some more helpers like isDefaultValue() to our conf options.
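
As a rough illustration of the factory idea in item 1, here is a minimal Python sketch. It is not the code from the patch: the real ComputationFactory is a Java interface, and every class name, method name, and configuration key below is hypothetical. It only shows the contrast between a default factory that instantiates a class already named in the configuration and a script-backed factory whose class does not exist until the worker has evaluated the script.

```python
from abc import ABC, abstractmethod


class Computation(ABC):
    """Hypothetical stand-in for a per-vertex computation."""

    @abstractmethod
    def compute(self, vertex_value, messages):
        ...


class PageRankComputation(Computation):
    """Example computation that is known before the job starts."""

    def compute(self, vertex_value, messages):
        return 0.15 + 0.85 * sum(messages)


class ComputationFactory(ABC):
    """Hypothetical factory: each worker asks it for a Computation."""

    @abstractmethod
    def create_computation(self, conf):
        ...


class DefaultComputationFactory(ComputationFactory):
    """Default case: grab the class from the configuration and create it."""

    def create_computation(self, conf):
        return conf["computation.class"]()


class ScriptComputationFactory(ComputationFactory):
    """Script case: the class only exists once the script has been evaluated."""

    def create_computation(self, conf):
        namespace = {"Computation": Computation}
        exec(conf["script.source"], namespace)  # runs on each worker at runtime
        return namespace[conf["script.class.name"]]()


# Usage: both factories hand back an object with the same interface.
script = """
class Doubler(Computation):
    def compute(self, vertex_value, messages):
        return 2 * vertex_value
"""
default = DefaultComputationFactory().create_computation(
    {"computation.class": PageRankComputation})
scripted = ScriptComputationFactory().create_computation(
    {"script.source": script, "script.class.name": "Doubler"})
```

The point of the indirection is that the rest of the framework only depends on the factory, so it does not need a Computation class to be configured up front.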

To use Jython, all the user has to do is call Jython#init(...) somewhere in their initialization code.

This patch contains our PageRank benchmark implementation in Jython. I added an option (--jython)
which chooses whether to run the default (Java) or the Jython version.

Here is the initial PageRankBenchmark comparison (4 workers, 10M vertices, 25 edges per vertex):

Java:
Setup (milliseconds)	2,895	0	2,895
Input superstep (milliseconds)	8,700	0	8,700
Superstep 0 (milliseconds)	15,838	0	15,838
Superstep 1 (milliseconds)	19,608	0	19,608
Superstep 2 (milliseconds)	17,905	0	17,905
Superstep 3 (milliseconds)	16,750	0	16,750
Superstep 4 (milliseconds)	19,088	0	19,088
Superstep 5 (milliseconds)	3,550	0	3,550
Shutdown (milliseconds)	50	0	50
Total (milliseconds)	104,388	0	104,388

Jython:
Setup (milliseconds)	3,735	0	3,735
Input superstep (milliseconds)	8,551	0	8,551
Superstep 0 (milliseconds)	36,962	0	36,962
Superstep 1 (milliseconds)	41,737	0	41,737
Superstep 2 (milliseconds)	42,329	0	42,329
Superstep 3 (milliseconds)	43,405	0	43,405
Superstep 4 (milliseconds)	46,088	0	46,088
Superstep 5 (milliseconds)	22,040	0	22,040
Shutdown (milliseconds)	117	0	117
Total (milliseconds)	244,965	0	244,965

Jython takes about 2.5x as long as Java across the supersteps (about 2.3x on total time).


However at scale things get better (200 workers, 1B vertices, 200 edges per vertex):

Java:
Setup (milliseconds)	13,226	0	13,226
Input superstep (milliseconds)	114,673	0	114,673
Superstep 0 (milliseconds)	300,950	0	300,950
Superstep 1 (milliseconds)	317,942	0	317,942
Superstep 2 (milliseconds)	312,152	0	312,152
Superstep 3 (milliseconds)	316,844	0	316,844
Superstep 4 (milliseconds)	318,627	0	318,627
Superstep 5 (milliseconds)	7,898	0	7,898
Shutdown (milliseconds)	113	0	113
Total (milliseconds)	1,702,429	0	1,702,429

Jython:
Setup (milliseconds)	7,159	0	7,159
Input superstep (milliseconds)	112,645	0	112,645
Superstep 0 (milliseconds)	347,732	0	347,732
Superstep 1 (milliseconds)	386,404	0	386,404
Superstep 2 (milliseconds)	410,349	0	410,349
Superstep 3 (milliseconds)	406,422	0	406,422
Superstep 4 (milliseconds)	405,696	0	405,696
Superstep 5 (milliseconds)	46,687	0	46,687
Shutdown (milliseconds)	131	0	131
Total (milliseconds)	2,123,228	0	2,123,228

That's a mere 25% overhead on total time.
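
The overhead figures can be rechecked from the counters above. The short Python sketch below copies the totals and per-superstep times (in milliseconds) from the four tables; the dictionary layout and the function name are only illustrative. It reports the Jython/Java ratio both on total time and on the supersteps alone, since setup and input time are nearly the same for both versions.

```python
# Timings in milliseconds, copied from the benchmark counters above.
# Superstep lists are in order: superstep 0 through superstep 5.
runs = {
    "4 workers, 10M vertices": {
        "java": {"total": 104388,
                 "supersteps": [15838, 19608, 17905, 16750, 19088, 3550]},
        "jython": {"total": 244965,
                   "supersteps": [36962, 41737, 42329, 43405, 46088, 22040]},
    },
    "200 workers, 1B vertices": {
        "java": {"total": 1702429,
                 "supersteps": [300950, 317942, 312152, 316844, 318627, 7898]},
        "jython": {"total": 2123228,
                   "supersteps": [347732, 386404, 410349, 406422, 405696, 46687]},
    },
}


def slowdown(run):
    """Return (ratio on total time, ratio on superstep time) of Jython vs Java."""
    total = run["jython"]["total"] / run["java"]["total"]
    compute = sum(run["jython"]["supersteps"]) / sum(run["java"]["supersteps"])
    return total, compute


for name, run in runs.items():
    total, compute = slowdown(run)
    print(f"{name}: total {total:.2f}x, supersteps only {compute:.2f}x")
# 4 workers, 10M vertices: total 2.35x, supersteps only 2.51x
# 200 workers, 1B vertices: total 1.25x, supersteps only 1.27x
```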

Take a look at the reviewboard for latest patch: https://reviews.apache.org/r/11709/

  was:
Support for writing Computation code in Python. We add Jython bindings so that the Python
computation code can communicate back with the Java Giraph classes.

To make this work I had to change a few parts of Giraph:
1) The Jython computation is not known until we read the script and create a Computation object
for it at runtime. This has to be done on each worker separately after the job has launched.
Because of this, there is no Computation class set at the beginning. I suspect other scripting
languages will have a similar issue. To fix this I created a ComputationFactory interface which
is responsible for creating the Computation, with a default that just grabs the class from
the Configuration and creates it.
2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a lot of repetitive
code around these things so centralizing it all in one place made things a lot cleaner.
3) I added some more helpers like isDefaultValue() to our conf options.

To use Jython all the user has to do is call Jython#init(...) somewhere in his initialization.

This patch contains our page rank benchmark implementation in Jython. I added an option (--jython)
which chooses whether to run the default or the jython version.

Here is the initial PageRankBenchmark comparison (4 workers, 10M vertices, 25 edges per vertex):

Java:
Total (milliseconds)	104,388	0	104,388
Superstep 3 (milliseconds)	16,750	0	16,750
Setup (milliseconds)	2,895	0	2,895
Shutdown (milliseconds)	50	0	50
Superstep 0 (milliseconds)	15,838	0	15,838
Superstep 4 (milliseconds)	19,088	0	19,088
Input superstep (milliseconds)	8,700	0	8,700
Superstep 5 (milliseconds)	3,550	0	3,550
Superstep 2 (milliseconds)	17,905	0	17,905
Superstep 1 (milliseconds)	19,608	0	19,608


Jython:
Total (milliseconds)	244,965	0	244,965
Superstep 3 (milliseconds)	43,405	0	43,405
Setup (milliseconds)	3,735	0	3,735
Shutdown (milliseconds)	117	0	117
Superstep 0 (milliseconds)	36,962	0	36,962
Superstep 4 (milliseconds)	46,088	0	46,088
Input superstep (milliseconds)	8,551	0	8,551
Superstep 5 (milliseconds)	22,040	0	22,040
Superstep 2 (milliseconds)	42,329	0	42,329
Superstep 1 (milliseconds)	41,737	0	41,737


So the initial overhead of Jython vs Java is around 2.5x.

Take a look at the reviewboard for latest patch: https://reviews.apache.org/r/11709/

    
> Jython for Computation
> ----------------------
>
>                 Key: GIRAPH-683
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-683
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
