giraph-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-683) Jython for Computation
Date Thu, 20 Jun 2013 18:50:20 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689514#comment-13689514 ]

Hudson commented on GIRAPH-683:
-------------------------------

Integrated in Giraph-trunk-Commit #1010 (See [https://builds.apache.org/job/Giraph-trunk-Commit/1010/])
    GIRAPH-683: Jython for Computation (nitay) (Revision 8f89bd85a03a1fec25e21e334631931f69078040)

     Result = SUCCESS
nitay : http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=8f89bd85a03a1fec25e21e334631931f69078040
Files : 
* giraph-core/src/main/java/org/apache/giraph/conf/TypesHolder.java
* giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java
* giraph-core/src/test/java/org/apache/giraph/jython/TestJython.java
* giraph-core/src/main/java/org/apache/giraph/conf/StrConfOption.java
* giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java
* giraph-examples/src/test/java/org/apache/giraph/TestBspBasic.java
* giraph-core/src/main/java/org/apache/giraph/utils/FileUtils.java
* giraph-core/src/main/java/org/apache/giraph/io/formats/IntIntNullTextInputFormat.java
* README
* giraph-core/src/main/java/org/apache/giraph/jython/JythonComputationFactory.java
* giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java
* giraph-core/src/main/java/org/apache/giraph/conf/ClassConfOption.java
* CHANGELOG
* giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java
* giraph-core/src/main/java/org/apache/giraph/utils/ReflectionUtils.java
* giraph-examples/src/test/java/org/apache/giraph/TestComputationState.java
* giraph-core/src/main/java/org/apache/giraph/conf/LongConfOption.java
* giraph-core/src/main/java/org/apache/giraph/utils/DistributedCacheUtils.java
* giraph-core/src/main/java/org/apache/giraph/conf/AbstractConfOption.java
* giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java
* giraph-core/src/main/java/org/apache/giraph/graph/Language.java
* pom.xml
* giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java
* giraph-core/src/test/java/org/apache/giraph/BspCase.java
* giraph-core/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java
* giraph-core/src/main/java/org/apache/giraph/conf/BooleanConfOption.java
* giraph-core/src/main/java/org/apache/giraph/conf/GiraphTypes.java
* giraph-examples/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java
* giraph-core/src/test/java/org/apache/giraph/utils/TestReflectionUtils.java
* giraph-core/src/main/java/org/apache/giraph/conf/ConfOptionType.java
* giraph-examples/src/test/java/org/apache/giraph/TestGraphPartitioner.java
* giraph-core/src/main/java/org/apache/giraph/graph/ComputationFactory.java
* giraph-core/src/main/java/org/apache/giraph/io/formats/LongLongNullTextInputFormat.java
* giraph-core/src/main/java/org/apache/giraph/benchmark/BenchmarkOption.java
* giraph-core/src/main/java/org/apache/giraph/jython/package-info.java
* giraph-core/src/main/java/org/apache/giraph/jython/JythonUtils.java
* giraph-core/src/main/java/org/apache/giraph/jython/DeployType.java
* giraph-core/pom.xml
* giraph-core/src/main/resources/org/apache/giraph/benchmark/page-rank.py
* giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
* giraph-core/src/main/java/org/apache/giraph/master/SuperstepClasses.java
* giraph-core/src/main/java/org/apache/giraph/conf/FloatConfOption.java
* giraph-core/src/main/java/org/apache/giraph/conf/AllOptions.java
* giraph-core/src/test/resources/org/apache/giraph/jython/count-edges.py
* giraph-core/src/main/java/org/apache/giraph/utils/ConfigurationUtils.java
* giraph-core/src/main/java/org/apache/giraph/conf/EnumConfOption.java
* giraph-core/src/main/java/org/apache/giraph/graph/Computation.java
* giraph-core/src/main/java/org/apache/giraph/conf/IntConfOption.java
* giraph-core/src/main/java/org/apache/giraph/graph/DefaultComputationFactory.java
* giraph-core/src/main/java/org/apache/giraph/job/GiraphConfigurationValidator.java

                
> Jython for Computation
> ----------------------
>
>                 Key: GIRAPH-683
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-683
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Nitay Joffe
>            Assignee: Nitay Joffe
>
> Support for writing Computation code in Python. We add Jython bindings so that the Python
> computation code can communicate back with the Java Giraph classes.
> To make this work I had to change a few parts of Giraph:
> 1) The Jython computation is not known until we read the script and create a Computation
> object for it at runtime. This has to be done on each worker separately after the job has
> launched. Because of this, there is no Computation class set at the beginning. I suspect other
> scripting languages will have a similar issue. To fix this I created a ComputationFactory
> interface which is responsible for creating the Computation, with a default implementation
> that just grabs the class from the Configuration and instantiates it.
> 2) I created a GiraphTypes class to hold the I,V,E,M1,M2 classes. There was a lot of
> repetitive code around these things, so centralizing it all in one place made things a lot
> cleaner.
> 3) I added some more helpers like isDefaultValue() to our conf options. Also added EnumConfOption.
> 4) The ReflectionUtils type inference was broken for interfaces. I fixed it by putting
> in TypeTools, a library that does it better.
> 5) I added a TypesHolder interface (with the help of [4]) that people can extend to describe
> the types used. Computation implements this. I use this with Jython so that the user can
> provide something that describes the types without having to implement any methods.
> 6) Fixed GiraphConfigurationValidator to work with interfaces and cleaned it up.
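
A minimal sketch of the factory idea from item 1, written in Python for brevity (the real interface is Java). Only the names Computation, ComputationFactory and DefaultComputationFactory come from the commit. The method name create_computation, the configuration key, the ScriptComputationFactory class and the sample script are assumptions made up for illustration and are not Giraph's API.

```python
class Computation(object):
    """Stand-in for a Giraph Computation (hypothetical, not the real class)."""

    def compute(self, vertex, messages):
        raise NotImplementedError


class DefaultComputationFactory(object):
    """Default case: the class is already in the configuration, so just
    look it up and instantiate it."""

    CONF_KEY = "computation.class"  # hypothetical key name

    def create_computation(self, conf):
        return conf[self.CONF_KEY]()


class ScriptComputationFactory(object):
    """Script case: no class exists up front. Each worker evaluates the
    script at runtime and only then builds the Computation."""

    def __init__(self, script_source, class_name):
        self.script_source = script_source
        self.class_name = class_name

    def create_computation(self, conf):
        # Expose the base class to the script, then run it to define the subclass.
        namespace = {"Computation": Computation}
        exec(self.script_source, namespace)
        return namespace[self.class_name]()


# A hypothetical user script, as it might be read from a file on each worker.
SCRIPT = """
class CountEdges(Computation):
    def compute(self, vertex, messages):
        return len(vertex["edges"])
"""

factory = ScriptComputationFactory(SCRIPT, "CountEdges")
computation = factory.create_computation({})
edge_count = computation.compute({"edges": [1, 2, 3]}, [])
```

The point of the indirection is that the framework only ever asks a factory for a Computation, so it does not care whether the class was configured ahead of time or produced from a script after the job launched.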
> To use Jython, all the user has to do is call JythonUtils#init(...) somewhere in their initialization.
> I also added it to GiraphRunner. To use it through that, you give an HDFS path to the
> Python file as the Computation. It takes a little more work because you also need to supply
> the new options --typesHolder and --jythonClass.
> This patch contains our page rank benchmark implementation in Jython. I added an option
> (--jython) which chooses whether to run the default or the Jython version.
> Here is the initial PageRankBenchmark comparison (200 workers, 1B vertices, 200 edges
> per vertex):
> Java:
> Total (milliseconds)	1,702,429	0	1,702,429
> Superstep 3 (milliseconds)	316,844	0	316,844
> Setup (milliseconds)	13,226	0	13,226
> Shutdown (milliseconds)	113	0	113
> Superstep 0 (milliseconds)	300,950	0	300,950
> Superstep 4 (milliseconds)	318,627	0	318,627
> Input superstep (milliseconds)	114,673	0	114,673
> Superstep 5 (milliseconds)	7,898	0	7,898
> Superstep 2 (milliseconds)	312,152	0	312,152
> Superstep 1 (milliseconds)	317,942	0	317,942
> Jython:
> Total (milliseconds)	2,123,228	0	2,123,228
> Superstep 3 (milliseconds)	406,422	0	406,422
> Setup (milliseconds)	7,159	0	7,159
> Shutdown (milliseconds)	131	0	131
> Superstep 0 (milliseconds)	347,732	0	347,732
> Superstep 4 (milliseconds)	405,696	0	405,696
> Input superstep (milliseconds)	112,645	0	112,645
> Superstep 5 (milliseconds)	46,687	0	46,687
> Superstep 2 (milliseconds)	410,349	0	410,349
> Superstep 1 (milliseconds)	386,404	0	386,404
> That's a mere 25% overhead in total runtime.
> Take a look at the reviewboard for latest patch: https://reviews.apache.org/r/11709/
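
The overhead figure can be checked against the quoted counters. The sketch below copies the millisecond values from the two listings and computes the relative slowdown of the Jython run, overall and per phase. The helper name overhead_percent and the dictionary layout are illustrative choices, not part of the patch.

```python
# Timings in milliseconds, copied from the Java and Jython listings above.
JAVA_MS = {
    "Total": 1702429,
    "Setup": 13226,
    "Input superstep": 114673,
    "Superstep 0": 300950,
    "Superstep 1": 317942,
    "Superstep 2": 312152,
    "Superstep 3": 316844,
    "Superstep 4": 318627,
    "Superstep 5": 7898,
    "Shutdown": 113,
}
JYTHON_MS = {
    "Total": 2123228,
    "Setup": 7159,
    "Input superstep": 112645,
    "Superstep 0": 347732,
    "Superstep 1": 386404,
    "Superstep 2": 410349,
    "Superstep 3": 406422,
    "Superstep 4": 405696,
    "Superstep 5": 46687,
    "Shutdown": 131,
}


def overhead_percent(java_ms, jython_ms):
    """Relative slowdown of the Jython run versus the Java run, in percent."""
    return 100.0 * (jython_ms - java_ms) / java_ms


total_overhead = overhead_percent(JAVA_MS["Total"], JYTHON_MS["Total"])
per_phase = {
    name: overhead_percent(JAVA_MS[name], JYTHON_MS[name]) for name in JAVA_MS
}

for name in sorted(per_phase):
    print("%-16s %8.1f%%" % (name, per_phase[name]))
```

The total comes out at roughly 24.7%, consistent with the rounded 25%. Supersteps 0 through 4 are each roughly 15% to 32% slower, while Superstep 5 is a clear outlier in relative terms even though it is small in absolute time.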

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
