giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alessandro Presta" <alessan...@fb.com>
Subject Re: Review Request: GIRAPH-498: We should check input splits status from zookeeeper once per worker, not once per split thread
Date Tue, 05 Feb 2013 07:20:55 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/#review16087
-----------------------------------------------------------

Ship it!


I really like the refactoring, taking input split reservation logic out of the callables.
Also a good cleanup of InputSplitPathOrganizer.
This was long overdue and the benefits are evident.
+1


giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java
<https://reviews.apache.org/r/9301/#comment34405>

    I find the "should" form a bit exotic. I think "useInputSplitLocality" would be consistent
with the codebase.



giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java
<https://reviews.apache.org/r/9301/#comment34406>

    I also found extending Iterable a bit too implicit here. Good refactor.


- Alessandro Presta


On Feb. 5, 2013, 5:42 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9301/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2013, 5:42 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> When using a lot of workers and a lot of input split threads, checking that all input
splits are finished after the reading is done takes a long time, since we check every input
split once per thread.
> 
> 
> This addresses bug GIRAPH-498.
>     https://issues.apache.org/jira/browse/GIRAPH-498
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb

>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java
1adcd73 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2

>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b

>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1

>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java
0d617dc 
>   giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c 
>   giraph-hbase/.graph.csv.crc PRE-CREATION 
>   giraph-hbase/graph.csv PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9301/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> 
> Real application, using 200 workers and 20 input threads:
> - trunk - about 560s for input split threads to finish, 720s for input superstep
> - with this patch - about 310s for input split threads to finish, 500s for input superstep
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message