giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Alessandro Presta" <alessan...@fb.com>
Subject Re: Review Request: GIRAPH-300 : Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning
Date Wed, 15 Aug 2012 18:56:57 GMT


> On Aug. 15, 2012, 2:03 p.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java,
line 389
> > <https://reviews.apache.org/r/6600/diff/1/?file=139832#file139832line389>
> >
> >     This looks a bit weird (using a TimedLogger for the timing but doing the actual
logging on the raw Logger) although I see where the problem is (need to call it multiple times
without waiting for the next deadline).
> 
> Avery Ching wrote:
>     yeah, we can probably augment TimedLogger to have a level in a future patch.

Yeah, also making it a full-fledged Logger (with its own isInfoEnabled() for example) would
improve encapsulation. But this is very minor stuff.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6600/#review10330
-----------------------------------------------------------


On Aug. 15, 2012, 6:52 p.m., Avery Ching wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6600/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2012, 6:52 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> * Upgrade to the most recent stable version of Netty (3.5.3.Final)
> * Try multiple connection attempts up to n failures
> * Track requests throughout the system by keeping track of the request id and then matching
the request id to the response (minor refactoring of WritableRequest to make requests simpler
and support the request id)
> * Improved handling of netty exceptions by dumping the exception stack to help debug
failures
> * Fixes bug in HashWorkerPartitioner by making partitionList thread-safe (this causes
divide by zero exceptions in real life)
> 
> 
> This addresses bug GIRAPH-300.
>     https://issues.apache.org/jira/browse/GIRAPH-300
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/pom.xml 1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestInfo.java
PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/utils/TimedLogger.java
1372575 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
1372575 
> 
> Diff: https://reviews.apache.org/r/6600/diff/
> 
> 
> Testing
> -------
> 
> Currently, netty connection failures causes issues with more than 75 workers in my setup.
This allows us to reach over 200+ in a reasonably reliable network that doesn't kill connections.
> 
> This code passes the local Hadoop regressions and the single node Hadoop instance regressions.
It also succeeded on large runs (200+ workers) on a real Hadoop cluster.
> 
> 
> Thanks,
> 
> Avery Ching
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message