giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pavan Kumar Athivarapu" <pava...@outlook.com>
Subject Re: Review Request 21987: Detect crashes of Netty threads
Date Tue, 10 Jun 2014 00:14:02 GMT


> On May 29, 2014, 6:40 p.m., Pavan Kumar Athivarapu wrote:
> > giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ExceptionHandler.java,
line 29
> > <https://reviews.apache.org/r/21987/diff/1/?file=597792#file597792line29>
> >
> >     this catches exceptions only for inbound conenctions, do u want to do this for
outbound as well.
> >     
> >     in that case would it not be better to just piggy back on the counters &
override exceptioncaught in those methods?
> >     
> >     I am fine with creating new classes or just overriding the method in counters
> 
> Pavan Kumar Athivarapu wrote:
>     one more thing (for exception handling) - inbound handler should be at the end and
outbound handler should be at the beginning of the pipeline 
>     so u catch exceptions at the end of processing input - 1
>     u catch exceptions at the end of sending output - 2
> 
> Sergey Edunov wrote:
>     I felt like it's nice to have a separate class for exception handling as it outlines
it's intention and makes it clear how and when to use it.

even the new diff does not handle exceptions for outbound
also please name is the handler to something like "serverexceptionHandler", etc. instead of
just "exception"


- Pavan Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21987/#review44270
-----------------------------------------------------------


On June 5, 2014, 10:26 p.m., Sergey Edunov wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21987/
> -----------------------------------------------------------
> 
> (Updated June 5, 2014, 10:26 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> When some of the request processing threads fails, the worker gets stuck but the job
doesn't fail and it has to be killed manually. We should detect netty thread crashes and fail
the job automatically.
> 
> 
> Diffs
> -----
> 
>   findbugs-exclude.xml e0466f7 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java ae40c3b 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterClient.java c982209

>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyMasterServer.java cb36c3e

>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyServer.java 14d4ea8 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java 7541418

>   giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java adb96cb

>   giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/ExceptionHandler.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/comm/netty/handler/RequestServerHandler.java
601cd2f 
>   giraph-core/src/main/java/org/apache/giraph/graph/GraphMapper.java c86a024 
>   giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java ad5fc91 
>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 90dc9f3 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java aff7084 
>   giraph-core/src/main/java/org/apache/giraph/yarn/GiraphYarnTask.java f4719cc 
>   giraph-core/src/test/java/org/apache/giraph/comm/ConnectionTest.java e771e36 
>   giraph-core/src/test/java/org/apache/giraph/comm/MockExceptionHandler.java PRE-CREATION

>   giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 236bc88 
>   giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java fcdfa5c 
>   giraph-core/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java c026cf8 
> 
> Diff: https://reviews.apache.org/r/21987/diff/
> 
> 
> Testing
> -------
> 
> Run some production jobs with this change. 
> Also introduced random bugs in deserialization logic and confirmed that job fails. 
> 
> 
> Thanks,
> 
> Sergey Edunov
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message