spark-reviews mailing list archives

From sarutak <...@git.apache.org>
Subject [GitHub] spark pull request: [WIP][SPARK-2677]BasicBlockFetchIterator#next ...
Date Tue, 29 Jul 2014 07:08:09 GMT
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/1619#issuecomment-50442727
  
    @witgo @pwendell I have already noticed that there is no timeout configuration for ConnectionManager, but a timeout on ConnectionManager would not resolve this issue: the channel used to receive the ack is implemented with non-blocking I/O, and SO_TIMEOUT only affects reads after a connection has been established. So if a remote executor hangs, it cannot establish connections with the fetching executors in the first place.
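
    Since SO_TIMEOUT only takes effect once a connection exists, a hung peer has to be caught by a timeout on the connection attempt itself. A minimal plain-Java sketch of that distinction (using blocking sockets purely for illustration, not Spark's NIO-based ConnectionManager; the unroutable TEST-NET address and port are hypothetical):

    ```java
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class TimeoutDemo {
        public static void main(String[] args) throws IOException {
            Socket socket = new Socket();
            // SO_TIMEOUT only bounds read() calls on an already-established
            // connection; it does nothing if connect() itself never completes.
            socket.setSoTimeout(5000);
            try {
                // The connect timeout is a separate parameter: a hung remote
                // peer is caught here, before SO_TIMEOUT ever comes into play.
                // 198.51.100.1 is a TEST-NET-2 address that should not answer.
                socket.connect(new InetSocketAddress("198.51.100.1", 9999), 1000);
            } catch (IOException e) {
                System.out.println("connect failed before any read; SO_TIMEOUT never applied");
            } finally {
                socket.close();
            }
        }
    }
    ```

    The same asymmetry exists in NIO: a non-blocking `SocketChannel` ignores SO_TIMEOUT entirely, which is why a fetch-request-level timeout is needed.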
    
    Additionally, BasicBlockFetcherIterator waits on LinkedBlockingQueue#take (results.take()), so we should put a FetchResult object whose size is -1 into the results queue of BasicBlockFetcherIterator.
    (A FetchResult whose size is -1 means the fetch failed.)
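
    The unblocking trick above can be sketched with a plain LinkedBlockingQueue; `FetchResultDemo`, the simplified FetchResult class, and the block id below are illustrative stand-ins, not Spark's actual code:

    ```java
    import java.util.concurrent.LinkedBlockingQueue;

    public class FetchResultDemo {
        // Simplified stand-in for Spark's FetchResult; size == -1 marks a failed fetch.
        static class FetchResult {
            final String blockId;
            final long size;
            FetchResult(String blockId, long size) { this.blockId = blockId; this.size = size; }
        }

        public static void main(String[] args) throws InterruptedException {
            LinkedBlockingQueue<FetchResult> results = new LinkedBlockingQueue<>();

            // Simulate the fetch path detecting a remote failure: enqueue a
            // sentinel FetchResult with size -1 instead of leaving the queue empty.
            Thread fetcher = new Thread(() -> results.offer(new FetchResult("shuffle_0_0_0", -1)));
            fetcher.start();
            fetcher.join();

            // The iterator side blocks on take(); without the sentinel it would
            // wait forever when the remote executor dies silently.
            FetchResult r = results.take();
            if (r.size == -1) {
                System.out.println("fetch failed for block " + r.blockId);
            }
        }
    }
    ```

    Because take() blocks indefinitely, pushing the sentinel is the only way for the consumer side to learn about the failure promptly and raise a fetch-failed error.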
    
    I think remote errors can be classified into the following two cases.
    
    1) The remote executor hangs
    In this case, we need a timeout for the fetch request (not a read timeout).
    I'm trying to resolve this case in https://github.com/apache/spark/pull/1632
    
    2) The remote executor does not hang, but an error occurs
    In this case, the remote executor should send a message indicating that an error occurred on it.
    I'm trying to resolve this case in https://github.com/apache/spark/pull/1490
    This is ongoing.
    Could anyone review this too?
    
    



