incubator-s4-dev mailing list archives

From "Karthik Kambatla (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (S4-7) Netty to tolerate network glitches and connection loss
Date Wed, 08 Feb 2012 21:30:59 GMT

    [ https://issues.apache.org/jira/browse/S4-7?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204014#comment-13204014 ]

Karthik Kambatla commented on S4-7:
-----------------------------------

Update on the synchronization issues in TCPEmitter - the working branch is at https://github.com/kambatla/s4/tree/S4-7

As Matthieu pointed out earlier, the committed patch has some synchronization issues.
The MultiPartitionDeliveryTest, in which each partition sends messages to every other
partition, exposes them.

With a larger number of partitions running (say 6), messages from a partition A to a
partition B occasionally end up at partition C. I suspect this happens because ClusterNode
C claims to be ClusterNode B.

Even the UDPEmitter behaves oddly with a high enough number of partitions.

Kishore and Matthieu - any pointers on how to go about this?

Thanks
                
> Netty to tolerate network glitches and connection loss
> ------------------------------------------------------
>
>                 Key: S4-7
>                 URL: https://issues.apache.org/jira/browse/S4-7
>             Project: Apache S4
>          Issue Type: Bug
>            Reporter: Leo Neumeyer
>            Assignee: Karthik Kambatla
>             Fix For: 0.5
>
>         Attachments: S4-7-Robust-TCPEmitter-asynchronous-ordered.patch, s4-7.patch, s4-7.patch
>
>
> NettyEmitter connects to different partitions and creates channels over which it communicates
> to other listeners.
> It suffers from the following issues:
> 1. If the underlying topology changes, the channels and the associated connections are
>    not updated.
> 2. If a connection gets disconnected, it stays disconnected.
> 3. If for any reason, a connection can't be made, send() drops the message to be sent.
> The solution is to:
> 1. Maintain a bounded messageQueue for each destination partition - if a connection does
>    not exist, the message should be queued.
> 2. Maintain a map of the channel used for each destination partition - update this map
>    on changes to topology, or on send() in case of disconnections.
> 3. Every time a (re-)connection is made, send the queued messages first.
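
The three-step solution above could be sketched roughly as follows. This is only an
illustration of the idea, not the code in the attached patches or in the S4-7 branch. The
class QueueingEmitter, the Connection interface (a stand-in for a Netty channel) and the
methods onConnected() and queued() are hypothetical names chosen for this example, and
dropping messages once the bounded queue is full is an assumed policy.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a queueing emitter; not the actual S4 TCPEmitter. */
class QueueingEmitter {

    /** Stand-in for a Netty channel (an assumption made for this example). */
    interface Connection {
        boolean isConnected();
        void write(byte[] message);
    }

    private final int capacity; // bound on each per-partition queue
    private final Map<Integer, Deque<byte[]>> pending = new HashMap<>();
    private final Map<Integer, Connection> channels = new HashMap<>();

    QueueingEmitter(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Writes the message if the partition has a live channel, otherwise queues it.
     * Returns false only when the bounded queue is full and the message is dropped.
     */
    synchronized boolean send(int partition, byte[] message) {
        Connection c = channels.get(partition);
        if (c != null && c.isConnected()) {
            flush(partition, c); // older queued messages go out first
            c.write(message);
            return true;
        }
        channels.remove(partition); // update the map on send() after a disconnection
        Deque<byte[]> q = pending.computeIfAbsent(partition, p -> new ArrayDeque<>());
        if (q.size() >= capacity) {
            return false; // assumed policy: drop when the queue is full
        }
        q.addLast(message);
        return true;
    }

    /** Called on every (re-)connection or topology change for a partition. */
    synchronized void onConnected(int partition, Connection c) {
        channels.put(partition, c);
        flush(partition, c);
    }

    /** Number of messages currently queued for a partition. */
    synchronized int queued(int partition) {
        Deque<byte[]> q = pending.get(partition);
        return q == null ? 0 : q.size();
    }

    /** Drains the partition's queue in FIFO order while the channel stays up. */
    private void flush(int partition, Connection c) {
        Deque<byte[]> q = pending.get(partition);
        while (q != null && !q.isEmpty() && c.isConnected()) {
            c.write(q.pollFirst());
        }
    }
}
```

All state changes in this sketch go through synchronized methods, so the channel map and
the queues are always updated together. That is the simplest way to keep messages for one
partition in order, at the cost of serializing senders.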

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
