hadoop-zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vishal K (JIRA)" <j...@apache.org>
Subject [jira] Commented: (ZOOKEEPER-900) FLE implementation should be improved to use non-blocking sockets
Date Wed, 17 Nov 2010 15:51:14 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932995#action_12932995

Vishal K commented on ZOOKEEPER-900:

Hi Flavio,

Thanks. I will take shutting down of worker threads in a separate

I am cleaning up QCM further to incorporate the change mentioned in my
comment on 02/Nov/10 02:09 PM.

I have a few more questions:

1. I have a question about the following piece of code in QCM:

       if (remoteSid == QuorumPeer.OBSERVER_ID) {
             * Choose identifier at random. We need a value to identify
             * the connection.
            remoteSid = observerCounter--;
            LOG.info("Setting arbitrary identifier to observer: " + remoteSid);

Should we allow this? The problem with this code is that if a peer
connects twice with QuorumPeer.OBSERVER_ID, we will end up creating
threads for this peer twice. This could result in redundant
SendWorker/RecvWorker threads.

I haven't used observers yet. The documentation
says that just like followers, observers should have server IDs. In
which case, why do we want to provide a wild-card?

2. Should I add a check to reject connections from peers that are not
listed in the configuration file? Currently, we are not doing any
sanity check for server IDs. I think this might fix ZOOKEEPER-851.
The fix is simple. However, I am not sure if anyone in community
is relying on this ability.


> FLE implementation should be improved to use non-blocking sockets
> -----------------------------------------------------------------
>                 Key: ZOOKEEPER-900
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-900
>             Project: Zookeeper
>          Issue Type: Bug
>            Reporter: Vishal K
>            Assignee: Vishal K
>            Priority: Critical
>             Fix For: 3.4.0
>         Attachments: ZOOKEEPER-900.patch, ZOOKEEPER-900.patch1, ZOOKEEPER-900.patch2
> From earlier email exchanges:
> 1. Blocking connects and accepts:
> a) The first problem is in manager.toSend(). This invokes connectOne(), which does a
blocking connect. While testing, I changed the code so that connectOne() starts a new thread
called AsyncConnct(). AsyncConnect.run() does a socketChannel.connect(). After starting AsyncConnect,
connectOne starts a timer. connectOne continues with normal operations if the connection is
established before the timer expires, otherwise, when the timer expires it interrupts AsyncConnect()
thread and returns. In this way, I can have an upper bound on the amount of time we need to
wait for connect to succeed. Of course, this was a quick fix for my testing. Ideally, we should
use Selector to do non-blocking connects/accepts. I am planning to do that later once we at
least have a quick fix for the problem and consensus from others for the real fix (this problem
is big blocker for us). Note that it is OK to do blocking IO in SenderWorker and RecvWorker
threads since they block IO to the respective !
> b) The blocking IO problem is not just restricted to connectOne(), but also in receiveConnection().
The Listener thread calls receiveConnection() for each incoming connection request. receiveConnection
does blocking IO to get peer's info (s.read(msgBuffer)). Worse, it invokes connectOne() back
to the peer that had sent the connection request. All of this is happening from the Listener.
In short, if a peer fails after initiating a connection, the Listener thread won't be able
to accept connections from other peers, because it would be stuck in read() or connetOne().
Also the code has an inherent cycle. initiateConnection() and receiveConnection() will have
to be very carefully synchronized otherwise, we could run into deadlocks. This code is going
to be difficult to maintain/modify.
> Also see: https://issues.apache.org/jira/browse/ZOOKEEPER-822

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message