zookeeper-user mailing list archives

From Powell Molleti <pmoll...@vmware.com>
Subject Re: quorum connection manager shutdown takes long time
Date Tue, 01 Sep 2015 21:49:57 GMT
Apologies for not posting the link to the old thread; here it is:
http://bit.ly/1JAaJaJ

Thanks
Powell.

On 8/31/15, 2:34 PM, "Powell Molleti" <pmolleti@vmware.com> wrote:

>In reference to:
>https://issues.apache.org/jira/browse/ZOOKEEPER-2246
>
>Simply removing sock.setSoTimeout(0) from http://s.apache.org/TfI has
>the unintended consequence of shutting down both the RecvWorker and
>SendWorker threads in all cases. The current code seems designed to keep
>the socket alive (and the threads running) so that the channel can be
>reused to communicate with a peer node that is still alive but needs to
>redo leader election.
>
>I could not reproduce any issue when the threads shut down after the
>timeout, since new threads are created for the next iteration of leader
>election. However, I would rather reuse the threads and the channel,
>hence I propose the following approach.
>
>The alternative I suggest is to still remove setSoTimeout(0) from
>http://s.apache.org/TfI, and also to enable SO_KEEPALIVE via
>setKeepAlive() on this socket. A timeout would then not be considered an
>error when it occurs here:
>http://bit.ly/1JHIdVY
>but would be considered an error when it occurs here:
>http://bit.ly/1NTjQ9R
>
>This means that users can tune the TCP keep-alive timeouts so that
>socket failures propagate to user space sooner, and ZooKeeper also
>resets the socket if it detects that the other side is not responding
>when it knows it needs a response within some bounded time.
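
[Illustration, not part of the original message.] The approach described
above can be sketched roughly as follows. This is a minimal sketch, not
ZooKeeper's actual code: the class KeepAliveReadSketch, the helpers
configure() and readMessage(), the constant MAX_MESSAGE_BYTES and the
length-prefixed framing are all assumptions made for the example. It also
assumes that the first linked location is an idle wait for the next message
and that the second is a read where data is expected within a bounded time.
Only Socket.setKeepAlive() and Socket.setSoTimeout() come from the message
itself.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class KeepAliveReadSketch {
    // Hypothetical sanity bound on a message body; not a ZooKeeper constant.
    static final int MAX_MESSAGE_BYTES = 512 * 1024;

    /** Enable SO_KEEPALIVE and use a finite read timeout instead of setSoTimeout(0). */
    static void configure(Socket sock, int readTimeoutMs) throws IOException {
        sock.setKeepAlive(true);
        sock.setSoTimeout(readTimeoutMs);
    }

    /**
     * Reads one length-prefixed message (4-byte big-endian length, then body).
     * Returns null if the timeout expires before any byte of a new message
     * arrives: the channel is merely idle, so this is not treated as an error.
     * Once a message has started, a timeout propagates as an exception so the
     * caller can reset the socket.
     */
    static byte[] readMessage(Socket sock) throws IOException {
        InputStream in = sock.getInputStream();
        int first;
        try {
            first = in.read(); // idle wait for the first byte of the next message
        } catch (SocketTimeoutException idle) {
            return null; // no data consumed, socket stays usable
        }
        if (first < 0) {
            throw new EOFException("peer closed the connection");
        }
        // The rest of the message must arrive within the read timeout.
        DataInputStream din = new DataInputStream(in);
        int length = (first << 24) | (din.readUnsignedByte() << 16) | din.readUnsignedShort();
        if (length < 0 || length > MAX_MESSAGE_BYTES) {
            throw new IOException("invalid message length: " + length);
        }
        byte[] body = new byte[length];
        din.readFully(body);
        return body;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket peer = server.accept()) {
            configure(peer, 200);
            System.out.println("idle read returned null: " + (readMessage(peer) == null));
            DataOutputStream out = new DataOutputStream(client.getOutputStream());
            out.writeInt(2);
            out.write(new byte[] {1, 2});
            out.flush();
            System.out.println("message length: " + readMessage(peer).length);
        }
    }
}
```

Waiting on a single byte for the idle case means a tolerated timeout never
discards part of a length prefix. How quickly SO_KEEPALIVE itself notices a
dead peer depends on the operating system's keep-alive settings, not on this
code.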
>
>Ideally, I wish there were user-space pings on every socket channel
>between ZooKeeper nodes to detect dead channels quickly. It seems that
>one exists for the sockets used for Follow/Lead after leader election is
>done, but not for this one? Such a feature could be added, with care
>taken to keep it backward compatible.
>
>I posted the above text to Jira. Also please point out any wrong
>assumptions I have made and provide comments and suggestions.
>
>Thanks
>Powell.
>
>
>> From Raúl Gutiérrez Segalés <...@itevenworks.net>
>> Subject Re: quorum connection manager shutdown takes long time
>> Date Thu, 10 Jul 2014 18:02:37 GMT
>> On 9 July 2014 08:28, Michi Mutsuzaki <michi@cs.stanford.edu> wrote:
>
>>> I don't know how I missed that :) QA said this is reproducible, so
>>> I'll try commenting this line out. Thanks Flavio!
>>>
>
>> I am curious, was it that?
>> -rgs
>

