incubator-cassandra-user mailing list archives

From Michael Shuler <mich...@pbandjelly.org>
Subject Re: binary protocol server side sockets
Date Wed, 09 Apr 2014 18:34:45 GMT
On 04/09/2014 12:41 PM, graham sanderson wrote:
> Michael, it is not that the connections are being dropped, it is that
> the connections are not being dropped.

Thanks for the clarification.

> These server side sockets are ESTABLISHED, even though the client
> connection on the other side of the network device is long gone. This
> may well be an issue with the network device (it is valiantly trying
> to keep the connection alive it seems).

Have you tested whether they *ever* time out on their own, or do they just 
stick around forever? (Maybe after 432000 sec (120 hours), which is the 
default for nf_conntrack_tcp_timeout_established?) Trying out all the 
usage scenarios is really the way to track it down - directly on the 
switch, behind/in front of the firewall, on/off the VPN.

> That said KEEPALIVE on the server side would not be a bad idea. At
> least then the OS on the server would eventually (probably after 2
> hours of inactivity) attempt to ping the client. At that point
> hopefully something interesting would happen perhaps causing an error
> and destroying the server side socket (note KEEPALIVE is also good
> for preventing idle connections from being dropped by other network
> devices along the way)

Tuning net.ipv4.tcp_keepalive_* could be helpful, if you know they 
time out after 2 hours, which is the default.
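Those sysctls are system-wide defaults; an application can also override the timers per socket. A sketch using the Linux-specific socket options (the 600/60/5 values are arbitrary examples, not recommendations):

```python
import socket

# net.ipv4.tcp_keepalive_{time,intvl,probes} set the system-wide defaults;
# these Linux-specific options override them for a single socket.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)  # idle secs before first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # secs between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # failed probes before reset

ka = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(ka)
sock.close()
```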

> rpc_keepalive on the server sets keep alive on the server side
> sockets for thrift, and is true by default
>
> There doesn’t seem to be a setting for the native protocol
>
> Note this isn’t a huge issue for us, they can be cleaned up by a
> rolling restart, and this particular case is not production, but
> related to development/testing against alpha by people working
> remotely over VPN - and it may well be the VPNs fault in this case…
> that said and maybe this is a dev list question, it seems like the
> option to set keepalive should exist.
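For what it's worth, such an option would boil down to flipping SO_KEEPALIVE on each accepted socket. A self-contained loopback illustration in Python (purely hypothetical - Cassandra's native transport is Netty-based, so this is not its actual code):

```python
import socket

# Sketch of what a server-side keepalive option amounts to: enabling
# SO_KEEPALIVE on the socket returned by accept().
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # ephemeral port
server.listen(1)
port = server.getsockname()[1]

client = socket.create_connection(("127.0.0.1", port))
conn, _ = server.accept()
conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

ka = conn.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(ka)

conn.close()
client.close()
server.close()
```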

Yeah, and I agree you shouldn't have to restart to clean up connections 
- that's why I think this is lower in the network stack, and that a bit 
of troubleshooting and tuning might be helpful. That setting sounds like 
a good Jira request - keepalive may already be the default, I'm not sure. :)

-- 
Michael

> On Apr 9, 2014, at 12:25 PM, Michael Shuler <michael@pbandjelly.org>
> wrote:
>
>> On 04/09/2014 11:39 AM, graham sanderson wrote:
>>> Thanks, but I would think that just sets keep alive from the
>>> client end; I’m talking about the server end… this is one of
>>> those issues where there is something (e.g. switch, firewall, VPN
>>> in between the client and the server) and we get left with
>>> orphaned established connections to the server when the client is
>>> gone.
>>
>> There would be no server setting for any service, not just c*, that
>> would correct mis-configured connection-assassinating network gear
>> between the client and server. Fix the gear to allow persistent
>> connections.
>>
>> Digging through the various timeouts in c*.yaml didn't lead me to a
>> simple answer for something tunable, but I think this may be more
>> basic networking related. I believe it's up to the client to keep
>> the connection open as Duy indicated. I don't think c* will
>> arbitrarily sever connections - something that disconnects the
>> client may happen. In that case, the TCP connection on the server
>> should drop to TIME_WAIT. Is this what you are seeing in `netstat
>> -a` on the server - a bunch of TIME_WAIT connections hanging
>> around? Those should eventually be recycled, but that's tunable in
>> the network stack, if they are being generated at a high rate.
>>
>> -- Michael
>

