cassandra-user mailing list archives

From Henrik Schröder <skro...@gmail.com>
Subject Re: Random slow connects.
Date Thu, 14 Jun 2012 15:43:18 GMT
Hi Mina,

The delay is not constant. In the vast majority of cases connecting is
almost instant, but occasionally connecting to a server takes a few
seconds.

We can't even reproduce it reliably. We can see in our server logs that
sometimes, maybe a few times a day, maybe once every few days, a Cassandra
server will be slow in accepting connections, and after a little while
everything will be ok again. It's not network saturation, it's not CPU
saturation, and it's not even GC pauses.

Has anyone else noticed something similar? Or is this simply a result of us
running a tight connection pool which recycles connections every few hours
and only waits a few seconds for a connection before timing out?


/Henrik

On Thu, Jun 14, 2012 at 4:54 PM, Mina Naguib
<mina.naguib@bloomdigital.com> wrote:

>
> On 2012-06-14, at 10:38 AM, Henrik Schröder wrote:
>
> > Hi everyone,
> >
> > We have a problem with our Cassandra cluster: sometimes
> it takes several seconds to open a new Thrift connection to the server.
> We had this issue when we ran on Windows, and we have this issue now
> that we run on Ubuntu. We had it with our old networking setup, and we
> have it with our new networking setup where we're running it over a
> dedicated gigabit network. Normally establishing a new connection is
> instant, but once in a while it seems like it's not accepting any new
> connections until three seconds have passed.
> >
> > We're of course running a connection-pooling client which mitigates
> this, since once a connection is established, it's rock solid.
> >
> > We tried switching the rpc_server_type to hsha, but that seems to have
> made the problem worse: we're seeing more connection timeouts because of
> this.
> >
> > For what it's worth, we're running Cassandra version 1.0.10 on Ubuntu,
> and our connection pool is configured to abort a connection attempt after
> two seconds, and each connection lives for six hours and then it's
> recycled. Under current load we do about 500 writes/s and 100 reads/s, we
> have 20 clients, but each has a very small connection pool of maybe up to 5
> simultaneous connections against each Cassandra server. We see these
> connection issues maybe once a day, but always at random intervals.
> >
> > We've tried to get more information through Datastax Opscenter, the JMX
> console, and our own application monitoring and logging, but we can't see
> anything out of the ordinary. Sometimes, seemingly by random, it's just
> really slow to connect. We're all out of ideas. Does anyone here have
> suggestions on where to look and what to do next?
>
> Have you ruled out potential non-Cassandra causes?
>
> 3 seconds consistently sounds like it could be a timeout/retry somewhere.
> Do you contact Cassandra via a hostname or IP address? If via hostname,
> rule out DNS.
>
> Either way, I'd fire up tcpdump on both the client and the server,
> and observe the TCP handshake.  Specifically see if the SYN packet is sent
> and received, whether the SYN-ACK is sent back right away and received, and
> final ACK.
>
> If that looks good, then TCP-wise you're in good shape and the problem is
> in a higher layer (thrift).  If not, see where the delay/drop/retry
> happens.  If it's in the first packet, it may be a networking/routing
> issue.  If in the second, it may be capacity at the server (investigate
> with lsof/netstat/JMX), etc.
>
>
>
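
A rough client-side complement to the tcpdump approach suggested above:
the sketch below times name resolution and the TCP connect separately, so
a slow attempt can be attributed to DNS or to the handshake. It is only a
sketch under assumptions: the function names, the thresholds, and the
hostname in the usage note are made up for illustration, and port 9160 is
Cassandra's default Thrift rpc_port, which may differ in your
cassandra.yaml. It cannot show which packet was delayed; that still needs
a packet capture.

```python
import socket
import time


def time_connect(host, port, timeout=10.0):
    """Time one connection attempt; return (dns_seconds, connect_seconds)."""
    start = time.monotonic()
    # Resolve first so that DNS latency is measured apart from the handshake.
    family, socktype, proto, _canon, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    resolved = time.monotonic()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        # connect() returns once the kernel has completed the TCP handshake.
        sock.connect(addr)
        connected = time.monotonic()
    return resolved - start, connected - resolved


def find_slow_connects(host, port, attempts, interval=1.0, slow_threshold=1.0):
    """Probe repeatedly; return a list of slow or failed attempts.

    Each entry is (attempt_index, dns_seconds, connect_seconds, error),
    where the timings are None if the attempt raised an error.
    """
    slow = []
    for attempt in range(attempts):
        try:
            dns_s, connect_s = time_connect(host, port)
        except OSError as exc:
            # Timeouts, refusals and resolution failures all land here.
            slow.append((attempt, None, None, repr(exc)))
        else:
            if dns_s + connect_s >= slow_threshold:
                slow.append((attempt, dns_s, connect_s, None))
        time.sleep(interval)
    return slow
```

For example, find_slow_connects("cassandra1.example.com", 9160,
attempts=3600) would probe about once a second for an hour and return only
the slow or failed attempts. If connect_seconds is consistently close to a
fixed value on the slow attempts, that points toward a retransmitted
packet rather than a slow Thrift server.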
