incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: normal thread counts?
Date Tue, 30 Apr 2013 20:34:28 GMT
>  Many many many of the threads are trying to talk to IPs that aren't in the cluster (I
assume they are the IP's of dead hosts). 
Are these IPs from before the upgrade? Are they IPs you expect to see?

Cross-reference them with the output of nodetool gossipinfo to see why the node thinks they
should be used.
Could you provide a list of the thread names?
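One quick way to build that list is to pull the peer IP out of each "WRITE-/<ip>" thread name in a saved dump and count duplicates. A rough sketch — threads.txt and the IPs in it are made up to stand in for real output from `jstack <cassandra-pid> > threads.txt`:

```shell
# Fake thread dump standing in for real jstack output (names made up):
cat > threads.txt <<'EOF'
"WRITE-/10.0.0.1" prio=5 tid=0x1 WAITING
"WRITE-/10.0.0.2" prio=5 tid=0x2 WAITING
"WRITE-/10.0.0.2" prio=5 tid=0x3 WAITING
"GossipStage:1" prio=5 tid=0x4 RUNNABLE
EOF

# Extract the IP from each WRITE-/<ip> thread name and count duplicates,
# most-duplicated peer first.
grep -oE 'WRITE-/[0-9]+(\.[0-9]+){3}' threads.txt | sort | uniq -c | sort -rn
```

Any IP near the top of that list that nodetool gossipinfo doesn't show as a live member is a good candidate for the leak.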

One way to remove those IPs may be a rolling restart with -Dcassandra.load_ring_state=false
in the JVM opts at the bottom of cassandra-env.sh
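For reference, the edit is just one line appended to JVM_OPTS; a minimal sketch (the conf/ path is the stock layout — adjust for your package):

```shell
# At the bottom of conf/cassandra-env.sh, before the rolling restart.
# Remove the line again once the stale endpooints are gone from gossip.
JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"
```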

The OutboundTcpConnection threads are created in pairs by the OutboundTcpConnectionPool, which
is created here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/MessagingService.java#L502
Since the threads are created in the OutboundTcpConnectionPool constructor, it's worth checking
whether that could be the source of the leak.
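To tell a genuine leak from a one-off spike, it's enough to watch the raw thread count climb over time. A Linux-only sketch using /proc — it samples its own shell ($$) just so the snippet is self-contained; substitute the Cassandra pid in practice:

```shell
# Sample the process's thread count a few times; steady growth between
# samples, tracking the WRITE-/<ip> duplicates above, points at
# OutboundTcpConnectionPool churn. Linux-only (relies on /proc).
pid=$$   # made-up target: replace with the Cassandra JVM's pid
for i in 1 2 3; do
  count=$(ls /proc/"$pid"/task | wc -l)
  echo "sample $i: $count threads"
  sleep 1
done
```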

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/05/2013, at 2:18 AM, William Oberman <oberman@civicscience.com> wrote:

> I use phpcassa.
> 
> I did a thread dump.  99% of the threads look very similar (I'm using 1.1.9 in terms
of matching source lines).  The thread names are all like this: "WRITE-/10.x.y.z".  There
are a LOT of duplicates (in terms of the same IP).  Many many many of the threads are trying
to talk to IPs that aren't in the cluster (I assume they are the IP's of dead hosts).  The
stack trace is basically the same for them all, attached at the bottom.   
> 
> There is a lot of things I could talk about in terms of my situation, but what I think
might be pertinent to this thread: I hit a "tipping point" recently and upgraded a 9 node
cluster from AWS m1.large to m1.xlarge (rolling, one at a time).  7 of the 9 upgraded fine
and work great.  2 of the 9 keep struggling.  I've replaced them many times now, each time
using this process:
> http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node
> And even this morning the only two nodes with a high number of threads are those two
(yet again).  And at some point they'll OOM.
> 
> Seems like there is something about my cluster (caused by the recent upgrade?) that causes
a thread leak on OutboundTcpConnection.  But I don't know how to escape from the trap.  Any
ideas?
> 
> 
> --------
>   stackTrace = [ { 
>     className = sun.misc.Unsafe;
>     fileName = Unsafe.java;
>     lineNumber = -2;
>     methodName = park;
>     nativeMethod = true;
>    }, { 
>     className = java.util.concurrent.locks.LockSupport;
>     fileName = LockSupport.java;
>     lineNumber = 158;
>     methodName = park;
>     nativeMethod = false;
>    }, { 
>     className = java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject;
>     fileName = AbstractQueuedSynchronizer.java;
>     lineNumber = 1987;
>     methodName = await;
>     nativeMethod = false;
>    }, { 
>     className = java.util.concurrent.LinkedBlockingQueue;
>     fileName = LinkedBlockingQueue.java;
>     lineNumber = 399;
>     methodName = take;
>     nativeMethod = false;
>    }, { 
>     className = org.apache.cassandra.net.OutboundTcpConnection;
>     fileName = OutboundTcpConnection.java;
>     lineNumber = 104;
>     methodName = run;
>     nativeMethod = false;
>    } ];
> ----------
> 
> 
> 
> 
> On Mon, Apr 29, 2013 at 4:31 PM, aaron morton <aaron@thelastpickle.com> wrote:
>>  I used JMX to check current number of threads in a production cassandra machine,
and it was ~27,000.
> That does not sound too good. 
> 
> My first guess would be lots of client connections. What client are you using, and does it
do connection pooling?
> See the comments in cassandra.yaml around rpc_server_type; the default sync server uses
one thread per connection, so you may be better off with HSHA. But if your app is leaking connections
you should probably deal with that first.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 30/04/2013, at 3:07 AM, William Oberman <oberman@civicscience.com> wrote:
> 
>> Hi,
>> 
>> I'm having some issues.  I keep getting:
>> ------------
>> ERROR [GossipStage:1] 2013-04-28 07:48:48,876 AbstractCassandraDaemon.java (line
135) Exception in thread Thread[GossipStage:1,5,main]
>> java.lang.OutOfMemoryError: unable to create new native thread
>> --------------
>> after a day or two of runtime.  I've checked and my system settings seem acceptable:
>> memlock=unlimited
>> nofiles=100000
>> nproc=122944
>> 
>> I've messed with heap sizes from 6-12GB (15 physical, m1.xlarge in AWS), and I keep
OOM'ing with the above error.
>> 
>> I've found some (what seem to me to be) obscure references to the stack size interacting
with # of threads.  If I'm understanding it correctly, to reason about Java mem usage I have
to think of OS + Heap as being locked down, and the stack gets the "leftovers" of physical
memory and each thread gets a stack.
>> 
>> For me, the system ulimit setting on stack is 10240k (no idea if java sees or respects
this setting).  My -Xss for cassandra is the default (I hope, don't remember messing with
it) of 180k.  I used JMX to check current number of threads in a production cassandra machine,
and it was ~27,000.  Is that a normal thread count?  Could my OOM be related to stack + number
of threads, or am I overlooking something more simple?
>> 
>> will
>> 
> 

