cassandra-user mailing list archives

From Olivier Rosello <orose...@corp.free.fr>
Subject Re: High CPU usage on all nodes without any read or write
Date Mon, 12 Jul 2010 14:31:20 GMT
> > But in the Cassandra output log:
> > root@cassandra-2:~#  tail -f /var/log/cassandra/output.log
> >  INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed leaving 1684169392 used; max is 6563430400
> >  INFO 15:32:09,875 GC for ConcurrentMarkSweep: 1363 ms, 4296991416 reclaimed leaving 1684201560 used; max is 6563430400
> >  INFO 15:32:14,370 GC for ConcurrentMarkSweep: 1341 ms, 4295467880 reclaimed leaving 1684879440 used; max is 6563430400
> >  INFO 15:32:18,906 GC for ConcurrentMarkSweep: 1343 ms, 4296386408 reclaimed leaving 1685489208 used; max is 6563430400
> >  INFO 15:32:23,564 GC for ConcurrentMarkSweep: 1511 ms, 4296407088 reclaimed leaving 1685488744 used; max is 6563430400
> >  INFO 15:32:28,068 GC for ConcurrentMarkSweep: 1347 ms, 4295383216 reclaimed leaving 1686469448 used; max is 6563430400
> >  INFO 15:32:32,617 GC for ConcurrentMarkSweep: 1376 ms, 4295689192 reclaimed leaving 1687908304 used; max is 6563430400
> >  INFO 15:32:37,283 GC for ConcurrentMarkSweep: 1468 ms, 4296056176 reclaimed leaving 1687916880 used; max is 6563430400
> >  INFO 15:32:41,811 GC for ConcurrentMarkSweep: 1358 ms, 4296412232 reclaimed leaving 1688437064 used; max is 6563430400
> >  INFO 15:32:46,436 GC for ConcurrentMarkSweep: 1368 ms, 4296105472 reclaimed leaving 1691050032 used; max is 6563430400
> >  INFO 15:32:51,180 GC for ConcurrentMarkSweep: 1545 ms, 4297439832 reclaimed leaving 1691033816 used; max is 6563430400
> >  INFO 15:32:55,703 GC for ConcurrentMarkSweep: 1379 ms, 4295491928 reclaimed leaving 1692891456 used; max is 6563430400
> >  INFO 15:33:00,328 GC for ConcurrentMarkSweep: 1378 ms, 4296657208 reclaimed leaving 1694981528 used; max is 6563430400
> 
> Note that those are ConcurrentMarkSweep GCs rather than ParNews, so
> they should be running concurrently with the application and should
> not correlate to 1.3-second pauses for the application.
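To put numbers on the quoted log, here is a minimal Python sketch (not something posted in the thread) that parses entries in the format shown above, assuming each entry sits on a single line, and reports how many collections ran, their average reported duration, the average spacing between them, and how full the heap was after the last one. `parse_gc_lines` and `summarize` are hypothetical helper names. As noted above, the milliseconds in these CMS lines are the collection's duration, not necessarily an application pause.

```python
import re
from datetime import datetime

# Matches one GC entry in the format quoted above (one entry per line).
GC_LINE = re.compile(
    r"INFO (\d{2}:\d{2}:\d{2},\d{3}) GC for (\w+): (\d+) ms, "
    r"(\d+) reclaimed leaving (\d+) used; max is (\d+)"
)


def parse_gc_lines(lines):
    """Yield (time, collector, duration_ms, reclaimed, used, max_heap) per matching line."""
    for line in lines:
        m = GC_LINE.search(line)
        if m:
            # %f accepts the 3-digit millisecond field and pads it to microseconds.
            ts = datetime.strptime(m.group(1), "%H:%M:%S,%f")
            yield (ts, m.group(2), int(m.group(3)),
                   int(m.group(4)), int(m.group(5)), int(m.group(6)))


def summarize(lines):
    """Return basic statistics over the parsed GC entries, or None if none matched."""
    events = list(parse_gc_lines(lines))
    if not events:
        return None
    # Seconds between consecutive collections. The log has no date, so this
    # assumes the sample does not cross midnight.
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
    last = events[-1]
    return {
        "count": len(events),
        "mean_duration_ms": sum(e[2] for e in events) / len(events),
        "mean_interval_s": sum(gaps) / len(gaps) if gaps else None,
        "used_after_last_gc_pct": 100.0 * last[4] / last[5],
    }


# First three entries from the log above.
SAMPLE_LINES = [
    " INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed leaving 1684169392 used; max is 6563430400",
    " INFO 15:32:09,875 GC for ConcurrentMarkSweep: 1363 ms, 4296991416 reclaimed leaving 1684201560 used; max is 6563430400",
    " INFO 15:32:14,370 GC for ConcurrentMarkSweep: 1341 ms, 4295467880 reclaimed leaving 1684879440 used; max is 6563430400",
]

print(summarize(SAMPLE_LINES))
```

On these three entries, collections arrive roughly every 4.5 s, each reclaiming about 4.3 GB and leaving roughly 26% of the 6.56 GB maximum heap in use, even though no client traffic is being sent.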

When this behaviour occurs (ConcurrentMarkSweep GCs, high CPU...), Cassandra is running but has
received no writes and no reads for hours (I stopped reads and writes when the behaviour started).

Even after wiping the data on all nodes, the behaviour started again after a few hours
of writing... :-(


> As for the discrepancy between nodes, are all nodes handling a similar
> amount of traffic? I briefly checked your original post and you said
> you're doing TimeUUID insertions. I don't remember offhand, and a
> quick Google search didn't tell me, whether there is something special
> about the TimeUUID type that would prevent it, but normally if you're
> using an OrderedPartitioner you may simply be writing all your data to a
> single node for token space division reasons and the fact that
> timestamps are highly ordered.
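To illustrate the token-placement point above, here is a simplified Python sketch comparing order-preserving placement (the key is its own token) with hash-based placement (an MD5 token folded into [0, 2**127)). It assumes a hypothetical 3-node ring with made-up string tokens and timestamp-formatted row keys. The effect only applies if the time-ordered values are row keys, since the partitioner places rows by key, not by column name. The helper names are illustrative, not Cassandra APIs.

```python
import bisect
import hashlib
from collections import Counter
from datetime import datetime, timedelta


def owner(ring_tokens, token):
    """Index of the node owning `token` in a sorted ring.

    Each node owns (previous node's token, its own token]; tokens above the
    largest node token wrap around to node 0.
    """
    return bisect.bisect_left(ring_tokens, token) % len(ring_tokens)


def order_preserving_distribution(keys, ring_tokens):
    # Order-preserving placement: the key itself is the token.
    return Counter(owner(ring_tokens, k) for k in keys)


def md5_token(key):
    # Simplified hash placement: MD5 of the key folded into [0, 2**127).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** 127)


def hash_distribution(keys, n_nodes):
    # Evenly spaced node tokens; the last one is 2**127, above every hashed token.
    ring = [(i + 1) * (2 ** 127) // n_nodes for i in range(n_nodes)]
    return Counter(owner(ring, md5_token(k)) for k in keys)


# Hypothetical workload: one time-ordered row key per second for an hour.
start = datetime(2010, 7, 12, 15, 0, 0)
KEYS = [(start + timedelta(seconds=s)).strftime("%Y-%m-%d %H:%M:%S") for s in range(3600)]

# Hypothetical string tokens for a 3-node order-preserving ring.
ORDERED_RING = ["2010-03-01", "2010-06-01", "2010-12-01"]

print("order-preserving:", order_preserving_distribution(KEYS, ORDERED_RING))
print("hashed:", hash_distribution(KEYS, 3))
```

With these tokens, every key from that hour lands on the node owning ("2010-06-01", "2010-12-01"], while hashed placement spreads the same keys roughly evenly across all three nodes.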

Theoretically, yes. But in practice, this behaviour shows up first on the heavier nodes (those
holding the most data).

> How big a latency are we talking about in the cases where you're
> timing out (i.e., what's the timeout)? Were the timeouts on reads,
> writes or both?

They are TimedOutExceptions on writes (C++ code -> Thrift -> Cassandra). About 99% of this
cluster's workload is writes.

How can I measure latency?
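One way to answer this is client-side timing. The actual client here is C++ over Thrift, so the Python sketch below only illustrates the pattern: time each write call and look at the tail (p99, max) rather than the mean, because timeouts show up as outliers. `fake_write` is a hypothetical stand-in for the real insert call, and the helper names are made up. On the server side, `nodetool cfstats` also reports per-column-family read and write latency.

```python
import math
import time


def measure_latencies_ms(op, n):
    """Call op() n times; return each call's wall-clock latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        op()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def percentile(samples, p):
    """Nearest-rank percentile of `samples` for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def report(samples):
    """Summarize latency samples, focusing on the tail."""
    return {
        "n": len(samples),
        "p50_ms": percentile(samples, 50),
        "p99_ms": percentile(samples, 99),
        "max_ms": max(samples),
    }


def fake_write():
    # Hypothetical stand-in for one Thrift insert; replace with the real client call.
    time.sleep(0.001)


print(report(measure_latencies_ms(fake_write, 200)))
```

Comparing p99 and max against the configured RPC timeout shows whether timeouts come from occasional stalls (for example, ones that line up with the GC entries in the log) or from uniformly slow writes.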


Olivier
