incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From aaron morton <aa...@thelastpickle.com>
Subject Re: memory issue on 1.1.0
Date Wed, 06 Jun 2012 08:39:42 GMT
Mina, 
	That does not sound right. 

	If you have the time can you create a jira ticket describing the problem, please include:

* the GC logs gathered by enabling them here https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L165
(It would be good to see the node get into trouble if possible).
* OS, JVM and cassandra versions
* information on the schema and workload
* anything else you think is important. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 7:24 AM, Mina Naguib wrote:

> 
> Hi Wade
> 
> I don't know if your scenario matches mine, but I've been struggling with memory pressure
in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along with enabling compression and
levelled compactions, so I don't know which specifically is the main culprit.
> 
> Specifically, all my nodes seem to "lose" heap memory.  As parnew and CMS do their job,
over any reasonable period of time, the "floor" of memory after a GC keeps rising.  This is
quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking
cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png
> 
> Once memory pressure reaches a point where the heap can't be maintained reliably below
75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.conf it'll do
things like flush memtables, drop caches, etc - all of which, in my experience, especially
with the recent off-heap data structures, exasperate the problem.
> 
> I've been meaning, of course, to collect enough technical data to file a bug report,
but haven't had the time.  I have not yet tested 1.1.1 to see if it improves the situation.
> 
> What I have found however, is a band-aid which you see at the rightmost section of the
graph in the screenshot I posted.  That is simply to hit "Perform GC" button in jconsole.
 It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim.
> 
> On my production cluster I have a full-GC via JMX scheduled in a rolling fashion every
4 hours.  It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary
evil in my situation.  Without it, my nodes enter a nasty spiral of constant flushing, constant
compactions, high heap usage, instability and high latency.
> 
> 
> On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:
> 
>> Alas, upgrading to 1.1.1 did not solve my issue.
>> 
>> -----Original Message-----
>> From: Brandon Williams [mailto:driftx@gmail.com] 
>> Sent: Monday, June 04, 2012 11:24 PM
>> To: user@cassandra.apache.org
>> Subject: Re: memory issue on 1.1.0
>> 
>> Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741
>> 
>> -Brandon
>> 
>> On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziombka@intel.com>
wrote:
>>> Running a very write intensive (new column, delete old column etc.) process and
failing on memory.  Log file attached.
>>> 
>>> Curiously when I add new data I have never seen this have in past sent 
>>> hundreds of millions "new" transactions.  It seems to be when I 
>>> modify.  my process is as follows
>>> 
>>> key slice to get columns to modify in batches of 100, in separate threads modify
those columns.  I advance the slice with the start key each with last key in previous batch.
 Mutations done are update a column value in one column family(token), delete column and add
new column in another (pan).
>>> 
>>> Runs well until after about 5 million rows then it seems to run out of memory.
 Note that these column families are quite small.
>>> 
>>> WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 
>>> 145) Heap is 0.7967470834946492 full.  You may need to reduce memtable 
>>> and/or cache sizes.  Cassandra will now flush up to the two largest 
>>> memtables to free up memory.  Adjust flush_largest_memtables_at 
>>> threshold in cassandra.yaml if you don't want Cassandra to do this 
>>> automatically
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java 
>>> (line 2772) Unable to reduce heap usage since there are no dirty 
>>> column families
>>> INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
>>> InetAddress /10.230.34.170 is now UP
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java 
>>> (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; 
>>> max is 8506048512
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java 
>>> (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 
>>> 5714800208 used; max is 8506048512
>>> 
>>> ----------------
>>> Keyspace: keyspace
>>>       Read Count: 50042632
>>>       Read Latency: 0.23157864418482224 ms.
>>>       Write Count: 44948323
>>>       Write Latency: 0.019460829472992797 ms.
>>>       Pending Tasks: 0
>>>               Column Family: pan
>>>               SSTable count: 5
>>>               Space used (live): 1977467326
>>>               Space used (total): 1977467326
>>>               Number of Keys (estimate): 16334848
>>>               Memtable Columns Count: 0
>>>               Memtable Data Size: 0
>>>               Memtable Switch Count: 74
>>>               Read Count: 14985122
>>>               Read Latency: 0.408 ms.
>>>               Write Count: 19972441
>>>               Write Latency: 0.022 ms.
>>>               Pending Tasks: 0
>>>               Bloom Filter False Postives: 829
>>>               Bloom Filter False Ratio: 0.00073
>>>               Bloom Filter Space Used: 37048400
>>>               Compacted row minimum size: 125
>>>               Compacted row maximum size: 149
>>>               Compacted row mean size: 149
>>> 
>>>               Column Family: token
>>>               SSTable count: 4
>>>               Space used (live): 1250973873
>>>               Space used (total): 1250973873
>>>               Number of Keys (estimate): 14217216
>>>               Memtable Columns Count: 0
>>>               Memtable Data Size: 0
>>>               Memtable Switch Count: 49
>>>               Read Count: 30059563
>>>               Read Latency: 0.167 ms.
>>>               Write Count: 14985488
>>>               Write Latency: 0.014 ms.
>>>               Pending Tasks: 0
>>>               Bloom Filter False Postives: 13642
>>>               Bloom Filter False Ratio: 0.00322
>>>               Bloom Filter Space Used: 28002984
>>>               Compacted row minimum size: 150
>>>               Compacted row maximum size: 258
>>>               Compacted row mean size: 224
>>> 
>>>               Column Family: counters
>>>               SSTable count: 2
>>>               Space used (live): 561549994
>>>               Space used (total): 561549994
>>>               Number of Keys (estimate): 9985024
>>>               Memtable Columns Count: 0
>>>               Memtable Data Size: 0
>>>               Memtable Switch Count: 38
>>>               Read Count: 4997947
>>>               Read Latency: 0.092 ms.
>>>               Write Count: 9990394
>>>               Write Latency: 0.023 ms.
>>>               Pending Tasks: 0
>>>               Bloom Filter False Postives: 191
>>>               Bloom Filter False Ratio: 0.37525
>>>               Bloom Filter Space Used: 18741152
>>>               Compacted row minimum size: 125
>>>               Compacted row maximum size: 179
>>>               Compacted row mean size: 150
>>> 
>>> ----------------
> 


Mime
View raw message