I know this subject has been discussed on the list before, and I've read through those discussions, but I haven't been able to find a solution to the memory problems described below, so here it is again.

It seems that the Cassandra cluster I'm using is either leaking memory or simply using more memory than I expected.
Each host in the ring uses about 12G of RAM, while in some cases its entire dataset is only 1.5G (take, for example, .252.124 below with 1.54G).
I use row caching extensively, so I expect memory consumption to be >= 1.5G, but I don't understand why it gets up to 12G. Most of the time I don't care much, since I have plenty of memory; at times, however, this gets me into GC storms and very slow responses. Also, I'd like to be able to load more data into the cluster, and I'm hitting a memory wall that I didn't expect.

In cassandra.in.sh you'll notice that I set -Xmx12G, but given how little data there is I wouldn't expect the process to use all of it. As a matter of fact, I wanted to insert more data into the cluster, but I stopped because it wasn't handling the load very well.

I suppose that at the end of the day I only need to know which knobs to configure, but after playing with the configuration for a long time I'm a little lost.

I'm running a 0.6.2 cluster consisting of 6 physical hosts (some with 16G and some with 32G of RAM) distributed between two DCs.
RF is 2 (one replica in each DC).
HH is turned off.
File access is standard (no mmapped files; I tried that and the system just kept swapping itself to death, so I switched back to standard).

I've pasted below the output of nodetool ring and cfstats, as well as some vmstat and iostat (not that I think they matter...), plus jmap -heap.
The jmap -histo output is attached, so I hope all of this can help shed some light on memory usage.
Currently the logs don't say anything out of the ordinary so I didn't include them. 

Thanks :)

$ nodetool -h cass99 -p 9004 ring
Address       Status     Load          Range                                      Ring
                                       170141183460469231731687303715884105727
                         6.16 GB       28356863910078205288614550619314017621     |<--|
                         1.54 GB       56713727820156410577229101238628035242     |   ^
                         1.54 GB       85070591730234615865843651857942052863     v   |
                         6.15 GB       113427455640312821154458202477256070485    |   ^
                         1.54 GB       141784319550391026443072753096570088106    v   |
                         1.54 GB       170141183460469231731687303715884105727    |-->|

 <Keyspace Name="outbrain_kvdb">
      <ColumnFamily CompareWith="BytesType" Name="KvImpressions"
      <ColumnFamily CompareWith="BytesType" Name="KvAds"
      <ColumnFamily CompareWith="BytesType" Name="KvRatings"

$ cat bin/cassandra.in.sh 
# Licensed to the Apache Software Foundation (ASF) under one
# Arguments to pass to the JVM
        -ea \
        -Xms4G \
        -Xmx12G \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:SurvivorRatio=8 \
        -XX:MaxTenuringThreshold=1 \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=9004 \
        -Dcom.sun.management.jmxremote.ssl=false \

Keyspace: outbrain_kvdb
        Read Count: 5608010
        Read Latency: 8.52211627029909 ms.
        Write Count: 42794
        Write Latency: 0.10353956162078796 ms.
        Pending Tasks: 0
                Column Family: KvAds
                SSTable count: 11
                Space used (live): 9331647391
                Space used (total): 9331647391
                Memtable Columns Count: 84928
                Memtable Data Size: 21400502
                Memtable Switch Count: 1
                Read Count: 5602705
                Read Latency: 2.023 ms.
                Write Count: 42794
                Write Latency: 0.060 ms.
                Pending Tasks: 0
                Key cache: disabled
                Row cache capacity: 10000000
                Row cache size: 698671
                Row cache hit rate: 0.5535463700149053
                Compacted row minimum size: 391
                Compacted row maximum size: 76890
                Compacted row mean size: 635
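
For what it's worth, here is a naive back-of-the-envelope estimate of the KvAds row cache footprint, using only the cfstats numbers above. This is a sketch under a big assumption: it charges each cached row its "Compacted row mean size" and ignores JVM per-object overhead, so the real on-heap cost is probably higher. The names (estimate_bytes and the constants) are just mine for illustration.

```python
# Rough row cache estimate from the KvAds cfstats values above.
# Assumption: each cached row costs about "Compacted row mean size" bytes.
# The real on-heap cost per row is likely higher (JVM object overhead).
MEAN_ROW_BYTES = 635             # Compacted row mean size
ROWS_CACHED = 698_671            # Row cache size
ROW_CACHE_CAPACITY = 10_000_000  # Row cache capacity


def estimate_bytes(rows, mean_row_bytes=MEAN_ROW_BYTES):
    """Naive footprint: number of rows times mean serialized row size."""
    return rows * mean_row_bytes


current_mib = estimate_bytes(ROWS_CACHED) / 2**20
full_gib = estimate_bytes(ROW_CACHE_CAPACITY) / 2**30

print(f"rows cached now: ~{current_mib:.0f} MiB")
print(f"cache at capacity: ~{full_gib:.1f} GiB")
```

By this naive measure the rows cached right now come to a few hundred MB, and even a completely full cache would be around 6G, which is part of why the 12G figure surprises me.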

top - 10:23:26 up 96 days, 23:04,  1 user,  load average: 5.03, 6.21, 6.08
Tasks:  93 total,   1 running,  92 sleeping,   0 stopped,   0 zombie
Cpu(s): 92.1%us,  4.1%sy,  0.0%ni,  1.8%id,  0.0%wa,  0.5%hi,  1.5%si,  0.0%st
Mem:  16443880k total, 16357676k used,    86204k free,    43448k buffers
Swap:  4194296k total,    13912k used,  4180384k free,  2625024k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                                                                                       
 5757 cassandr  25   0 13.6g  12g 9860 S 197.2 82.3   9445:17 java                     

$ jmap -heap 5757
Attaching to process ID 5757, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 16.3-b01

using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 12884901888 (12288.0MB)
   NewSize          = 21757952 (20.75MB)
   MaxNewSize       = 43581440 (41.5625MB)
   OldSize          = 65404928 (62.375MB)
   NewRatio         = 7
   SurvivorRatio    = 8
   PermSize         = 21757952 (20.75MB)
   MaxPermSize      = 88080384 (84.0MB)

Heap Usage:
New Generation (Eden + 1 Survivor Space):
   capacity = 39256064 (37.4375MB)
   used     = 6779480 (6.465415954589844MB)
   free     = 32476584 (30.972084045410156MB)
   17.26989236618322% used
Eden Space:
   capacity = 34930688 (33.3125MB)
   used     = 2490360 (2.3749923706054688MB)
   free     = 32440328 (30.93750762939453MB)
   7.12943300744606% used
From Space:
   capacity = 4325376 (4.125MB)
   used     = 4289120 (4.090423583984375MB)
   free     = 36256 (0.034576416015625MB)
   99.16178385416667% used
To Space:
   capacity = 4325376 (4.125MB)
   used     = 0 (0.0MB)
   free     = 4325376 (4.125MB)
   0.0% used
concurrent mark-sweep generation:
   capacity = 12841320448 (12246.4375MB)
   used     = 10867324872 (10363.888618469238MB)
   free     = 1973995576 (1882.5488815307617MB)
   84.62778353679785% used
Perm Generation:
   capacity = 30380032 (28.97265625MB)
   used     = 18100520 (17.262001037597656MB)
   free     = 12279512 (11.710655212402344MB)
   59.580319072738305% used
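
To put the jmap -heap figures in proportion, here is a small sketch that just re-derives a few ratios from the byte counts printed above. Nothing here is measured separately; the constants are copied from the output and the helper name pct is only for illustration.

```python
# Byte counts copied from the jmap -heap output above.
MAX_HEAP = 12_884_901_888      # MaxHeapSize
MAX_NEW = 43_581_440           # MaxNewSize
CMS_CAPACITY = 12_841_320_448  # concurrent mark-sweep generation capacity
CMS_USED = 10_867_324_872      # concurrent mark-sweep generation used


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


old_used_pct = pct(CMS_USED, CMS_CAPACITY)
new_share_pct = pct(MAX_NEW, MAX_HEAP)
old_used_gib = CMS_USED / 2**30

print(f"old gen used: {old_used_pct:.1f}% (~{old_used_gib:.1f} GiB)")
print(f"max young gen as share of max heap: {new_share_pct:.2f}%")
```

So roughly 10G of the heap is sitting in the old generation, while the young generation is capped at well under 1% of the maximum heap.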