This comment (from cassandra.yaml) and some testing were enough for us.

"Generally, a value between 128 and 512 here coupled with a large key cache size on CFs results in the best trade offs.  This value is not often changed, however if you have many very small rows (many to an OS page), then increasing this will often lower memory usage without a impact on performance."

And indeed, I started using this setting on only one node without seeing any performance degradation: mean read latency stayed around 4 ms on all the servers, including that one. And I no longer get the "heap is full" pressure warnings. Heap usage now grows slowly from 2.5 GB to 5.5 GB instead of getting stuck between 5.0 GB and 6.5 GB (out of an 8 GB heap).
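
In case you want to check the same thing on your side, here is a rough sketch of how heap usage can be watched from the command line (assuming nodetool is on the PATH and can reach JMX; "Heap Memory (MB)" is the line nodetool info prints in 1.2):

# Print heap usage every 60 seconds (quick sketch, not a monitoring solution).
while true; do
  date
  nodetool info | grep -i "heap memory"
  sleep 60
done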

All the graphs I compared while the two configurations (128 / 512) were running on different servers were almost identical, except for the heap.

So 512 was a lot better in our case.

Hope this helps you, since that was also the purpose of this thread.

Alain

2013/7/9 Mike Heffner <mike@librato.com>
I'm curious, because we are experimenting with a very similar configuration: what basis did you use for raising index_interval to that value? Do you have before-and-after numbers, or was it simply the reduction of the heap pressure warnings that you were looking for?

thanks,

Mike


On Tue, Jul 9, 2013 at 10:11 AM, Alain RODRIGUEZ <arodrime@gmail.com> wrote:
Hi,

Using C*1.2.2.

We recently replaced our 18 m1.xlarge servers (4 CPUs, 15 GB RAM, 4 disks in RAID 0) with 3 hi1.4xlarge servers (16 CPUs, 60 GB RAM, 2 SSDs in RAID 0), for about the same price.

We tried it after reading some benchmarks published by Netflix.

It is awesome and I recommend it to anyone who is using more than 18 xlarge servers or can afford these high-cost / high-performance EC2 instances. The SSDs give very good throughput with awesome latency.

However, we used to have about 200 GB of data per server and now have about 1 TB.

To alleviate memory pressure inside the heap, I had to reduce the index sampling: I changed index_interval from 128 to 512, with no visible impact on latency but a great improvement inside the heap, which no longer complains about any pressure.

Is there any more tuning I could apply, or tricks that are useful on big servers with a lot of data per node and relatively high throughput?

The SSDs are at 20-40 % of their throughput capacity (according to OpsCenter), the CPU load almost never goes above 5 or 6 (with 16 CPUs), and 15 GB of RAM is used out of 60 GB.
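
For reference, here is roughly how I cross-check those numbers outside OpsCenter (this assumes the sysstat package is installed on the nodes):

# Disk utilisation and latency per device, then the load average vs. the 16 cores.
iostat -x 5 3
uptime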

At this point I have kept my previous configuration, which is almost the default one from the DataStax Community AMI. Here is part of it; you can consider that any property not listed here is configured with its default value:

cassandra.yaml

key_cache_size_in_mb: (empty, so the default of 100 MB; the hit rate is between 88 % and 92 %, is that good enough? See the nodetool check sketched after this block.)
row_cache_size_in_mb: 0 (not usable in our use case: a lot of different, random reads)
flush_largest_memtables_at: 0.80
reduce_cache_sizes_at: 0.90
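
About the key cache hit rate: if I am not mistaken, nodetool info also reports the key cache size and recent hit rate in 1.2, so it can be checked directly on a node, e.g.:

# Key cache size, capacity and recent hit rate as reported by the node (sketch).
nodetool info | grep -i "key cache"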

concurrent_reads: 32 (I am thinking of increasing this to 64 or more, since I now have just a few servers handling more concurrency)
concurrent_writes: 32 (I am thinking of increasing this to 64 or more too; see the rule-of-thumb sketch just after this block)
memtable_total_space_in_mb: 1024 (to avoid filling the heap; should I use a bigger value, and what for?)
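
For what it's worth, the comments in the default cassandra.yaml give a rule of thumb for these two settings; here is a quick sketch applied to this hardware (2 SSDs, 16 cores):

# Starting points from the cassandra.yaml comments, to be validated by testing.
DISKS=2; CORES=16
echo "concurrent_reads:  $((16 * DISKS))"   # -> 32
echo "concurrent_writes: $((8 * CORES))"    # -> 128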

rpc_server_type: sync (I tried hsha and got the "ERROR 12:02:18,971 Read an invalid frame size of 0. Are you using TFramedTransport on the client side?" error. No idea how to fix this, and I use 5 different clients for different purposes: Hector, Cassie, phpCassa, Astyanax, Helenus...)

multithreaded_compaction: false (Should I try enabling this now that I use SSDs?)
compaction_throughput_mb_per_sec: 16 (I will definitely raise this to 32 or even more)
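
Side note: if I am not mistaken this one can be changed at runtime with nodetool, which makes it easy to try a new value before writing it into cassandra.yaml:

# Try the higher compaction throughput live; persist it in cassandra.yaml if it helps.
nodetool setcompactionthroughput 32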

cross_node_timeout: true
endpoint_snitch: Ec2MultiRegionSnitch

index_interval: 512

cassandra-env.sh

I am not sure how to tune the heap, so I mostly use the defaults:

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="400M" (I tried higher values, and they produced longer GC pauses: 1600 ms instead of the < 200 ms I get now with 400M; see the log check just below)
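
The GC times above come from the Cassandra log; here is a rough sketch of how I check them (assuming the default packaged log location):

# GCInspector logs long ParNew / CMS pauses in system.log.
grep "GCInspector" /var/log/cassandra/system.log | tail -n 20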

-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly

Does this configuration seem coherent? Right now, performance is fine, with latency < 5 ms almost all the time. What can I do to handle more data per node while keeping this performance, or even improving it?

I know this is a long message, but if you have any comments or insights, even on just part of it, don't hesitate to share them. I guess this kind of feedback on configuration is useful to the entire community.

Alain

--

  Mike Heffner <mike@librato.com>
  Librato, Inc.