incubator-cassandra-user mailing list archives

From Bill de hÓra <b...@dehora.net>
Subject Re: Using Cassandra for read operations
Date Thu, 21 Feb 2013 20:06:55 GMT
> To avoid disk I/Os, the best option we thought is to have data in memory. 
> Is it a good idea to have memtable setup around 1/2 or 3/4 of 
> heap size? Obviously flushing will take a lot of time but would 
> that hurt that node's performance big time?

Start with the defaults and test your workload. If memtables flush aggressively (because
of write load or bad settings), the flushes trigger compaction work on disk, which can impair
read I/O.


>  Is there a way to figure out max read-latency for a bunch of read operations?

Use nodetool's histogram feature to get a sense of outlier latency.
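
The histograms report what the server sees. To check what the client sees against an SLA, you can also time each read yourself and look at the tail instead of the mean. A minimal sketch, assuming you have already collected per-request latencies in milliseconds; the sample values and the helper names (percentile, summarize) are made up for illustration and are not part of any Cassandra tool:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of latencies (pct is an integer 1..100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(samples_ms):
    """Return mean, 99th percentile and max for a batch of read latencies."""
    return {
        "mean": sum(samples_ms) / len(samples_ms),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }

# Hypothetical measurements: 98 fast reads and two slow outliers.
latencies_ms = [1.0] * 98 + [3.5, 12.0]
summary = summarize(latencies_ms)
print(summary)
```

With these made-up numbers the mean stays well under 4 ms while the slowest read does not, which is exactly what an average-only report hides.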


> We just need one column family with a long key

Take time to tune your key caches and bloom filters. They use memory and have an impact on
read performance.


> Given that cassandra provides off-heap row caching, in a 
> machine >32 gb RAM, would it be wise to have a >10 gb row 
> cache with 8 gb java heap? 

If you use the off-heap cache, leave enough room for the filesystem's own cache, i.e. don't
give all of RAM over to the off-heap cache. The off-heap cache can also slow you down with
wide rows due to serialisation overhead, or through cache invalidation thrashing if you are
update-heavy. If you use the on-heap cache, pay close attention to GC cycles and memory
stability: if you are cycling/evicting through the cache at a high rate, that can leave so much
garbage in memory that the garbage collector can't keep up. If the node doesn't have enough
working memory after GC, it will _resize_ key and row caches. This will lead to degraded read
performance and with some workloads can result in a vicious cycle.
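
To make the "leave room" point concrete, here is the arithmetic as a small sketch, using the figures from the question (32 GB RAM, 8 GB heap, 10 GB off-heap row cache). The function name and the 1 GB allowance for everything else on the box are assumptions for illustration, not recommendations:

```python
def page_cache_headroom_gb(total_ram_gb, heap_gb, offheap_row_cache_gb, other_gb=1.0):
    """RAM left for the OS filesystem cache after the JVM heap, the off-heap
    row cache and an assumed allowance for other processes (other_gb)."""
    return total_ram_gb - heap_gb - offheap_row_cache_gb - other_gb

# Figures from the question; other_gb is a guess, measure it on your own nodes.
headroom = page_cache_headroom_gb(total_ram_gb=32, heap_gb=8, offheap_row_cache_gb=10)
print("GB left for the OS page cache:", headroom)
```

Whatever is left is all the OS has for caching SSTable data that misses the row cache, so growing the row cache shrinks it one for one.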


>  For our SLAs, a read of max 15-20 rows at once(using multi slice), 
> should not take more than 4 ms.

If you control your own hardware (and you probably should/must for this kind of latency demand)
consider SSDs. You might want to carefully control background repair/compaction operations
if predictable performance is your goal. You might want to avoid storing strings and use byte
representations instead. If you have an application tier on the read path, consider caching in
that tier as well to avoid the overhead of network calls and Thrift processing.
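
On strings versus bytes, a rough sketch of the size difference for a long key, assuming the key fits in a signed 64-bit integer. The key value and the helper names (encode_key, decode_key) are hypothetical, and this says nothing about how your client library serialises keys:

```python
import struct

def encode_key(key):
    """Pack a signed 64-bit integer key into 8 big-endian bytes."""
    return struct.pack(">q", key)

def decode_key(raw):
    """Inverse of encode_key: unpack 8 big-endian bytes into an integer."""
    return struct.unpack(">q", raw)[0]

key = 1361477215000123                 # hypothetical long key
as_bytes = encode_key(key)             # fixed 8 bytes
as_text = str(key).encode("utf-8")     # one byte per decimal digit

print(len(as_bytes), len(as_text))
```

The packed form is always 8 bytes, while the decimal string grows with the number of digits, and the same applies to any numeric column values.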

In a nutshell -

- Start with defaults, tune in small discrete adjustments, and leave time to see the
effect of each change. No-one will know your workload better than you, and the questions you
are asking are workload sensitive.

- Allow time for tuning and for understanding the memory model and JVM GC.

- Be very careful with caches. Leave enough room in the OS for its own disk cache.

- Get an SSD.


Bill


On 21 Feb 2013, at 19:03, amulya rattan <talk2amulya@gmail.com> wrote:

> Dear All,
> 
> We are currently evaluating Cassandra for an application involving strict SLAs(Service
> level agreements). We just need one column family with a long key and approximately 70-80
> bytes row. We are not concerned about write performance but are primarily concerned about
> read. For our SLAs, a read of max 15-20 rows at once(using multi slice), should not take more
> than 4 ms. Till now, on a single node setup, using cassandra' stress tool, the numbers are
> promising. But I am guessing that's because there is no network latency involved there and
> since we set memtable around 2gb(4 gb heap), we never had to get to Disk I/O.
> 
> Assuming our nodes having >32GB RAM, a couple of questions regarding read:
> 
> * To avoid disk I/Os, the best option we thought is to have data in memory. Is it a good
> idea to have memtable setup around 1/2 or 3/4 of heap size? Obviously flushing will take a
> lot of time but would that hurt that node's performance big time?
> 
> * Cassandra stress tool only gives out average read latency. Is there a way to figure
> out max read-latency for a bunch of read operations?
> 
> * How big a row cache can one have? Given that cassandra provides off-heap row caching,
> in a machine >32 gb RAM, would it be wise to have a >10 gb row cache with 8 gb java
> heap? And how big should the corresponding key cache be then?
> 
> Any response is appreciated.
> 
> ~Amulya 
> 

