incubator-cassandra-user mailing list archives

From Daniel Doubleday <>
Subject Re: Read Latency Degradation
Date Fri, 17 Dec 2010 17:26:51 GMT
> How much ram is dedicated to cassandra? 12gb heap (probably too high?)
> What is the hit rate of caches? high, 90%+

If your heap allows it I would definitely try to give more RAM to the fs cache. You're not using
row cache so I don't see what Cassandra would gain from so much memory.
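
To put rough numbers on the heap vs. fs cache trade-off, here is a back-of-the-envelope sketch. It assumes the figures quoted in this thread (24 GB RAM and a 12 GB heap per node, roughly 1 TB of data per node) and ignores memory used by anything other than the JVM heap, so the real numbers will be somewhat lower. The function name is made up for illustration.

```python
def page_cache_coverage(ram_gb, heap_gb, data_gb):
    """Return the fraction of on-disk data that could sit in the OS page cache.

    Assumption: everything not given to the JVM heap is available as fs cache.
    """
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    cache_gb = max(ram_gb - heap_gb, 0)
    return min(cache_gb / data_gb, 1.0)


# Figures from this thread: 24 GB RAM per node, about 1 TB (1000 GB) of data per node.
for heap_gb in (12, 8, 4):
    pct = page_cache_coverage(24, heap_gb, 1000) * 100
    print(f"heap {heap_gb:>2} GB -> about {pct:.1f}% of the data fits in fs cache")
```

Even with a much smaller heap only a small slice of the data fits in the cache, so this helps mostly if the reads concentrate on a warm subset.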

A question about your tests:

I assume that they run isolated (you load test one CF at a time) and the results are the same.
So the only difference is that one time you are reading from a larger file?

Do you see the same IO load in both tests? Do you use mem-mapped IO? And if so, is the number
of page faults the same in both tests?
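
One way to compare page faults between the two tests, as a minimal sketch assuming the nodes run Linux: snapshot the minor and major fault counters of the Cassandra process from /proc/<pid>/stat before and after each run. The helper names below are my own, not part of any tool.

```python
def parse_faults(stat_line):
    """Extract (minor, major) page-fault counters from a /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces, so split after its closing ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state); minflt is field 10 and majflt is field 12.
    return int(rest[7]), int(rest[9])


def read_faults(pid):
    """Read the current counters for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_faults(f.read())


def fault_delta(before, after):
    """Faults incurred between two snapshots taken around a test run."""
    return after[0] - before[0], after[1] - before[1]
```

Call read_faults with the Cassandra pid before and after each load test and compare the deltas. A clearly higher major fault count for the larger CF would suggest the extra latency comes from reads that miss the fs cache and go to disk.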

In the end it could just be more physical movements of the disc heads with larger files ...

On Dec 17, 2010, at 5:46 PM, Wayne wrote:

> Below are some answers to your questions. We have wide rows (what we like about Cassandra)
> and I wonder if that plays into this? We have been loading 1 keyspace in our cluster heavily
> in the last week so it is behind in compaction for that keyspace. I am not even looking at
> those read latency times as there are as many as 100+ sstables. Compaction will run tomorrow
> for all nodes (weekend is our slow time) and I will test the read latency there. For the keyspace/CFs
> that are already well compacted we are seeing a steady increase in read latency as the total
> sstable size grows and a linear relationship between our different keyspaces' CF sizes and
> the read latency.
> How many nodes? 10 nodes, 16 cores each (2 x quad ht cpus)
> How much ram per node? 24gb
> What disks and how many? SATA 7200rpm 1x1tb for commit log, 4x1tb (raid0) for data 
> Is your ring balanced? yes, random partitioned very evenly
> How many column families? 4 CFs x 3 Keyspaces
> How much ram is dedicated to cassandra? 12gb heap (probably too high?)
> What type of caching are you using? Key caching
> What are the sizes of caches? 500k-1m values for 2 of the CFs
> What is the hit rate of caches? high, 90%+
> What does your disk utilization|CPU|Memory look like at peak times? Disk goes to 90%+
> under heavy read load. CPU load high as well. Latency does not change that much for single
> reads vs. under load (30 threads). We can keep current read latency up to 25-30 read threads
> if no writes or compaction is going on. We are worried about what we see in terms of latency
> for a single read.
> What are your average mean|max row size from cfstats? 30k avg/5meg max for one CF and
> 311k avg/855k max for the other.
> On average for a given sstable how large is the data, bloom and index files? 30gig data,
> 189k filter, 5.7meg index for one CF, 98gig data, 587k filter, 18meg index for the other.
> Thanks.
> On Fri, Dec 17, 2010 at 10:58 AM, Edward Capriolo <> wrote:
> On Fri, Dec 17, 2010 at 8:21 AM, Wayne <> wrote:
> > We have been testing Cassandra for 6+ months and now have 10TB in 10 nodes
> > with rf=3. It is 100% real data generated by real code in an almost
> > production level mode. We have gotten past all our stability issues,
> > java/cmf issues, etc. etc. now to find the one thing we "assumed" may not be
> > true. Our current production environment is mysql with extensive
> > partitioning. We have mysql tables with 3-4 billion records and our query
> > performance is the same as with 1 million records (< 100ms).
> >
> > For those of us really trying to manage large volumes of data memory is not
> > an option in any stretch of the imagination. Our current data volume once
> > placed within Cassandra ignoring growth should be around 50 TB. We run
> > manual compaction once a week (absolutely required to keep ss table counts
> > down) and it is taking a very long amount of time. Now that our nodes are
> > past 1TB I am worried it will take more than a day. I was hoping everyone
> > would respond to my posting with something must be wrong, but instead I am
> > hearing you are off the charts good luck and be patient. Scary to say the
> > least given our current investment in Cassandra. Is it true/expected that
> > read latency will get worse in a linear fashion as the ss table size grows?
> >
> > Can anyone talk me off the fence here? We have 9 MySQL servers that now
> > serve up 15+TB of data. Based on what we have seen we need 100 Cassandra
> > nodes with rf=3 to give us good read latency (by keeping the node data sizes
> > down). The cost/value equation just does not add up.
> >
> > Thanks in advance for any advice/experience you can provide.
> >
> >
> > On Fri, Dec 17, 2010 at 5:07 AM, Daniel Doubleday <>
> > wrote:
> >>
> >> On Dec 16, 2010, at 11:35 PM, Wayne wrote:
> >>
> >> > I have read that read latency goes up with the total data size, but to
> >> > what degree should we expect a degradation in performance? What is the
> >> > "normal" read latency range if there is such a thing for a small slice
> >> > scol/cols? Can we really put 2TB of data on a node and get good read latency
> >> > querying data off of a handful of CFs? Any experience or explanations would
> >> > be greatly appreciated.
> >>
> >> If you really mean 2TB per node I strongly advise you to perform thorough
> >> testing with real world column sizes and the read write load you expect. Try
> >> to load test at least with a test cluster / data that represents one
> >> replication group. I.e. RF=3 -> 3 nodes. And test with the consistency level
> >> you want to use. Also test ring operations (repair, adding nodes, moving
> >> nodes) while under expected load.
> >>
> >> Combined with 'a handful of CFs' I would assume that you are expecting a
> >> considerable write load. You will get massive compaction load and with that
> >> data size the file system cache will suffer big time. You'll need loads of
> >> RAM and still ...
> >>
> >> I can only speak about 0.6 but ring management operations will become a
> >> nightmare and you will have very long running repairs.
> >>
> >> The cluster behavior changes massively with different access patterns
> >> (cold vs warm data) and data sizes. So you have to understand yours and test
> >> it. I think most generic load tests are mainly marketing instruments and I
> >> believe this is especially true for cassandra.
> >>
> >> Don't want to sound negative (I am a believer and don't regret our
> >> investment) but cassandra is no silver bullet. You really need to know what
> >> you are doing.
> >>
> >> Cheers,
> >> Daniel
> >
> Yes major compactions for large sets of data do take a long time
> (360GB takes me about 6 hours).
> You said "needing to compact to keep the sstable count low". This is
> not a good sign. My sstable counts sawtooth between 8-15 per CF
> through the day. If you are in a scenario where the SSTables are
> growing all day and only catch up at night, and you have tuned
> memtables, then you likely need more nodes. This means that your
> cluster can not really keep up with your write traffic. You know
> cassandra can take bursts of writes well, but if you are at the point
> where your sstable count keeps getting higher you are essentially
> falling behind. (You may not need 100 nodes like you are suggesting
> but possibly a few to get you over the fence.)
> I do run major compactions at night, but not every night on every
> node. I do one node a night to make sure these are splayed out over
> the week. With deletes on non-major compactions you may not need to do
> this, but we add and remove a lot of data per day so I find I have
> to/should, since the nights are quiet for us anyway.
> As for how many nodes you need...What works out better ?
> Big Iron: 1x (2 TB 64 GB RAM ) cost ? power ? Rack size ?
> Small factor: 4x (500GB  16GB RAM) cost ? power ? Rack Size ?
> Generally I think most are running the "small factor" type deployment,
> and generally this works better by avoiding 2TB compactions!
> Is it true that read latency grows linearly with sstable size? No (but
> it could be true in your case).
> As for your specific problems, more info is needed.
> How many nodes?
> How much ram per node?
> What disks and how many?
> Is your ring balanced?
> How many column families?
> How much ram is dedicated to cassandra?
> What type of caching are you using?
> What are the sizes of caches?
> What is the hit rate of caches?
> What does your disk utilization|CPU|Memory look like at peak times?
> What are your average mean|max row size from cfstats?
> On average for a given sstable how large is the data, bloom and index files?
