incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Read Latency Degradation
Date Fri, 17 Dec 2010 15:58:17 GMT
On Fri, Dec 17, 2010 at 8:21 AM, Wayne <wav100@gmail.com> wrote:
> We have been testing Cassandra for 6+ months and now have 10TB in 10 nodes
> with rf=3. It is 100% real data generated by real code in an almost
> production level mode. We have gotten past all our stability issues,
> java/cmf issues, etc. etc. now to find the one thing we "assumed" may not be
> true. Our current production environment is mysql with extensive
> partitioning. We have mysql tables with 3-4 billion records and our query
> performance is the same as with 1 million records (< 100ms).
>
> For those of us really trying to manage large volumes of data memory is not
> an option in any stretch of the imagination. Our current data volume once
> placed within Cassandra ignoring growth should be around 50 TB. We run
> manual compaction once a week (absolutely required to keep ss table counts
> down) and it is taking a very long amount of time. Now that our nodes are
> past 1TB I am worried it will take more than a day. I was hoping everyone
> would respond to my posting with something must be wrong, but instead I am
> hearing you are off the charts good luck and be patient. Scary to say the
> least given our current investment in Cassandra. Is it true/expected that
> read latency will get worse in a linear fashion as the ss table size grows?
>
> Can anyone talk me off the fence here? We have 9 MySQL servers that now
> serve up 15+TB of data. Based on what we have seen we need 100 Cassandra
> nodes with rf=3 to give us good read latency (by keeping the node data sizes
> down). The cost/value equation just does not add up.
>
> Thanks in advance for any advice/experience you can provide.
>
>
> On Fri, Dec 17, 2010 at 5:07 AM, Daniel Doubleday <daniel.doubleday@gmx.net>
> wrote:
>>
>> On Dec 16, 2010, at 11:35 PM, Wayne wrote:
>>
>> > I have read that read latency goes up with the total data size, but to
>> > what degree should we expect a degradation in performance? What is the
>> > "normal" read latency range if there is such a thing for a small slice of
>> > scol/cols? Can we really put 2TB of data on a node and get good read latency
>> > querying data off of a handful of CFs? Any experience or explanations would
>> > be greatly appreciated.
>>
>> If you really mean 2TB per node I strongly advise you to perform thorough
>> testing with real world column sizes and the read write load you expect. Try
>> to load test at least with a test cluster / data that represents one
>> replication group. I.e. RF=3 -> 3 nodes. And test with the consistency level
>> you want to use. Also test ring operations (repair, adding nodes, moving
>> nodes) while under expected load.
>>
>> Combined with 'a handful of CFs' I would assume that you are expecting a
>> considerable write load. You will get massive compaction load and with that
>> data size the file system cache will suffer big time. You'll need loads of
>> RAM and still ...
>>
>> I can only speak about 0.6 but ring management operations will become a
>> nightmare and you will have very long running repairs.
>>
>> The cluster behavior changes massively with different access patterns
>> (cold vs warm data) and data sizes. So you have to understand yours and test
>> it. I think most generic load tests are mainly marketing instruments and I
>> believe this is especially true for cassandra.
>>
>> Don't want to sound negative (I am a believer and don't regret our
>> investment) but cassandra is no silver bullet. You really need to know what
>> you are doing.
>>
>> Cheers,
>> Daniel
>

Yes, major compactions for large sets of data do take a long time
(360GB takes me about 6 hours).

You said you need to compact "to keep ss table counts down". This is
not a good sign. My SSTable counts sawtooth between 8 and 15 per CF
through the day. If your SSTable count grows all day and only catches
up at night, and you have already tuned your memtables, then you
likely need more nodes. It means your cluster cannot really keep up
with your write traffic. Cassandra takes bursts of writes well, but if
your SSTable count keeps climbing you are essentially falling behind.
(You may not need 100 nodes as you suggest, but possibly a few more to
get you over the fence.)
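
One way to tell a healthy sawtooth from a count that is falling behind
is to look at the floor the count drops back to after compactions. The
sketch below assumes you already collect periodic SSTable counts for a
CF (for example from the "SSTable count" line of nodetool cfstats); the
helper names, the window size and the tolerance are hypothetical
choices for illustration, not anything Cassandra provides.

```python
def window_minimums(samples, window):
    """Lowest SSTable count seen in each consecutive window of samples."""
    return [min(samples[i:i + window]) for i in range(0, len(samples), window)]


def is_falling_behind(samples, window, tolerance=2):
    """True if the post-compaction trough has risen by more than `tolerance`.

    A healthy CF sawtooths: the count climbs between compactions but drops
    back to roughly the same floor. If the floor itself keeps rising,
    compactions are not keeping up with the write rate.
    """
    troughs = window_minimums(samples, window)
    if len(troughs) < 2:
        raise ValueError("need samples spanning at least two windows")
    return troughs[-1] - troughs[0] > tolerance


# Hypothetical samples, three per "day" for brevity.
healthy = [8, 11, 15, 9, 12, 14, 8, 10, 15]
growing = [8, 12, 16, 12, 16, 20, 16, 20, 24]
print(is_falling_behind(healthy, window=3))
print(is_falling_behind(growing, window=3))
```

Comparing only the first and last troughs keeps the check simple; a
noisier workload might want a slope over all troughs instead.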

I do run major compactions at night, but not every night on every
node. I compact one node a night so that they are spread out over the
week. With deletes being handled in non-major compactions you may not
need to do this, but we add and remove a lot of data per day, so I
find I have to (or at least should). Our nights are quiet anyway.
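
A minimal sketch of that rotation, assuming a plain round-robin over
the nights of the week. The node names and the weekday list are
hypothetical; actually triggering the compaction (e.g. a cron job
running nodetool compact against the scheduled host) is left out.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def compaction_schedule(nodes, days=WEEKDAYS):
    """Assign each node's weekly major compaction to a night, round-robin."""
    schedule = {day: [] for day in days}
    for i, node in enumerate(nodes):
        schedule[days[i % len(days)]].append(node)
    return schedule


# Hypothetical 10-node cluster.
nodes = [f"node{i}" for i in range(10)]
for day, hosts in compaction_schedule(nodes).items():
    print(day, hosts)
```

With more than seven nodes, "one node a night" no longer fits in a
week: here Mon through Wed each get two nodes, so either accept two
compactions on those nights or stretch the cycle beyond seven days.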

As for how many nodes you need, which works out better?
Big iron: 1x (2 TB, 64 GB RAM): cost? power? rack space?
Small form factor: 4x (500 GB, 16 GB RAM): cost? power? rack space?
Generally I think most people run the "small form factor" type of
deployment, and generally it works better because it avoids 2 TB
compactions!
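
A rough way to put numbers on the compaction side of that trade-off is
to extrapolate from the one data point above (about 360GB in about 6
hours, i.e. 60 GB/hour). This sketch assumes compaction time scales
linearly with data size and that both node types have similar disk
throughput; neither assumption is guaranteed, so treat the result as an
order-of-magnitude estimate.

```python
# Observed data point from this thread: ~360 GB major compaction in ~6 hours.
OBSERVED_GB = 360
OBSERVED_HOURS = 6
RATE_GB_PER_HOUR = OBSERVED_GB / OBSERVED_HOURS  # 60.0 GB/hour


def major_compaction_hours(node_data_gb, rate_gb_per_hour=RATE_GB_PER_HOUR):
    """Naive estimate: assumes compaction time scales linearly with data size."""
    return node_data_gb / rate_gb_per_hour


for label, gb in [("big iron, 2 TB/node", 2000), ("small, 500 GB/node", 500)]:
    print(f"{label}: ~{major_compaction_hours(gb):.1f} h")
```

Under these assumptions a 2 TB node needs roughly 33 hours for one
major compaction, well past a single night, while a 500 GB node needs
about 8.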

Is it true that read latency grows linearly with sstable size? No (but
it could be true in your case).

As for your specific problems, more info is needed:

How many nodes?
How much ram per node?
What disks and how many?
Is your ring balanced?
How many column families?
How much ram is dedicated to cassandra?
What type of caching are you using?
What are the sizes of caches?
What is the hit rate of caches?
What does your disk utilization | CPU | memory look like at peak times?
What are your mean and max row sizes from cfstats?
On average, how large are the data, Bloom filter, and index files for a given SSTable?
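
For the "is your ring balanced?" question, one quick check is the share
of the token ring each node owns. This is a minimal sketch assuming the
RandomPartitioner, whose tokens lie in the range 0 to 2**127; the token
values would come from nodetool ring, and the helper names are
hypothetical. It checks token balance only: load can still be uneven
if rows vary a lot in size.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token range (assumes RandomPartitioner)


def balanced_tokens(n):
    """Evenly spaced initial tokens for an n-node ring."""
    return [i * RING_SIZE // n for i in range(n)]


def ownership(tokens):
    """Fraction of the ring each token owns: the range (previous token, token]."""
    ts = sorted(tokens)
    if len(ts) == 1:
        return {ts[0]: 1.0}
    shares = {}
    for i, t in enumerate(ts):
        prev = ts[i - 1]  # for i == 0 this wraps around to the last token
        shares[t] = ((t - prev) % RING_SIZE) / RING_SIZE
    return shares


# A balanced 10-node ring: every node owns about 10%.
print(ownership(balanced_tokens(10)))
# A skewed 2-node ring: the node at token 0 owns 75% of the ring.
print(ownership([0, 2 ** 125]))
```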
