incubator-cassandra-user mailing list archives

From Jeremy Hanna <jeremy.hanna1...@gmail.com>
Subject Re: Cassandra performance
Date Fri, 17 Sep 2010 21:56:50 GMT
http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures 

On Sep 17, 2010, at 4:35 PM, Zhong Li wrote:

> This is my personal experience. MySQL is faster than Cassandra in most normal use cases.
> 
> You should understand why you are choosing Cassandra instead of MySQL. If one
> central MySQL server can handle your workload, MySQL is better than Cassandra.
> But if you are overloading one MySQL server and want multiple boxes, Cassandra
> can be a cheap solution: it provides a fault-tolerant, decentralized, durable
> system with a rich data model. It will not give you high performance; read
> performance in particular is poor.
> 
> Digg failed to use Cassandra. You can check
> http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/
> 
> This doesn't mean Cassandra is bad. You need to design carefully to use
> Cassandra successfully for your application and business model.
> 
> 
>   
> On Sep 15, 2010, at 12:06 PM, Wayne wrote:
> 
>> If MySQL is faster, then use it. I struggled to do side-by-side comparisons
>> with MySQL for months until finally realizing they are too different to
>> compare side by side. MySQL is always faster out of the gate when you come at
>> the problem thinking in terms of relational databases. Add in replication
>> factor, wider rows, databases of 2-3 terabytes, tables with 3+ billion rows,
>> and so on, and the picture changes. The NoSQL "noise" out there should be
>> ignored, and a solution like Cassandra should be evaluated for what it brings
>> to the table as a technology that can solve the problems of big data, not for
>> how it does individual queries relative to MySQL. If a "normal" database
>> works for you, use it!!
>> 
>> We have tested real loads using a 6-node cluster and consistently get 5 ms
>> reads under load. That is 200 reads/second (1 thread). MySQL is 10x faster
>> per read, but we also have wide rows, and in that 5 ms we get 6 months of many
>> different time series, which in the end means it is 10x faster than MySQL
>> (1 thread). By embracing wide rows we turn slower into faster. Add in multiple
>> threads/processes and the ability of a 20-node cluster to support concurrent
>> reads, and MySQL is left in the dust. Also, we don't have 300 GB compressed
>> backup files, we can easily add new nodes and grow, we can actually add
>> columns dynamically without the dreaded DDL deadlock nightmare in MySQL, and
>> for once we have replication that just works.
>> 
>> 
>> On Wed, Sep 15, 2010 at 2:39 AM, Oleg Anastasyev <oleganas@gmail.com> wrote:
>> Kamil Gorlo <kgs4242 <at> gmail.com> writes:
>> 
>> >
>> > So I've got more reads from single MySQL with 400GB of data than from
>> > 8 machines storing about 266GB. This doesn't look good. What am I
>> > doing wrong? :)
>> 
>> The worst case for Cassandra is random reads. You should ask yourself a question:
>> do you really have this kind of workload in production? If you really do, that
>> means Cassandra is not the right tool for the job. Some product based on
>> Berkeley DB should work better, e.g. Voldemort. A plain old filesystem is
>> also good for 100% random reads (if you don't need backups, of course).
>> 
>> 
> 
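
Wayne's latency figures above can be worked through as a rough sketch. The 5 ms read and both 10x figures come from his message; the count of 100 narrow MySQL rows per Cassandra wide row is an assumption chosen only so that the arithmetic reproduces his 10x claim, and it does not appear anywhere in the thread.

```python
def reads_per_second(latency_ms: float, threads: int = 1) -> float:
    """Sequential reads per second for a given per-read latency."""
    return threads * 1000.0 / latency_ms


# Figures quoted in the thread.
cassandra_latency_ms = 5.0                      # "consistently get 5ms reads"
mysql_latency_ms = cassandra_latency_ms / 10    # "Mysql is 10x faster" per read

# ASSUMPTION (not from the thread): one wide row holds data that would take
# 100 separate row reads in the relational layout.
ROWS_PER_WIDE_ROW = 100

cassandra_rate = reads_per_second(cassandra_latency_ms)   # 200 reads/s, 1 thread
mysql_rate = reads_per_second(mysql_latency_ms)           # 2000 reads/s, 1 thread

# Time to fetch the same six months of time series data.
cassandra_fetch_ms = cassandra_latency_ms                 # one wide-row read
mysql_fetch_ms = mysql_latency_ms * ROWS_PER_WIDE_ROW     # many narrow reads
speedup = mysql_fetch_ms / cassandra_fetch_ms

print(f"Cassandra: {cassandra_rate:.0f} reads/s, {cassandra_fetch_ms:.0f} ms per fetch")
print(f"MySQL:     {mysql_rate:.0f} reads/s, {mysql_fetch_ms:.0f} ms per fetch")
print(f"Wide-row fetch is {speedup:.0f}x faster under these assumptions")
```

The point is only that per-read latency and per-fetch latency can rank the two systems in opposite order; the real ratio depends entirely on how many relational rows one wide row replaces in a given schema.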

