incubator-cassandra-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Re: Cassandra performance
Date Wed, 15 Sep 2010 16:06:28 GMT
If MySQL is faster, then use it. I struggled for months to do side-by-side
comparisons with MySQL until I finally realized the two are too different
to compare that way. MySQL is always faster out of the gate when you come
at the problem thinking in terms of relational databases. The picture
changes once you add in replication factor, wider rows, databases of 2-3
terabytes, tables with 3+ billion rows, and so on. The NoSQL "noise" out
there should be ignored, and a solution like Cassandra should be evaluated
for what it brings to the table as a technology that can solve the
problems of big data, not for how it does individual queries relative to
MySQL. If a "normal" database works for you, use it!

We have tested real loads on a 6-node cluster and consistently get 5 ms
reads under load. That is 200 reads/second on 1 thread. MySQL is 10x faster
per read, but we use wide rows, and in that 5 ms we get 6 months of many
different time series. In the end that makes Cassandra 10x faster than
MySQL for us (1 thread). By embracing wide rows we turn slower into faster.
Add in multiple threads/processes and the ability of a 20-node cluster to
serve concurrent reads, and MySQL is left in the dust. Also, we don't have
300 GB compressed backup files, we can easily add new nodes and grow, we
can actually add columns dynamically without the dreaded DDL deadlock
nightmare in MySQL, and for once we have replication that just works.
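
A rough sketch of the arithmetic behind those numbers, for anyone who wants
to plug in their own. The 5 ms latency and the "10x faster per read" ratio
come from the figures above. ROWS_PER_WIDE_READ is a hypothetical value (not
measured), chosen only because it is what the two 10x claims imply together.
The function names are made up for illustration.

```python
def reads_per_second(latency_ms: float, threads: int = 1) -> float:
    """Throughput of sequential reads at a fixed per-read latency."""
    return threads * 1000.0 / latency_ms


def fetch_time_ms(latency_ms: float, reads_needed: int) -> float:
    """Total time for one thread to issue reads_needed reads back to back."""
    return latency_ms * reads_needed


CASSANDRA_READ_MS = 5.0                    # from the message: 5 ms reads under load
MYSQL_READ_MS = CASSANDRA_READ_MS / 10     # from the message: MySQL is 10x faster per read

# ASSUMPTION: one wide-row read returns what would take this many separate
# MySQL reads. 100 is illustrative, not a measured figure.
ROWS_PER_WIDE_READ = 100

cassandra_ms = fetch_time_ms(CASSANDRA_READ_MS, 1)
mysql_ms = fetch_time_ms(MYSQL_READ_MS, ROWS_PER_WIDE_READ)
speedup = mysql_ms / cassandra_ms

print(f"Cassandra: {reads_per_second(CASSANDRA_READ_MS):.0f} reads/s per thread")
print(f"MySQL:     {reads_per_second(MYSQL_READ_MS):.0f} reads/s per thread")
print(f"Same data: Cassandra {cassandra_ms:.0f} ms vs MySQL {mysql_ms:.0f} ms "
      f"({speedup:.0f}x)")
```

The point is that per-read latency only decides the comparison when both
systems need the same number of reads to answer the question.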


On Wed, Sep 15, 2010 at 2:39 AM, Oleg Anastasyev <oleganas@gmail.com> wrote:

> Kamil Gorlo <kgs4242 <at> gmail.com> writes:
>
> >
> > So I've got more reads from single MySQL with 400GB of data than from
> > 8 machines storing about 266GB. This doesn't look good. What am I
> > doing wrong? :)
>
> The worst case for Cassandra is random reads. You should ask yourself a
> question: do you really have this kind of workload in production? If you
> really do, that means Cassandra is not the right tool for the job. Some
> product based on Berkeley DB should work better, e.g. Voldemort. Just a
> plain old filesystem is also good for 100% random reads (if you don't
> need backups, of course).
>
>
