Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of jeremy.hanna1234@gmail.com
 designates 209.85.160.172 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=from:mime-version:content-type:subject:date:in-reply-to:to
         :references:message-id:x-mailer;
        b=Fr1xdMAkFK5Pc/lycGOgFlyIXQS384H7v5txms9/E9xfeD38cu2LjRQQnK0Cr2vm/K
         C5qN7X4NlksB9lezZPBNlPtJ6oZdOl5l1+U/25kr3w4BU/YxyfCyO3dAM6RmEY25d4jh
         LulE8tdV4/gVzgCaD7llLnTI2qjbKWItQPBAU=
From: Jeremy Hanna <jeremy.hanna1234@gmail.com>
Mime-Version: 1.0 (Apple Message framework v1081)
Content-Type: multipart/alternative; boundary=Apple-Mail-8--636751038
Subject: Re: Cassandra performance
Date: Fri, 17 Sep 2010 16:56:50 -0500
In-Reply-To: <3FBC215A-EB09-4740-8A38-5D2328F3AF6D@voxeo.com>
To: user@cassandra.apache.org
References: <AANLkTineeKgDr_EuBgM3CX5PE=GY0Ri8NgwqPN0iHM=U@mail.gmail.com>
 <loom.20100915T083259-202@post.gmane.org>
 <AANLkTik-T2mL+X50EGqmcM51GwbDO2ukBoVmrk+Sip-Q@mail.gmail.com>
 <3FBC215A-EB09-4740-8A38-5D2328F3AF6D@voxeo.com>
Message-Id: <6CEEB726-1553-4E6F-9EC2-4212BE046984@gmail.com>


--Apple-Mail-8--636751038
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures


On Sep 17, 2010, at 4:35 PM, Zhong Li wrote:

> This is my personal experience. MySQL is faster than Cassandra for most
> normal use cases.
>
> You should understand why you are choosing Cassandra instead of MySQL.
> If one central MySQL server can handle your workload, MySQL is better
> than Cassandra. But if you are overloading one MySQL server and want
> multiple boxes, Cassandra can be a cheap solution: it provides a
> fault-tolerant, decentralized, durable, and rich data model. It will
> not give you high performance, though; read performance in particular
> is poor.
>=20
> Digg failed to use Cassandra. You can check
> http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/
>=20
> This doesn't mean Cassandra is bad. You need to design your
> application and business model carefully around Cassandra to succeed.
>=20
>=20
>  =20
> On Sep 15, 2010, at 12:06 PM, Wayne wrote:
>=20
>> If MySQL is faster, then use it. I struggled for months to do
>> side-by-side comparisons with MySQL until I finally realized the two
>> are too different to compare side by side. MySQL is always faster out
>> of the gate when you come at the problem thinking in terms of
>> relational databases. Things change once you add in replication
>> factor, wider rows, databases of 2-3 terabytes, tables with 3+ billion
>> rows, etc. The NoSQL "noise" out there should be ignored, and a
>> solution like Cassandra should be evaluated for what it brings to the
>> table as a technology that can solve the problems of big data, not for
>> how it does individual queries relative to MySQL. If a "normal"
>> database works for you, use it!!
>>=20
>> We have tested real loads on a 6-node cluster and consistently get
>> 5 ms reads under load. That is 200 reads/second (1 thread). MySQL is
>> 10x faster per read, but we also use wide rows, and in that 5 ms we
>> get 6 months of many different time series, which in the end makes
>> Cassandra 10x faster than MySQL (1 thread). By embracing wide rows we
>> turn slower into faster. Add in multiple threads/processes and the
>> ability of a 20-node cluster to serve concurrent reads, and MySQL is
>> left in the dust. Also, we don't have 300 GB compressed backup files,
>> we can easily add new nodes and grow, we can add columns dynamically
>> without the dreaded DDL deadlock nightmare in MySQL, and for once we
>> have replication that just works.
>>=20
>>=20
>> On Wed, Sep 15, 2010 at 2:39 AM, Oleg Anastasyev <oleganas@gmail.com>
>> wrote:
>> Kamil Gorlo <kgs4242 <at> gmail.com> writes:
>>=20
>> >
>> > So I've got more reads from a single MySQL with 400GB of data than
>> > from 8 machines storing about 266GB. This doesn't look good. What
>> > am I doing wrong? :)
>>=20
>> The worst case for Cassandra is random reads. Ask yourself: do you
>> really have this kind of workload in production? If you really do,
>> Cassandra is not the right tool for the job. A product based on
>> Berkeley DB, e.g. Voldemort, should work better. A plain old
>> filesystem is also good for 100% random reads (if you don't need
>> backups, of course).
>>=20
>>=20
>=20


--Apple-Mail-8--636751038--