From: Jeremy Hanna
Subject: Re: Cassandra performance
Date: Fri, 17 Sep 2010 16:56:50 -0500
To: user@cassandra.apache.org

http://www.quora.com/Is-Cassandra-to-blame-for-Digg-v4s-technical-failures

On Sep 17, 2010, at 4:35 PM, Zhong Li wrote:

> This is my personal experience: MySQL is faster than Cassandra for most normal use cases.
>
> You should understand why you are choosing Cassandra instead of MySQL. If one central MySQL server can handle your workload, MySQL is better than Cassandra. BUT if you overload one MySQL server and need multiple boxes, Cassandra can be a cheap solution: it is fault-tolerant, decentralized, and durable, and it provides a rich data model. It will not give you high performance; read performance in particular is poor.
>
> Digg failed to use Cassandra successfully. You can check
> http://techcrunch.com/2010/09/07/digg-struggles-vp-engineering-door/
>
> This doesn't mean Cassandra is bad.
> You need to design carefully to use Cassandra successfully for your application and business model.
>
> On Sep 15, 2010, at 12:06 PM, Wayne wrote:
>
>> If MySQL is faster, then use it. I struggled to do side-by-side comparisons with MySQL for months until I finally realized the two are too different to compare side by side. MySQL is always faster out of the gate when you come at the problem thinking in terms of relational databases. Add in replication factor, wider rows, databases of 2-3 terabytes, tables with 3+ billion rows, and so on. The NoSQL "noise" out there should be ignored, and a solution like Cassandra should be evaluated for what it brings to the table as a technology that can solve the problems of big data, not for how it handles individual queries relative to MySQL. If a "normal" database works for you, use it!!
>>
>> We have tested real loads using a 6-node cluster and consistently get 5 ms reads under load. That is 200 reads/second (1 thread). MySQL is 10x faster per read, but we also have wide rows, and in that 5 ms we get 6 months of many different time series, which in the end makes it 10x faster than MySQL (1 thread). By embracing wide rows we turn slower into faster. Add in multiple threads/processes and the ability of a 20-node cluster to support concurrent reads, and MySQL falls back in the dust. Also, we don't have 300 GB compressed backup files, we can easily add new nodes and grow, we can add columns dynamically without the dreaded DDL deadlock nightmare in MySQL, and for once we have replication that just works.
>>
>> On Wed, Sep 15, 2010 at 2:39 AM, Oleg Anastasyev <oleganas@gmail.com> wrote:
>> Kamil Gorlo <kgs4242 <at> gmail.com> writes:
>>
>> >
>> > So I've got more reads from a single MySQL server with 400GB of data than from
>> > 8 machines storing about 266GB. This doesn't look good. What am I
>> > doing wrong? :)
>>
>> The worst case for Cassandra is random reads. You should ask yourself a question:
>> do you really have this kind of workload in production? If you really do, that
>> means Cassandra is not the right tool for the job. Some product based on
>> Berkeley DB should work better, e.g. Voldemort. A plain old filesystem is
>> also good for 100% random reads (if you don't need backups, of course).
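Wayne's wide-row argument (one 5 ms read returning six months of a time series, versus 200 single reads per second per thread) can be illustrated with a small toy model. This is a hypothetical in-memory sketch, not Cassandra's client API: `WideRow`, `insert`, `slice`, and `per_thread_throughput` are invented names, and ISO date strings are assumed as column names so that lexical order matches time order. The `slice` call loosely mirrors a column-range read over one row.

```python
# Toy in-memory model of a "wide row" time-series layout.
# Hypothetical illustration only: this is NOT Cassandra's client API.
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


class WideRow:
    """One row key holding many columns, kept sorted by column name."""

    def __init__(self):
        self._names = []   # column names in sorted order (ISO dates here)
        self._values = {}  # column name -> value

    def insert(self, name, value):
        # New names go into their sorted position; existing names are overwritten.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def slice(self, start, finish):
        # One call returns every column with start <= name <= finish, in order,
        # instead of one request per data point.
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


def per_thread_throughput(latency_ms):
    """Requests per second for one thread issuing requests back to back."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    row = WideRow()
    start = date(2010, 3, 1)
    for day in range(183):  # roughly six months of daily points
        row.insert((start + timedelta(days=day)).isoformat(), day)

    points = row.slice("2010-03-01", "2010-08-30")
    rate = per_thread_throughput(5)  # 5 ms per read, as in Wayne's test
    print(len(points), "points per read;", rate, "reads/s per thread")
```

Under these toy assumptions, 200 reads/s per thread with 183 points per read works out to about 36,600 points per second per thread; the figures are illustrative, not measurements.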
