From user-return-60471-archive-asf-public=cust-asf.ponee.io@cassandra.apache.org Tue Mar 20 22:41:10 2018
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Reply-To: user@cassandra.apache.org
MIME-Version: 1.0
From: Carl Mueller
Date: Tue, 20 Mar 2018 16:41:01 -0500
Subject: Re: [EXTERNAL] Cassandra vs MySQL
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary="f403045e9c2c4e85aa0567deee48"
--f403045e9c2c4e85aa0567deee48
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Yes, Cassandra's big win is that once you get your data and applications
adapted to the platform, you have a clear path to very very large scale and
resiliency. Um, assuming you have the dollars. It scales out on commodity
hardware, but isn't exactly efficient in the use of that hardware. I like to
say that Cassandra makes big data "bigger data" because of the
timestamp-per-cell and column name overhead and replication factor.

On Tue, Mar 20, 2018 at 2:54 PM, Jeff Jirsa wrote:

> I suspect you're approaching this problem from the wrong side.
>
> The decision of MySQL vs Cassandra isn't usually about performance, it's
> about the other features that may impact/enable that performance.
>
> - Will you have a data set that won't fit on any single MySQL Server?
> - Will you want to write into two different hot datacenters at the same
> time?
> - Do you want to be able to restart any single server without impacting
> the cluster?
>
> If you answer yes to those, then Cassandra has an option to do so
> trivially, where you'd have to build tooling with MySQL.
>
> - Do you want to do arbitrary text searches?
> - Do you need JOINs?
> - Do you want to build indices on a lot of the columns and do ad-hoc
> querying?
>
> If you answer yes to those, they're far easier in MySQL than Cassandra.
>
> If you're just looking for "Cassandra can do X writes per second and MySQL
> can do Y writes per second", those types of benchmarks are rarely relevant,
> because in both cases they tend to require expert tuning to get the full
> potential (and very few people are experts in both) and are data dependent
> (and your data probably doesn't match the benchmarker's dataset).
>
> If I had a dataset that was ~10-20gb and wanted to do arbitrary reads on
> the data, I'd choose MySQL unless I absolutely positively could not
> tolerate downtime, in which case I'd go with Cassandra spanning multiple
> datacenters. If I had a dataset that was 200TB, or 200PB, I'd choose
> Cassandra, even if I could theoretically make MySQL do it faster, because
> the extra effort in building the tooling to manage that many shards of
> MySQL would be prohibitive to most organizations.
>
> On Tue, Mar 20, 2018 at 11:44 AM, Oliver Ruebenacker wrote:
>
>> Hello,
>>
>> Thanks for all the responses.
>>
>> I do know some SQL and CQL, so I know the main differences. You can do
>> joins in MySQL, but the bigger your data, the less likely you want to do
>> that.
>>
>> If you are a team that wants to consider migrating from MySQL to
>> Cassandra, you need some reason to believe that it is going to be faster.
>> What evidence is there?
>>
>> Even the Cassandra home page has references to benchmarks to make the
>> case for Cassandra. Unfortunately, they seem to be about five to six years
>> old. It doesn't make sense to keep them there if you just can't compare.
>>
>> Best, Oliver
>>
>> On Tue, Mar 20, 2018 at 1:13 PM, Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
>>
>>> I’m not sure there is a fair comparison. MySQL and Cassandra have
>>> different ways of solving related (but not necessarily the same) problems
>>> of storing and retrieving data.
>>>
>>> The data model between MySQL and Cassandra is likely to be very
>>> different. The key for Cassandra is that you need to model for the queries
>>> that will be executed. If you cannot know the queries ahead of time,
>>> Cassandra is not the best choice. If table scans are typically required,
>>> Cassandra is not a good choice. If you need more than a few hundred tables
>>> in a cluster, Cassandra is not a good choice.
>>>
>>> If multi-datacenter replication is required, Cassandra is an awesome
>>> choice. If you are going to always query by a partition key (or primary
>>> key), Cassandra is a great choice. The nice thing is that the performance
>>> scales linearly, so additional data is fine (as long as you add nodes) –
>>> again, if your data model is designed for Cassandra. If you like
>>> no-downtime upgrades and extreme reliability and availability, Cassandra is
>>> a great choice.
>>>
>>> Personally, I hope to never have to use/support MySQL again, and I love
>>> working with Cassandra. But, Cassandra is not the choice for all data
>>> problems.
>>>
>>> Sean Durity
>>>
>>> *From:* Oliver Ruebenacker [mailto:curoli@gmail.com]
>>> *Sent:* Monday, March 12, 2018 3:58 PM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* [EXTERNAL] Cassandra vs MySQL
>>>
>>> Hello,
>>>
>>> We have a project currently using MySQL single-node with 5-6TB of data
>>> and some performance issues, and we plan to add data up to a total size of
>>> maybe 25-30TB.
>>>
>>> We are thinking of migrating to Cassandra. I have been trying to find
>>> benchmarks or other guidelines to compare MySQL and Cassandra, but most of
>>> them seem to be five years old or older.
>>>
>>> Is there some good more recent material?
>>>
>>> Thanks!
>>>
>>> Best, Oliver
>>>
>>> --
>>>
>>> Oliver Ruebenacker
>>>
>>> Senior Software Engineer, Diabetes Portal, Broad Institute
>>>
>>> ------------------------------
>>>
>>> The information in this Internet Email is confidential and may be
>>> legally privileged. It is intended solely for the addressee. Access to this
>>> Email by anyone else is unauthorized. If you are not the intended
>>> recipient, any disclosure, copying, distribution or any action taken or
>>> omitted to be taken in reliance on it, is prohibited and may be unlawful.
>>> When addressed to our clients any opinions or advice contained in this
>>> Email are subject to the terms and conditions expressed in any applicable
>>> governing The Home Depot terms of business or client engagement letter. The
>>> Home Depot disclaims all responsibility and liability for the accuracy and
>>> content of this attachment and for any damages or losses arising from any
>>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
>>> items of a destructive nature, which may be contained in this attachment
>>> and shall not be liable for direct, indirect, consequential or special
>>> damages in connection with this e-mail message or its attachment.
>>>
>>
>> --
>> Oliver Ruebenacker
>> Senior Software Engineer, Diabetes Portal, Broad Institute
>>
>
--f403045e9c2c4e85aa0567deee48
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--f403045e9c2c4e85aa0567deee48--
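
The "bigger data" remark at the top of this message can be made concrete with some rough arithmetic. The sketch below is only an illustration: the function name, the per-cell overhead figure, and the example workload are all assumptions picked for round numbers, not measurements from any Cassandra version, and it ignores compression, compaction headroom, and indexes.

```python
def estimate_cluster_bytes(rows, cells_per_row, avg_value_bytes,
                           per_cell_overhead_bytes, replication_factor):
    """Return (raw_bytes, cluster_bytes) for a hypothetical table.

    raw_bytes counts only the stored values. cluster_bytes adds an assumed
    fixed overhead per cell (timestamp, column name/metadata) and then
    multiplies by the replication factor.
    """
    cells = rows * cells_per_row
    raw_bytes = cells * avg_value_bytes
    per_replica_bytes = cells * (avg_value_bytes + per_cell_overhead_bytes)
    return raw_bytes, per_replica_bytes * replication_factor


# Hypothetical workload: 1 billion rows, 10 cells each, 20-byte values,
# an assumed 15 bytes of overhead per cell, replication factor 3.
raw, total = estimate_cluster_bytes(
    rows=1_000_000_000,
    cells_per_row=10,
    avg_value_bytes=20,
    per_cell_overhead_bytes=15,
    replication_factor=3,
)
amplification = total / raw

print(f"raw values:        {raw / 1e9:.0f} GB")
print(f"cluster footprint: {total / 1e9:.0f} GB")
print(f"amplification:     {amplification:.2f}x")
```

With these made-up inputs, 200 GB of raw values turns into roughly 1 TB across the cluster. The replication factor accounts for a flat 3x of that, and the per-cell overhead matters most when individual values are small.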