From: DuyHai Doan
To: user@cassandra.apache.org
Date: Fri, 4 Jul 2014 23:10:25 +0200
Subject: Re: Cassandra use cases/Strengths/Weakness

I would answer your question this way:

1) Why should I choose C*?

a. linear scalability: throughput scales "almost" linearly with the number of nodes

b. almost unbounded cluster size (there is no limit, or at least a very high limit, on the number of nodes you can have in a cluster)

c. operational simplicity due to the master-less architecture. This feature, although quite transparent for developers, is a key selling point. Having suffered when manually installing a Hadoop cluster, I happen to love the deployment simplicity of C*: only one process per node, no moving parts.

d. high availability. C* clearly trades consistency for availability, so you can expect something like 99.99% uptime. A strong selling point for critical businesses that need to be up all the time.

e. support for multiple data centers out of the box. Again, on the operational side, it's a great feature if you plan a worldwide deployment.

That's all I can see for now
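
To put the 99.99% figure from point d in perspective, here is a minimal sketch that turns an uptime percentage into a yearly downtime budget. The function name `downtime_minutes_per_year` and the 365-day year are assumptions made for illustration; the percentage itself is only the rough estimate given above, not a measured or guaranteed Cassandra figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: a 365-day year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Compare a few common availability targets.
    for uptime in (99.0, 99.9, 99.99):
        budget = downtime_minutes_per_year(uptime)
        print(f"{uptime}% uptime allows about {budget:.1f} minutes of downtime per year")
```

Under these assumptions, 99.99% leaves a budget of under an hour of downtime per year, which is the kind of target that matters for an always-on business.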

2) Why shouldn't I choose C* ?

a. need for strong consistency most of the time. Although you can perform all requests with consistency level ALL, it's clearly not the best use of C*. You'll suffer from higher latency and reduced availability. Even the new "lightweight transaction" feature is not meant to be used at large scale.

b. very complicated and changing queries. Denormalizing is great when you know ahead of time exactly how you'll query your data. Once that's done, any new way of querying will require new code and new tables to support it.

c. ridiculously small data load. I've seen people in prod using C* for only 200 GB because they want to be trendy and use bleeding-edge technologies. They'd be better off using a classical RDBMS solution that fits their load perfectly.
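
The availability cost mentioned in point a can be sketched with a little arithmetic. This is a simplified, single-data-center model covering only the ONE, QUORUM and ALL consistency levels; the helper names (`replicas_required`, `tolerated_failures`, `reads_see_latest_write`) are hypothetical and are not part of any Cassandra driver API.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must acknowledge a request at the given consistency level."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def tolerated_failures(level: str, replication_factor: int) -> int:
    """Replicas that can be down while requests at this level still succeed."""
    return replication_factor - replicas_required(level, replication_factor)


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when read and write replica sets must overlap, i.e. R + W > RF."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    rf = 3  # assumption: replication factor of 3
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, "tolerates", tolerated_failures(level, rf), "replica(s) down")
    print("QUORUM/QUORUM overlap:", reads_see_latest_write("QUORUM", "QUORUM", rf))
    print("ONE/ONE overlap:", reads_see_latest_write("ONE", "ONE", rf))
```

In this model, ALL leaves no room for a replica to be down, which is the reduced availability described above, while QUORUM reads combined with QUORUM writes still overlap on at least one replica and survive a single replica failure.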

Hope that helps

Duy Hai DOAN



On Fri, Jul 4, 2014 at 9:31 PM, Prem Yadav <ipremyadav@gmail.com> wrote:
Thanks Manoj. Great post for those who already have Cassandra in production.
However, it brings me back to my original post.
All the points you have mentioned apply to any big data technology.
Storage- All of them
Query- All of them. In fact, a lot of them perform better. Agree that CQL structure is better. But Hive, Mongo are all good
Availability- many of them

So my question is basically to Cassandra support people, e.g. DataStax, or the developers.
What makes Cassandra special?
If I have to convince my CTO to spend a million dollars on a cluster and support, his first question would be: why Cassandra? Why not this or that?

So I am still not sure what special thing Cassandra brings to the table.

Sorry about the rant. But in the enterprise world, decisions are made by taking into account stability, convincing managers, and whatnot. The chosen technology has to be stable for years. People should be convinced that the engineers are not going to do a lot of firefighting.

Any inputs appreciated.



On Fri, Jul 4, 2014 at 7:07 PM, Manoj Khangaonkar <khangaonkar@gmail.com> wrote:
These are my personal opinions based on a few months of using Cassandra. These are my views. Others may have different opinions.

http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html

regards



On Fri, Jul 4, 2014 at 7:37 AM, Prem Yadav <ipremyadav@gmail.com> wrote:
Hi,
I have seen this in a lot of replies that Cassandra is not designed for this and that. I don't want to sound rude, I just need some info about this so that I can compare it to technologies like HBase, Mongo, Elasticsearch, Solr, etc.

1) What is Cassandra designed for? Heavy writes, yes. So is HBase. Or Elasticsearch.
What are the use cases that suit Cassandra?

2) What kind of queries are best suited for Cassandra?
I ask this because I have seen people asking about queries and getting replies that it's not suited for Cassandra. For example: queries where a large number of rows are requested and a timeout happens. Or range queries or aggregate queries.

3) Where does Cassandra excel compared to other technologies?

I have been working on Cassandra for some time. I know how it works and I like it very much.
We are moving towards building a big cluster. But at this point, I am not sure if it's the right decision.

A lot of people in my company, including me, like Cassandra. But it has more to do with CQL and not the internals or the use cases. Until now, there have been small PoCs and people enjoyed them. But for a large-scale project, we are not so sure.

Please guide us.
Please note that the drawbacks of other technologies do not interest me; it's the strengths/weaknesses of Cassandra I am interested in.
Thanks








--
http://khangaonkar.blogspot.com/

