Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of doanduyhai@gmail.com designates
 209.85.192.44 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CADfhF1EN61pvj8sDK8dBvjBg7M+8Jwk2TFhyTTaObduhLFfPnw@mail.gmail.com>
References: 
 <CADfhF1GmaaTEt3yBAQCbqzf7fK=gLKis-Xi=NTX0tTrbVvxwkg@mail.gmail.com>
	<CACWHCDKZhGYbJisnyQN2+V=hQWB+DdiharDT1OZbHp02_Sa33w@mail.gmail.com>
	<CADfhF1EN61pvj8sDK8dBvjBg7M+8Jwk2TFhyTTaObduhLFfPnw@mail.gmail.com>
Date: Fri, 4 Jul 2014 23:10:25 +0200
Message-ID: 
 <CABNXB2DskxvGZ=Gkjj9c8vmzMP5FCNgSY-7HnHWFYjonLU+p_Q@mail.gmail.com>
Subject: Re: Cassandra use cases/Strengths/Weakness
From: DuyHai Doan <doanduyhai@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=089e013cba98e34f5104fd648d7c

--089e013cba98e34f5104fd648d7c
Content-Type: text/plain; charset=UTF-8

I would answer your question this way:

1) Why should I choose C* ?

 a. Linear scalability: throughput scales "almost" linearly with the number
of nodes.

 b. Almost unbounded extensibility (there is no limit, or at least a very
high limit, on the number of nodes you can have in a cluster).

 c. Operational simplicity due to the masterless architecture. This feature,
although quite transparent for developers, is a key selling point. Having
suffered through manually installing a Hadoop cluster, I happen to love the
deployment simplicity of C*: only one process per node, no moving parts.

d. High availability. C* clearly trades consistency for availability, so you
can expect something like 99.99% uptime. This is a strong selling point for
critical businesses that need to be up all the time.

e. Support for multiple data centers out of the box. Again, on the
operational side, it's a great feature if you plan a worldwide deployment.
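
To put the 99.99% figure from point 1d in perspective, here is a small
Python sketch of the yearly downtime budget that a given uptime target
allows. It is plain arithmetic, not something measured on a real cluster,
and the function name allowed_downtime_minutes is made up for illustration.

```python
# Illustrative arithmetic only: the yearly downtime allowed by an uptime target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime -> {budget:.2f} minutes of downtime per year")
```

So "four nines" leaves a bit under an hour of downtime per year.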

That's all I can see for now

2) Why shouldn't I choose C* ?

a. A need for strong consistency most of the time. Although you can perform
all requests with consistency level ALL, it's clearly not the best use of
C*. You'll suffer from higher latency and reduced availability. Even the new
"lightweight transaction" feature is not meant to be used at large scale.

b. Very complicated and changing queries. Denormalizing is great when you
know ahead of time exactly how you'll query your data. Once that is done,
any new way of querying will require new code and new tables to support it.

c. A ridiculously small data load. I've seen people in prod using C* for
only 200 GB because they want to be trendy and use bleeding-edge
technologies. They'd be better off using a classical RDBMS solution that
fits their load perfectly.
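
To make point 2a more concrete, here is a rough Python sketch of how a
consistency level maps to the number of replicas that must answer, and so to
how many replica failures a request can survive. It assumes the usual quorum
definition, floor(RF / 2) + 1, and the usual rule of thumb that a read sees
the latest acknowledged write when read replicas + write replicas > RF. The
function names are hypothetical, and multi-data-center levels such as
LOCAL_QUORUM are left out.

```python
# Sketch of the consistency/availability trade-off for a single replica set.
# Assumption: QUORUM means floor(RF / 2) + 1 replicas must respond.


def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must acknowledge a request at this level."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def tolerated_replica_failures(level: str, rf: int) -> int:
    """How many replicas can be down while the request still succeeds."""
    return rf - replicas_required(level, rf)


def read_sees_latest_write(write_level: str, read_level: str, rf: int) -> bool:
    """True when the read and write replica sets are forced to overlap."""
    return replicas_required(write_level, rf) + replicas_required(read_level, rf) > rf


if __name__ == "__main__":
    rf = 3  # assumed replication factor, chosen only for the example
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, "tolerates", tolerated_replica_failures(level, rf), "replica(s) down")
    print("QUORUM/QUORUM overlaps:", read_sees_latest_write("QUORUM", "QUORUM", rf))
    print("ONE/ONE overlaps:", read_sees_latest_write("ONE", "ONE", rf))
```

With a replication factor of 3, ALL cannot lose a single replica, which is
the reduced availability mentioned above, while QUORUM on both reads and
writes still overlaps and survives one replica being down.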

Hope that helps

Duy Hai DOAN


On Fri, Jul 4, 2014 at 9:31 PM, Prem Yadav <ipremyadav@gmail.com> wrote:

> Thanks Manoj. Great post for those who already have Cassandra in
> production.
> However it brings me back to my original post.
> All the points you have mentioned apply to any big data technology.
> Storage- All of them
> Query- All of them. In fact, a lot of them perform better. Agree that CQL
> structure is better. But Hive, Mongo, all good
> Availability- many of them
>
> So my question is basically to Cassandra support people, e.g. DataStax, or
> the developers.
> What makes Cassandra special?
> If I have to convince my CTO to spend a million dollars on a cluster and
> support, his first question would be: why Cassandra? Why not this or that?
>
> So I am still not sure what special thing Cassandra brings to the table.
>
> Sorry about the rant. But in the enterprise world, decisions are made
> taking into account stability, convincing managers, and whatnot. The
> chosen technology has to be stable for years. People should be
> convinced that the engineers are not going to do a lot of firefighting.
>
> Any inputs appreciated.
>
>
>
> On Fri, Jul 4, 2014 at 7:07 PM, Manoj Khangaonkar <khangaonkar@gmail.com>
> wrote:
>
>> These are my personal opinions based on a few months of using Cassandra.
>> Others may have different opinions.
>>
>>
>>
>> http://khangaonkar.blogspot.com/2014/06/apache-cassandra-things-to-consider.html
>>
>> regards
>>
>>
>>
>> On Fri, Jul 4, 2014 at 7:37 AM, Prem Yadav <ipremyadav@gmail.com> wrote:
>>
>>> Hi,
>>> I have seen in a lot of replies that Cassandra is not designed for
>>> this and that. I don't want to sound rude, I just need some info about this
>>> so that I can compare it to technologies like HBase, Mongo, Elasticsearch,
>>> Solr, etc.
>>>
>>> 1) What is Cassandra designed for? Heavy writes, yes. So is HBase. Or
>>> Elasticsearch.
>>> What are the use cases that suit Cassandra?
>>>
>>> 2) What kind of queries are best suited for Cassandra?
>>> I ask this because I have seen people asking about queries and getting
>>> replies that it's not suited for Cassandra. For example: queries where a
>>> large number of rows are requested and a timeout happens. Or range
>>> queries or aggregate queries.
>>>
>>> 3) Where does Cassandra excel compared to other technologies?
>>>
>>> I have been working on Cassandra for some time. I know how it works and I
>>> like it very much.
>>> We are moving towards building a big cluster. But at this point, I am
>>> not sure if it's the right decision.
>>>
>>> A lot of people in my company, including me, like Cassandra. But it has
>>> more to do with CQL and not the internals or the use cases. Until now,
>>> there have been small PoCs and people enjoyed them. But for a large-scale
>>> project, we are not so sure.
>>>
>>> Please guide us.
>>> Please note that the drawbacks of other technologies do not interest me,
>>> it's the strengths/weaknesses of Cassandra I am interested in.
>>> Thanks
>>>
>>
>>
>> --
>> http://khangaonkar.blogspot.com/
>>
>
>

--089e013cba98e34f5104fd648d7c--