Subject: Re: Is Cassandra right for me?
From: Marcelo Elias Del Valle <mvallebr@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 18 Sep 2012 13:50:54 -0300

You're talking about this project, right? https://github.com/deanhiller/playorm

I will take a look. However, I don't think using Cassandra's model itself (with CFs / key-values) would be a problem; I just need to know where the advantage lies. From your answer, my guess is that it comes down to better performance and more control.

I also saw that if I plan to use DataStax Enterprise to get real-time analytics, my data would need to be stored in Cassandra's usual format. It would be harder for me to use PlayOrm if I am planning to use advanced DataStax features, like Solr indexing data on Cassandra in real time without copying columns, wouldn't it? I don't know much about this Solr feature yet, but my understanding today is that it wouldn't be aware of the tables I create with PlayOrm, just of the column families this framework uses to store the data. Right?
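On the storage question from my original message quoted further below (keeping the whole JSON in one column versus breaking it into per-field columns), this is how I currently picture the two layouts. A rough Java sketch only, with made-up field names and a plain map standing in for one row's columns:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the field names are invented and a Map stands in for
// the columns of a single Cassandra row.
public class EventRowLayouts {

    // Layout A: the raw JSON document in one opaque column, plus the 5 or 6
    // fields actually needed to query the events later.
    static Map<String, String> jsonBlobLayout(String userId, long eventTime, String json) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("user_id", userId);
        columns.put("event_time", Long.toString(eventTime));
        columns.put("payload_json", json); // read and parsed as a whole by the Hadoop jobs
        return columns;
    }

    // Layout B: every field of the document as its own column, so individual
    // fields can be read or updated without touching the rest.
    static Map<String, String> fieldPerColumnLayout(String userId, long eventTime,
                                                    String action, String device) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("user_id", userId);
        columns.put("event_time", Long.toString(eventTime));
        columns.put("action", action);
        columns.put("device", device);
        return columns;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(jsonBlobLayout("user-7", now, "{\"action\":\"click\",\"device\":\"mobile\"}"));
        System.out.println(fieldPerColumnLayout("user-7", now, "click", "mobile"));
    }
}

My guess is the second layout would also be friendlier to the Solr indexing mentioned above, since each field exists as a real column, but I may be wrong about that.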
2012/9/18 Hiller, Dean <Dean.Hiller@nrel.gov>

> Until Aaron replies, here are my thoughts on the relational piece…
>
>     If everything in my model fits into a relational database, if my data is structured, would it still be a good idea to use Cassandra? Why?
>
> The playOrm project explores exactly this issue… A query on 1,000,000 rows in a single partition only took 60ms, AND you can do joins with its S-SQL language. The answer is a resounding YES, you can put relational data in Cassandra. The writes are way faster than a DBMS, and joins and SQL can be just as fast and in many cases FASTER on NoSQL IF you partition your data properly. An S-SQL statement looks like this in playOrm:
>
> PARTITIONS t(:partitionId) SELECT t FROM Trades as t where t.numShares > 10
>
> You can have as many partitions as you want, and a single partition can have millions of rows, though I would probably not exceed 10 million.
>
> Later,
> Dean
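Dean, just to check my understanding of the partitioning idea before I dig into the code: below is a minimal Java sketch of how I picture the partition id being derived. The names are made up and this is not PlayOrm's actual API; bucketing by account and month is only an assumption to keep each partition well under the ~10 million rows you mention.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative only: shows the manual partition-bucketing idea, not how
// playOrm actually derives its partitions.
public class TradePartitioner {

    /** Derives the partition id a trade row would be stored under. */
    static String partitionId(String accountId, Date tradeTime) {
        SimpleDateFormat month = new SimpleDateFormat("yyyyMM");
        month.setTimeZone(TimeZone.getTimeZone("UTC"));
        // One bucket per account per month; the assumption is that a single
        // account never produces anywhere near 10 million trades in a month.
        return accountId + ":" + month.format(tradeTime);
    }

    public static void main(String[] args) {
        String partition = partitionId("acct-42", new Date());
        // A query like
        //   PARTITIONS t(:partitionId) SELECT t FROM Trades as t where t.numShares > 10
        // would then be bound to this one bucket instead of the whole data set.
        System.out.println("partition for this account/month: " + partition);
    }
}

If that is roughly the idea, then the query above only scans one account's trades for one month instead of the whole column family, which would explain the 60ms numbers.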
> 2012/9/18 aaron morton <aaron@thelastpickle.com>
>
> Also, I saw a presentation which said that if I don't have rows with more than a hundred rows in Cassandra, then either I am doing something wrong or I shouldn't be using Cassandra.
>
> I do not agree with that statement. (I read that as rows with more than a hundred _columns_.)
>
>
>  *   I need to support a high volume of writes per second. I might have a billion writes per hour.
>
> That's about 280K/sec. Netflix did a benchmark that shows 1.1M/sec:
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
>
>  *   I need to write non-structured data that will be processed later by Hadoop processes to generate structured data from it. Later, I index the structured data using Solr or Solandra, so the data can be consulted by my end-user application. Is Cassandra recommended for that, or should I be thinking of writing directly to HDFS files, for instance? What's the main advantage I get from storing data in a NoSQL service like Cassandra, when compared to storing files in HDFS?
>
> You can query your data using Hadoop easily enough. You may want to take a look at DSE from http://datastax.com/; it makes using Hadoop and Solr with Cassandra easier.
>
>
>  *   If I don't need to perform complicated queries in Cassandra, should I store the JSON-like data just as a column value? I am afraid of doing something wrong here, as I would only need to store the JSON file and some 5 or 6 more fields to query the files later.
>
> Store the data in the way that best supports the read queries you want to make. If you always read all the fields, or it's a canonical record of events, storing as JSON may be best. If you often get a few fields, and maybe they are updated, storing each field as a column value may be best.
>
>
>  *   Does it make sense to you to use Hadoop to process data from Cassandra and store the results in a database, like HBase? Once I have structured data, is there any reason I should use Cassandra instead of HBase?
>
> It depends on how many moving parts you are comfortable with. Same for the questions about HDFS etc. Start with the smallest amount of infrastructure.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/09/2012, at 10:28 AM, Marcelo Elias Del Valle <mvallebr@gmail.com> wrote:
>
> Hello,
>
> I am new to Cassandra and I am in doubt whether Cassandra is the right technology to use in the architecture I am defining. Also, I saw a presentation which said that if I don't have rows with more than a hundred rows in Cassandra, then either I am doing something wrong or I shouldn't be using Cassandra. Therefore, it might be the case that I am doing something wrong. If you could help me find the answers to these questions by giving any feedback, it would be highly appreciated.
>
> Here is my need and what I am thinking of using Cassandra for:
>
>  *   I need to support a high volume of writes per second. I might have a billion writes per hour.
>  *   I need to write non-structured data that will be processed later by Hadoop processes to generate structured data from it. Later, I index the structured data using Solr or Solandra, so the data can be consulted by my end-user application. Is Cassandra recommended for that, or should I be thinking of writing directly to HDFS files, for instance? What's the main advantage I get from storing data in a NoSQL service like Cassandra, when compared to storing files in HDFS?
>  *   Usually I will write JSON data associated with an ID, and my Hadoop processes will process this data to write data to a database. I have two doubts here:
>     *   If I don't need to perform complicated queries in Cassandra, should I store the JSON-like data just as a column value? I am afraid of doing something wrong here, as I would only need to store the JSON file and some 5 or 6 more fields to query the files later.
>     *   Does it make sense to you to use Hadoop to process data from Cassandra and store the results in a database, like HBase? Once I have structured data, is there any reason I should use Cassandra instead of HBase?
>
> I am sorry if the questions are too basic. I have been watching a lot of videos and reading a lot of documentation about Cassandra, but honestly, the more I read, the more questions I have.
>
> Thanks in advance.
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr