Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of mvallebr@gmail.com designates
 209.85.217.172 as permitted sender)
MIME-Version: 1.0
In-Reply-To: <CC7DCC54.10D41%Dean.Hiller@nrel.gov>
References: 
 <CABKQiduPKNuZzZD1RNQmJswah9iz+0ha2Z1w1dOQSgTyproTEg@mail.gmail.com>
	<CC7DCC54.10D41%Dean.Hiller@nrel.gov>
Date: Tue, 18 Sep 2012 10:52:45 -0300
Message-ID: 
 <CABKQidu+NQG3tWovL_=P0Pf0MRN6yLgNpUjr=vGt8H=ksz=76Q@mail.gmail.com>
Subject: Re: Is Cassandra right for me?
From: Marcelo Elias Del Valle <mvallebr@gmail.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=f46d0435c1d271625604c9fa3449

--f46d0435c1d271625604c9fa3449
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

I will have just 6 columns in my CF, but I will have about a billion writes
per hour. From what you are saying, I think Cassandra fits this case.
This answer helped a lot too, thanks!
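
For a rough sense of scale, the figures above can be turned into a per-second
rate. This is only a back-of-the-envelope sketch in Python: the node count is
a hypothetical number chosen for illustration, and the even split across
nodes ignores replication, which would multiply the per-node load.

```python
# Back-of-the-envelope write rate for the workload described above.
WRITES_PER_HOUR = 1_000_000_000  # from the message: about a billion writes/hour
COLUMNS_PER_ROW = 6              # from the message: 6 columns in the CF
SECONDS_PER_HOUR = 3600
HYPOTHETICAL_NODES = 20          # assumption, not a sizing recommendation


def writes_per_second(writes_per_hour: int) -> float:
    """Convert an hourly write count into an average per-second rate."""
    return writes_per_hour / SECONDS_PER_HOUR


def per_node_rate(total_rate: float, nodes: int) -> float:
    """Split a rate evenly across nodes, ignoring replication (assumption)."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return total_rate / nodes


rate = writes_per_second(WRITES_PER_HOUR)
print(f"{rate:,.0f} row writes/s")                       # 277,778 row writes/s
print(f"{rate * COLUMNS_PER_ROW:,.0f} column values/s")  # 1,666,667 column values/s
print(f"{per_node_rate(rate, HYPOTHETICAL_NODES):,.0f} row writes/s per node")
```

These are averages; peak rates would be higher if the writes are not spread
evenly over the hour.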

2012/9/18 Hiller, Dean <Dean.Hiller@nrel.gov>

> I wanted to clarify where that statement on wide rows comes from...
>
> Realize some people make the claim that if you don't have 1000's of
> columns in "some" rows in Cassandra you are doing something wrong.
> This is not true, BUT it comes from the fact that people are setting up
> indexes.  This is what leads to the very wide row effect.  playOrm is one
> such library using wide rows like this BUT it is NOT necessary for all
> applications.
>
> You can easily use map/reduce on a Cassandra cluster.  If you make a
> mistake and don't get the model right the first time, you can also
> map/reduce your dataset into a new model.  This wide row effect is used
> for indexing about 80% of the time.  I draw on playOrm examples a lot: one
> table may be partitioned by time so that each month of data is in its own
> partition, and you can then have indexes on each partition, allowing you
> to do quick queries into partitions.
>
> Later,
> Dean
>
> From: Marcelo Elias Del Valle <mvallebr@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, September 17, 2012 4:28 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Is Cassandra right for me?
>
> Hello,
>
>      I am new to Cassandra and I am not sure whether Cassandra is the
> right technology to use in the architecture I am defining. Also, I saw a
> presentation which said that if I don't have rows with more than a
> hundred columns in Cassandra, either I am doing something wrong or I
> shouldn't be using Cassandra. Therefore, it might be the case that I am
> doing something wrong. If you could help me answer these questions by
> giving any feedback, it would be highly appreciated.
>      Here is my need and what I am thinking of using Cassandra for:
>
>  *   I need to support a high volume of writes per second. I might have a
> billion writes per hour.
>  *   I need to write unstructured data that will be processed later by
> Hadoop jobs to generate structured data from it. Later, I index the
> structured data using SOLR or SOLANDRA, so the data can be queried by my
> end-user application. Is Cassandra recommended for that, or should I be
> thinking of writing directly to HDFS files, for instance? What's the main
> advantage I get from storing data in a NoSQL store like Cassandra,
> compared to storing files in HDFS?
>  *   Usually I will write JSON data associated with an ID, and my Hadoop
> jobs will process this data and write the results to a database. I have
> two questions here:
>     *   If I don't need to perform complicated queries in Cassandra,
> should I store the JSON-like data just as a column value? I am afraid of
> doing something wrong here, as I would just need to store the JSON file
> and 5 or 6 more fields to query the files later.
>     *   Does it make sense to you to use Hadoop to process data from
> Cassandra and store the results in a database like HBase? Once I have
> structured data, is there any reason I should use Cassandra instead of
> HBase?
>
>      I am sorry if the questions are too basic. I have been watching a
> lot of videos and reading a lot of documentation about Cassandra, but
> honestly, the more I read, the more questions I have.
>
> Thanks in advance.
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>
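
The month-partitioned indexing described in the quoted reply can be sketched
roughly as follows. This is only an in-memory Python illustration of the
idea, not playOrm's or Cassandra's API: PartitionedTable, month_partition,
and the "source" field in the usage lines are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime


def month_partition(ts: datetime) -> str:
    """Bucket a timestamp into a month-sized partition key, e.g. '2012-09'."""
    return ts.strftime("%Y-%m")


class PartitionedTable:
    """Rows grouped by month, with one small index per partition."""

    def __init__(self, indexed_field: str):
        self.indexed_field = indexed_field
        # partition key -> {row_id: row}
        self.rows = defaultdict(dict)
        # partition key -> indexed value -> set of row ids (the "wide row")
        self.index = defaultdict(lambda: defaultdict(set))

    def insert(self, row_id: str, ts: datetime, row: dict) -> None:
        part = month_partition(ts)
        old = self.rows[part].get(row_id)
        if old is not None:
            # Drop the stale index entry when a row is overwritten.
            self.index[part][old[self.indexed_field]].discard(row_id)
        self.rows[part][row_id] = row
        self.index[part][row[self.indexed_field]].add(row_id)

    def query(self, partition: str, value) -> list:
        """Look up rows in one partition through that partition's index."""
        ids = self.index.get(partition, {}).get(value, set())
        return [self.rows[partition][i] for i in sorted(ids)]


table = PartitionedTable(indexed_field="source")
table.insert("a", datetime(2012, 9, 17), {"source": "web", "json": "{}"})
table.insert("b", datetime(2012, 9, 18), {"source": "app", "json": "{}"})
table.insert("c", datetime(2012, 8, 1), {"source": "web", "json": "{}"})
print(table.query("2012-09", "web"))  # only row "a"; the August row is skipped
```

A query touches a single month's index, so the index for any one partition
stays bounded by that month's data.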


--=20
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr

--f46d0435c1d271625604c9fa3449--