Subject: Re: Is Cassandra right for me?
From: Marcelo Elias Del Valle
To: user@cassandra.apache.org
Date: Tue, 18 Sep 2012 10:52:45 -0300

I will have just 6 columns in my CF, but about a billion writes per hour. In that case, going by what you are saying, I think Cassandra is a good fit.
This answer helped a lot too, thanks!
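
For a rough sense of scale, the two figures above can be turned into per-second rates. This is only back-of-the-envelope arithmetic: the node count is a made-up assumption that does not come from this thread, and the sketch assumes that writes are spread evenly over the hour and across nodes, and that each write carries all 6 columns.

```python
# Back-of-the-envelope write rates for "a billion writes per hour".
WRITES_PER_HOUR = 1_000_000_000  # figure quoted in the thread
COLUMNS_PER_ROW = 6              # figure quoted in the thread
ASSUMED_NODES = 20               # hypothetical cluster size, for illustration only

SECONDS_PER_HOUR = 3600


def per_second(writes_per_hour):
    """Average writes per second, assuming an even spread over the hour."""
    return writes_per_hour / SECONDS_PER_HOUR


def per_node_per_second(writes_per_hour, nodes):
    """Average writes per second per node, assuming an even spread across
    nodes and ignoring replication."""
    return per_second(writes_per_hour) / nodes


rate = per_second(WRITES_PER_HOUR)
print(f"cluster-wide: {rate:,.0f} writes/s")
print(f"column values: {rate * COLUMNS_PER_ROW:,.0f} per second")
print(f"per node: {per_node_per_second(WRITES_PER_HOUR, ASSUMED_NODES):,.0f} writes/s")
```

Replication would multiply the per-node figure by the replication factor, which is not stated in this thread.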

2012/9/18 Hiller, Dean <Dean.Hiller@nrel.gov>
I wanted to clarify where that statement on wide rows comes from…
Realize that some people make the claim that if you don't have 1000's of columns in "some" rows in Cassandra, you are doing something wrong. This is not true, BUT it comes from the fact that people are setting up indexes. This is what leads to the very wide row effect. playOrm is one such library using wide rows like this, BUT it is NOT necessary for all applications.

You can easily use map/reduce on a Cassandra cluster. If you make a mistake and don't get your model right the first time, you can also map/reduce your dataset into a new model. This wide row effect is used for indexing 80% of the time. I draw on playOrm examples a lot, but one table may be partitioned by time so that each month of data is in a partition; you can then have indexes on each partition, allowing you to do quick queries into partitions.
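
The per-partition index idea can be pictured with a small in-memory model. This is a sketch only, not playOrm's actual implementation and not a Cassandra client: the class name, the month-string partition keys, and the `amount` field are all hypothetical. Each (partition, field) pair stands for one wide row whose sorted column names are (indexed value, row key) pairs, which is why such rows grow to thousands of columns.

```python
import bisect
from collections import defaultdict


class WideRowIndex:
    """Toy model of wide-row indexes, one index row per (partition, field)."""

    def __init__(self):
        # (partition, field) -> sorted list of (indexed_value, row_key)
        self._rows = defaultdict(list)

    def add(self, partition, field, value, row_key):
        """Add one column to the index row, keeping the columns sorted."""
        bisect.insort(self._rows[(partition, field)], (value, row_key))

    def range(self, partition, field, low, high):
        """Row keys in one partition whose indexed value lies in [low, high]."""
        columns = self._rows.get((partition, field), [])
        # (low,) sorts before every (low, row_key), so this finds the first
        # column whose value is >= low.
        start = bisect.bisect_left(columns, (low,))
        matches = []
        for value, row_key in columns[start:]:
            if value > high:
                break  # columns are sorted, nothing further can match
            matches.append(row_key)
        return matches


index = WideRowIndex()
index.add("2012-09", "amount", 21, "row-a")
index.add("2012-09", "amount", 35, "row-b")
index.add("2012-08", "amount", 30, "row-c")
print(index.range("2012-09", "amount", 20, 30))  # prints ['row-a']
```

The query touches only the September index row, so its cost depends on the size of that one partition rather than on the whole table.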

Later,
Dean

From: Marcelo Elias Del Valle <mvallebr@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, September 17, 2012 4:28 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Is Cassandra right for me?

Hello,

I am new to Cassandra and I am unsure whether Cassandra is the right technology to use in the architecture I am defining. Also, I saw a presentation which said that if I don't have rows with more than a hundred columns in Cassandra, either I am doing something wrong or I shouldn't be using Cassandra. Therefore, it might be the case that I am doing something wrong. If you could help me find the answers to these questions by giving any feedback, it would be highly appreciated.
Here is my need and what I am thinking of using Cassandra for:
 * I need to support a high volume of writes per second. I might have a billion writes per hour.
 * I need to write unstructured data that will be processed later by Hadoop jobs to generate structured data from it. Later, I index the structured data using SOLR or SOLANDRA, so the data can be queried by my end-user application. Is Cassandra recommended for that, or should I be thinking of writing directly to HDFS files, for instance? What's the main advantage I get from storing data in a NoSQL service like Cassandra, compared to storing files in HDFS?
 * Usually I will write JSON data associated with an ID, and my Hadoop jobs will process this data and write the results to a database. I have two questions here:
     * If I don't need to perform complicated queries in Cassandra, should I store the JSON-like data just as a column value? I am afraid of doing something wrong here, as I would just need to store the JSON file plus 5 or 6 more fields to query the files later.
     * Does it make sense to you to use Hadoop to process data from Cassandra and store the results in a database like HBase? Once I have structured data, is there any reason I should use Cassandra instead of HBase?
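
To make the "JSON as a single column value" layout from the list above concrete, here is a minimal sketch. It uses a plain Python dict as a stand-in for a column family and does not call any Cassandra client; the function names, the `payload` column, and the example fields (`source`, `received`, `status`) are hypothetical and only illustrate the shape of such a row.

```python
import json

# Stand-in for a column family: row key -> {column name: value}.
# This is a plain dict, NOT a Cassandra client.
documents = {}


def store_document(doc_id, payload, **query_fields):
    """Store the whole JSON document as one column value, next to a few
    small columns that later jobs can read without parsing the JSON."""
    row = {"payload": json.dumps(payload, sort_keys=True)}
    row.update(query_fields)
    documents[doc_id] = row


def load_document(doc_id):
    """Return the parsed JSON document stored under a row key."""
    return json.loads(documents[doc_id]["payload"])


store_document(
    "doc-42",
    {"user": "alice", "events": [1, 2, 3]},
    source="web",
    received="2012-09-17",
    status="new",
)
print(sorted(documents["doc-42"]))
print(load_document("doc-42")["events"])
```

With this shape the store never interprets the JSON: it stays an opaque string, and only the handful of extra columns carry values that a later job would select on.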

I am sorry if the questions are too basic. I have been watching a lot of videos and reading a lot of documentation about Cassandra, but honestly, the more I read, the more questions I have.

Thanks in advance.

Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr



--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr