Subject: Re: Is Cassandra right for me?
From: Marcelo Elias Del Valle <mvallebr@gmail.com>
To: user@cassandra.apache.org
Date: Tue, 18 Sep 2012 13:50:54 -0300

You're talking about this project, right? https://github.com/deanhiller/playorm

I will take a look. However, I don't think using Cassandra's model itself (with CFs / key-values) would be a problem; I just need to know where the advantage lies. From your answer, my guess is that it comes down to better performance and more control.

I also saw that if I plan to use DataStax Enterprise to get real-time analytics, my data would need to be stored in Cassandra's usual format. It would be harder for me to use PlayOrm if I am planning to use advanced DataStax features, like Solr indexing data on Cassandra in real time without copying columns, wouldn't it? I don't know much about this Solr feature yet, but my understanding today is that it wouldn't be aware of the tables I create with PlayOrm, just of the column families this framework uses to store the data. Right?
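On the storage question from my original message quoted further below (keeping the whole JSON in one column versus breaking it into per-field columns), this is how I currently picture the two layouts. A rough Java sketch only, with made-up field names and a plain map standing in for one row's columns:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: the field names are invented and a Map stands in for
// the columns of a single Cassandra row.
public class EventRowLayouts {

    // Layout A: the raw JSON document in one opaque column, plus the 5 or 6
    // fields actually needed to query the events later.
    static Map<String, String> jsonBlobLayout(String userId, long eventTime, String json) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("user_id", userId);
        columns.put("event_time", Long.toString(eventTime));
        columns.put("payload_json", json); // read and parsed as a whole by the Hadoop jobs
        return columns;
    }

    // Layout B: every field of the document as its own column, so individual
    // fields can be read or updated without touching the rest.
    static Map<String, String> fieldPerColumnLayout(String userId, long eventTime,
                                                    String action, String device) {
        Map<String, String> columns = new LinkedHashMap<String, String>();
        columns.put("user_id", userId);
        columns.put("event_time", Long.toString(eventTime));
        columns.put("action", action);
        columns.put("device", device);
        return columns;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(jsonBlobLayout("user-7", now, "{\"action\":\"click\",\"device\":\"mobile\"}"));
        System.out.println(fieldPerColumnLayout("user-7", now, "click", "mobile"));
    }
}

My guess is the second layout would also be friendlier to the Solr indexing mentioned above, since each field exists as a real column, but I may be wrong about that.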
2012/9/18 Hiller, Dean <Dean.Hiller@nrel.gov>

> Until Aaron replies, here are my thoughts on the relational piece…
>
>     If everything in my model fits into a relational database, if my data is structured, would it still be a good idea to use Cassandra? Why?
>
> The playOrm project explores exactly this issue… A query on 1,000,000 rows in a single partition only took 60ms, AND you can do joins with its S-SQL language. The answer is a resounding YES, you can put relational data in Cassandra. The writes are way faster than a DBMS, and joins and SQL can be just as fast and in many cases FASTER on NoSQL IF you partition your data properly. An S-SQL statement looks like this in playOrm:
>
> PARTITIONS t(:partitionId) SELECT t FROM Trades as t where t.numShares > 10
>
> You can have as many partitions as you want, and a single partition can have millions of rows, though I would probably not exceed 10 million.
>
> Later,
> Dean
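Dean, just to check my understanding of the partitioning idea before I dig into the code: below is a minimal Java sketch of how I picture the partition id being derived. The names are made up and this is not PlayOrm's actual API; bucketing by account and month is only an assumption to keep each partition well under the ~10 million rows you mention.

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Illustrative only: shows the manual partition-bucketing idea, not how
// playOrm actually derives its partitions.
public class TradePartitioner {

    /** Derives the partition id a trade row would be stored under. */
    static String partitionId(String accountId, Date tradeTime) {
        SimpleDateFormat month = new SimpleDateFormat("yyyyMM");
        month.setTimeZone(TimeZone.getTimeZone("UTC"));
        // One bucket per account per month; the assumption is that a single
        // account never produces anywhere near 10 million trades in a month.
        return accountId + ":" + month.format(tradeTime);
    }

    public static void main(String[] args) {
        String partition = partitionId("acct-42", new Date());
        // A query like
        //   PARTITIONS t(:partitionId) SELECT t FROM Trades as t where t.numShares > 10
        // would then be bound to this one bucket instead of the whole data set.
        System.out.println("partition for this account/month: " + partition);
    }
}

If that is roughly the idea, then the query above only scans one account's trades for one month instead of the whole column family, which would explain the 60ms numbers.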
> 2012/9/18 aaron morton <aaron@thelastpickle.com>
>
> Also, I saw a presentation which said that if I don't have rows with more than a hundred rows in Cassandra, then either I am doing something wrong or I shouldn't be using Cassandra.
>
> I do not agree with that statement. (I read that as rows with more than a hundred _columns_.)
>
>
>  *   I need to support a high volume of writes per second. I might have a billion writes per hour.
>
> That's about 280K/sec. Netflix did a benchmark that shows 1.1M/sec:
> http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
>
>
>  *   I need to write non-structured data that will be processed later by Hadoop processes to generate structured data from it. Later, I index the structured data using Solr or Solandra, so the data can be consulted by my end-user application. Is Cassandra recommended for that, or should I be thinking of writing directly to HDFS files, for instance? What's the main advantage I get from storing data in a NoSQL service like Cassandra, when compared to storing files in HDFS?
>
> You can query your data using Hadoop easily enough. You may want to take a look at DSE from http://datastax.com/; it makes using Hadoop and Solr with Cassandra easier.
>
>
>  *   If I don't need to perform complicated queries in Cassandra, should I store the JSON-like data just as a column value? I am afraid of doing something wrong here, as I would only need to store the JSON file and some 5 or 6 more fields to query the files later.
>
> Store the data in the way that best supports the read queries you want to make. If you always read all the fields, or it's a canonical record of events, storing as JSON may be best. If you often get a few fields, and maybe they are updated, storing each field as a column value may be best.
>
>
>  *   Does it make sense to you to use Hadoop to process data from Cassandra and store the results in a database, like HBase? Once I have structured data, is there any reason I should use Cassandra instead of HBase?
>
> It depends on how many moving parts you are comfortable with. Same for the questions about HDFS etc. Start with the smallest amount of infrastructure.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/09/2012, at 10:28 AM, Marcelo Elias Del Valle <mvallebr@gmail.com> wrote:
>
> Hello,
>
> I am new to Cassandra and I am in doubt whether Cassandra is the right technology to use in the architecture I am defining. Also, I saw a presentation which said that if I don't have rows with more than a hundred rows in Cassandra, then either I am doing something wrong or I shouldn't be using Cassandra. Therefore, it might be the case that I am doing something wrong. If you could help me find the answers to these questions by giving any feedback, it would be highly appreciated.
>
> Here is my need and what I am thinking of using Cassandra for:
>
>  *   I need to support a high volume of writes per second. I might have a billion writes per hour.
>  *   I need to write non-structured data that will be processed later by Hadoop processes to generate structured data from it. Later, I index the structured data using Solr or Solandra, so the data can be consulted by my end-user application. Is Cassandra recommended for that, or should I be thinking of writing directly to HDFS files, for instance? What's the main advantage I get from storing data in a NoSQL service like Cassandra, when compared to storing files in HDFS?
>  *   Usually I will write JSON data associated with an ID, and my Hadoop processes will process this data to write data to a database. I have two doubts here:
>     *   If I don't need to perform complicated queries in Cassandra, should I store the JSON-like data just as a column value? I am afraid of doing something wrong here, as I would only need to store the JSON file and some 5 or 6 more fields to query the files later.
>     *   Does it make sense to you to use Hadoop to process data from Cassandra and store the results in a database, like HBase? Once I have structured data, is there any reason I should use Cassandra instead of HBase?
>
> I am sorry if the questions are too basic. I have been watching a lot of videos and reading a lot of documentation about Cassandra, but honestly, the more I read, the more questions I have.
>
> Thanks in advance.
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr