From: Marc Teufel
To: user@cassandra.apache.org
Date: Fri, 26 Apr 2013 18:59:02 +0200
Subject: Re: Is Cassandra oversized for this kind of use case?

Okay, one billion rows of data is a lot; compared to that I am far, far
away. Does that mean I can stay with Oracle? Maybe.
But you're right when you say it's not only about big data but also about
what you need to do with it.

So storing the data is one part; analyzing it is the second. I run a lot of
calculations and queries to generate management metrics about how production
is going currently, how it went over the last week, month, years and so on.
Saving in a 5 minute rhythm is only a compromise to reduce the amount of
data - maybe in the future the use case will change and I will have to store
the status of each machine as soon as it changes. This will of course
increase the amount of data and the complexity of my queries again. And
sure, I show "live" data today... 5-minute-old live data... but if I tell
the CEO that I am also able to work with real live data, I am sure this is
what he will want .... ;-)

Can you recommend Cassandra for this kind of scenario, or is it oversized?

Does it make sense to start with 2 nodes?

Can I virtualize these two nodes?

Thx a lot for your assistance.

Marc

2013/4/26 Hiller, Dean <Dean.Hiller@nrel.gov>

> Well, it depends more on what you will do with the data. I know I was on
> a Sybase (RDBMS) with 1 billion rows but it was getting close to not being
> able to handle more (constraints had to be turned off and all sorts of
> optimizations done and expert consultants brought in and everything).
>
> BUT there are other use cases that noSQL is great for (i.e. it is not just
> great for big data type systems). It is great for really high write
> throughput, as you can add more nodes and handle more writes/second than an
> RDBMS very easily, yet you may be doing so many deletes that the system
> constantly stays at a small data set.
>
> You may want to analyze the data constantly or in near real time, involving
> huge amounts of reads/second, in which case noSQL can be better as well.
>
> I.e. noSQL is not just for big data. I know with PlayOrm for Cassandra, we
> have handled many different use cases out there.
>
> Later,
> Dean
>
> From: Marc Teufel <teufel.marc@googlemail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Friday, April 26, 2013 8:17 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Is Cassandra oversized for this kind of use case?
>
> I hope the Cassandra community can help me reach a decision.
>
> The project I am currently working on is located in an industrial plant.
> Machines are connected to a server, and every 5 minutes I get data from the
> machines about their status. We are talking about a production with 100+
> machines, so the amount of data is very high:
>
> Per machine, one row every 5 minutes:
> that means 12 rows per hour, roughly 120 rows per machine per day, i.e.
> 12,000+ rows per day for all machines.
> Multiplied by 20, that is 240,000 rows per month and 2,880,000 rows per
> year. I have to keep the last 3 years and I must be able to do analytics on
> this data. In the end I deal with roughly 10 million rows (12 columns
> holding text and numbers in each row).
> Okay, it's not really "big data", is it? But for me it's a lot of data to
> handle anyway.
> Currently I am holding all this data in an Oracle database, but doing
> analytics on so many rows is not the good and modern way, I think. As the
> company is successful, it will grow, which means more machines and again
> more data to handle...
>
> So I thought maybe Big Data technologies are a possible solution for me to
> store my data.
>
> Meanwhile I know Apache Hadoop is not the right tool for this kind of
> thing because it does not scale down. But maybe Cassandra? This is my
> question to you: do you think Cassandra is the right store for this kind
> of data?
>
> I am thinking about 2 nodes. Maybe virtual.
>
> Let me know what you think. And if Cassandra is not the right tool, please
> tell me, and if you know any alternatives, please tell me about them. Maybe
> I am already doing the right thing by storing that much data in an Oracle
> database, and maybe one of you is doing the same - if so, please let me
> know as well.
>
> Thank you very much.
>
> Web: http://www.teufel.net

-- 
Mail: teufel.marc@gmail.com
Web: http://www.teufel.net
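
As a rough cross-check of the row counts in the quoted message, here is a minimal Python sketch. It assumes exactly 100 machines (the message says "100+"), 120 rows per machine per day, and that the factor of 20 is the number of days counted per month. The function and parameter names are made up for illustration only.

```python
# Back-of-the-envelope row estimate for the machine-status data.
# All defaults are assumptions read off the message, not measured values.

def estimate_rows(machines=100,
                  rows_per_machine_per_day=120,
                  days_per_month=20,
                  months_per_year=12,
                  years_retained=3):
    """Return the estimated row counts per day, month, year and in total."""
    per_day = machines * rows_per_machine_per_day
    per_month = per_day * days_per_month
    per_year = per_month * months_per_year
    retained = per_year * years_retained
    return {
        "per_day": per_day,
        "per_month": per_month,
        "per_year": per_year,
        "retained": retained,
    }


if __name__ == "__main__":
    for label, rows in estimate_rows().items():
        print(f"{label:>10}: {rows:,} rows")
```

With these assumptions the monthly and yearly results match the 240,000 and 2,880,000 figures in the message, which in turn imply 12,000 rows per day across all machines. Three years come to 8,640,000 rows, which is where the "roughly 10 million" comes from. Sampling around the clock would give 12 rows/hour x 24 hours = 288 rows per machine per day; passing rows_per_machine_per_day=288 shows how the totals change in that case.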