incubator-cassandra-user mailing list archives

From "Hiller, Dean" <Dean.Hil...@nrel.gov>
Subject Re: Is Cassandra oversized for this kind of use case?
Date Fri, 26 Apr 2013 14:26:01 GMT
Well, it depends more on what you will do with the data.  I was on a Sybase (RDBMS) system
with 1 billion rows, but it was getting close to not being able to handle more (constraints
had to be turned off, all sorts of optimizations done, and expert consultants brought in).

BUT there are other use cases where NoSQL is great (i.e., it is not just for big-data type
systems).  It is great for really high write throughput, as you can add more nodes and handle
more writes/second than an RDBMS very easily, yet you may be doing so many deletes that the
system constantly stays at a small data set.
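
To illustrate the write-scaling point, here is a minimal sketch, assuming row keys are spread across nodes by hashing. The function names, the key format, and the hash-modulo scheme are hypothetical simplifications for illustration; Cassandra's real partitioner and token ring work differently in detail.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Map a row key to a node index by hashing (simplified stand-in for a partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def writes_per_node(keys, node_count):
    """Count how many writes each node would receive for the given keys."""
    counts = Counter(node_for(k, node_count) for k in keys)
    return [counts.get(n, 0) for n in range(node_count)]


# Hypothetical workload: 100 machines with 100 readings each = 10,000 writes.
keys = [f"machine-{m}:reading-{r}" for m in range(100) for r in range(100)]

load_2 = writes_per_node(keys, 2)  # writes handled by each of 2 nodes
load_4 = writes_per_node(keys, 4)  # writes handled by each of 4 nodes

print("2 nodes:", load_2)
print("4 nodes:", load_4)
```

The total number of writes stays the same, but the busiest node in the 4-node run handles fewer writes than the busiest node in the 2-node run, which is the intuition behind adding nodes to absorb more writes/second.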

You may want to analyze the data constantly or in near real time, involving huge numbers of
reads/second, in which case NoSQL can be better as well.

I.e., NoSQL is not just for big data.  With PlayOrm for Cassandra, we have handled many
different use cases out there.

Later,
Dean

From: Marc Teufel <teufel.marc@googlemail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, April 26, 2013 8:17 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Is Cassandra oversized for this kind of use case?

I hope the Cassandra community can help me come to a decision.

The project I am currently working on is located in an industrial plant. Machines are connected
to a server, and every 5 minutes I get data from each machine about its status. We are talking
about a production site with 100+ machines, so the amount of data is very high:

Each machine delivers one row every 5 minutes, which means 12 rows per hour and roughly
120 rows per machine per day, so 12,000+ rows per day across all machines. Multiplied by 20,
that is 240,000 rows per month and 2,880,000 rows per year. I have to keep the last 3 years
and must be able to do analytics on this data. In the end I am dealing with roughly
10 million rows (12 columns holding text and numbers in each row).
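
The estimate can be checked with a short calculation. This is only a sketch: it assumes exactly 100 machines and reads the factor of 20 as days per month, neither of which is stated precisely above, and all variable names are made up for illustration.

```python
# Back-of-the-envelope row count using the figures quoted in this message.
MACHINES = 100                   # assumption: "100+ machines" taken as exactly 100
ROWS_PER_MACHINE_PER_DAY = 120   # "roughly 120 rows per day" for one machine
DAYS_PER_MONTH = 20              # assumption: the factor of 20 means days per month
MONTHS_PER_YEAR = 12
YEARS_RETAINED = 3               # "the last 3 years"

rows_per_day = MACHINES * ROWS_PER_MACHINE_PER_DAY
rows_per_month = rows_per_day * DAYS_PER_MONTH
rows_per_year = rows_per_month * MONTHS_PER_YEAR
rows_retained = rows_per_year * YEARS_RETAINED

print(f"per day:   {rows_per_day:>10,}")
print(f"per month: {rows_per_month:>10,}")
print(f"per year:  {rows_per_year:>10,}")
print(f"3 years:   {rows_retained:>10,}")
```

Under these assumptions three years come to 8,640,000 rows, which is in line with the rough figure of 10 million.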
Okay, it is not really "big data", is it? But for me it is a lot of data to handle anyway.
Currently I am holding all this data in an Oracle database, but doing analytics on so many
rows there is not the good and modern way, I think. As the company is successful, it will
grow, which means more machines and again more data to handle...

So I thought maybe big data technologies are a possible solution for storing my data.

Meanwhile I know Apache Hadoop is not the right tool for this kind of thing because it does
not scale down. But maybe Cassandra? This is my question to you: do you think Cassandra is
the right store for this kind of data?

I am thinking about 2 nodes. Maybe virtual.

Let me know what you think. If Cassandra is not the right tool, please tell me, and if you
know any alternatives, please tell me about them. Maybe I am already doing the right thing
by storing that much data in an Oracle database, and maybe one of you is doing the same; if
so, please let me know as well.

Thank you very much.


Web: http://www.teufel.net
