incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Is Cassandra right for me?
Date Fri, 21 Sep 2012 00:58:29 GMT
> Actually, if I use community edition for now, I wouldn't be able to use hadoop against
data stored in CFS? 
AFAIK DSC (DataStax Community) is a packaged deployment of Apache Cassandra. You should be able
to use Hadoop against it, in the same way you can use Hadoop against Apache Cassandra.

You "can do" anything with computers if you have enough time and patience. DSE reduces the
amount of time and patience needed to run Hadoop over Cassandra. Specifically it helps by
providing a HDFS and Hive Meta Store that run on Cassandra. This reduces the number of moving
parts you need to provision. 

> Would writes on HDFS be so quick as in Cassandra?
Yes and no. 
HDFS uses a big block size, so while it may absorb writes quickly you may not be able to read
them immediately.
Remember you may need a HDFS layer for intermediate results. 
 
> would I have advantages in using Cassandra instead of HBase?

Cassandra provides no single point of failure, great scalability, tuneable consistency, a
flexible data model and very easy single package deployment. My HBase knowledge is limited,
but I would check those points and go with whatever you feel comfortable with. 
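
To make "tuneable consistency" a little more concrete, here is a small Python sketch of the
usual replica-overlap rule: with N replicas, a read that waits for R replicas is guaranteed
to overlap a write acknowledged by W replicas when R + W > N. The function names are made up
for this note; they are not part of any Cassandra client.

```python
def quorum(replication_factor):
    """Number of replicas in a majority, i.e. what a QUORUM request waits for."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n, w, r):
    """True when any R replicas read must overlap the W replicas written."""
    return r + w > n
```

With three replicas, QUORUM writes plus QUORUM reads (2 + 2 > 3) overlap, while writing and
reading at ONE (1 + 1) does not, which trades consistency for lower latency.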

> If everything in my model fits into a relational database, if my data is structured,
would it still be a good idea to use Cassandra? Why?
It's reasonable to use Cassandra for structured data. After a few iterations of development
you may find that the current structure is not the best for a non-RDBMS. For example, it's often
easier to work with larger entities that violate Normal Form requirements.
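
As a rough illustration of the point about larger entities, here is a minimal Python sketch.
The customer and order names and fields are hypothetical and not from this thread; it only
contrasts a normalized layout, which needs a join at read time, with a denormalized one that
stores everything a read needs in a single entity.

```python
# Hypothetical data, for illustration only.
# Normalized (RDBMS style): the order refers to the customer by ID.
customers = {"c1": {"name": "Alice", "city": "Wellington"}}
orders = {"o1": {"customer_id": "c1", "total": 42.0}}


def read_order_normalized(order_id):
    """Two lookups: the order, then the customer it points to (a join)."""
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    return {"order_id": order_id, "total": order["total"],
            "customer_name": customer["name"],
            "customer_city": customer["city"]}


# Denormalized: customer fields are copied into the order entity, which
# violates normal form but lets one lookup answer the read.
orders_denormalized = {
    "o1": {"total": 42.0, "customer_name": "Alice",
           "customer_city": "Wellington"},
}


def read_order_denormalized(order_id):
    """One lookup: everything needed is stored with the order."""
    return {"order_id": order_id, **orders_denormalized[order_id]}
```

The cost of the second layout is that copied fields have to be updated in every entity that
holds them.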

There are lots of advantages to using Cassandra, just as there are benefits to using a RDBMS
rather than custom flat files. If you feel your project will benefit from those advantages,
and/or you are technically curious, I would recommend trying Cassandra.

Choose a small part of your product and create a Proof of Concept; it should only take a week
or so. Make as many mistakes as you can as fast as you can and have fun.

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/09/2012, at 1:51 AM, Marcelo Elias Del Valle <mvallebr@gmail.com> wrote:

> Aaron,
> 
>     Thank you very much for the answers! Helped me a lot!
>     I would like just a bit more clarification about the points below, if you allow
me:
> 
> You can query your data using Hadoop easily enough. You may want to take a look at DSE from
 http://datastax.com/ it makes using Hadoop and Solr with Cassandra easier.
> Actually, if I use community edition for now, I wouldn't be able to use hadoop against
data stored in CFS? We are considering the enterprise edition here, but the best scenario
would be using it just when really needed. Would writes on HDFS be so quick as in Cassandra?
> 
> It depends on how many moving parts you are comfortable with. Same for the questions
about HDFS etc. Start with the smallest amount of infrastructure.
> Sorry, I didn't really understand this part. I am not sure what you wanted to say, but
the question was about using nosql instead of a relational database in this case. If learning
nosql is not a problem, would I have advantages in using Cassandra instead of HBase? If everything
in my model fits into a relational database, if my data is structured, would it still be a
good idea to use Cassandra? Why?
> 
> 
> Thanks,
> Marcelo.
> 
> 2012/9/18 aaron morton <aaron@thelastpickle.com>
>> Also, I saw a presentation which said that if I don't have rows with more than a
hundred rows in Cassandra, either I am doing something wrong or I shouldn't be using Cassandra.

> I do not agree with that statement. (I read that as rows with more than a hundred _columns_)
> 
>> I need to support a high volume of writes per second. I might have a billion writes
per hour
> That's about 280K/sec. Netflix did a benchmark that shows 1.1M/sec http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
> 
>> I need to write non-structured data that will be processed later by hadoop processes
to generate structured data from it. Later, I index the structured data using SOLR or SOLANDRA,
so the data can be consulted by my end user application. Is Cassandra recommended for that,
or should I be thinking of writing directly to HDFS files, for instance? What's the main
advantage I get from storing data in a nosql service like Cassandra, when compared to storing
files into HDFS?
> You can query your data using Hadoop easily enough. You may want to take a look at DSE from
 http://datastax.com/ it makes using Hadoop and Solr with Cassandra easier.
> 
>> If I don't need to perform complicated queries in Cassandra, should I store the json-like
data just as a column value? I am afraid of doing something wrong here, as I would just need
to store the JSON file and 5 or 6 more fields to query the files later.
> Store the data in the way that best supports the read queries you want to make. If you
always read all the fields, or it's a canonical record of events, storing as JSON may be best.
If you often get a few fields, and maybe they are updated, storing each field as a column
value may be best.
> 
>> Does it make sense to you to use hadoop to process data from Cassandra and store
the results in a database, like HBase? Once I have structured data, is there any reason I
should use Cassandra instead of HBase?
> It depends on how many moving parts you are comfortable with. Same for the questions
about HDFS etc. Start with the smallest amount of infrastructure.
> 
> Hope that helps. 
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 18/09/2012, at 10:28 AM, Marcelo Elias Del Valle <mvallebr@gmail.com> wrote:
> 
>> Hello,
>> 
>>      I am new to Cassandra and I am in doubt if Cassandra is the right technology
to use in the architecture I am defining. Also, I saw a presentation which said that if I
don't have rows with more than a hundred rows in Cassandra, either I am doing something wrong
or I shouldn't be using Cassandra. Therefore, it might be the case I am doing something wrong.
If you could help me to find out the answer for these questions by giving any feedback, it
would be highly appreciated. 
>>      Here is my need and what I am thinking of using Cassandra for:
>> I need to support a high volume of writes per second. I might have a billion writes
per hour
>> I need to write non-structured data that will be processed later by hadoop processes
to generate structured data from it. Later, I index the structured data using SOLR or SOLANDRA,
so the data can be consulted by my end user application. Is Cassandra recommended for that,
or should I be thinking of writing directly to HDFS files, for instance? What's the main
advantage I get from storing data in a nosql service like Cassandra, when compared to storing
files into HDFS?
>> Usually I will write json data associated to an ID and my hadoop processes will process
this data to write data to a database. I have two doubts here:
>> If I don't need to perform complicated queries in Cassandra, should I store the json-like
data just as a column value? I am afraid of doing something wrong here, as I would just need
to store the JSON file and 5 or 6 more fields to query the files later.
>> Does it make sense to you to use hadoop to process data from Cassandra and store
the results in a database, like HBase? Once I have structured data, is there any reason I
should use Cassandra instead of HBase?
>>      I am sorry if the questions are too basic. I have been watching a lot of videos
and reading a lot of documentation about Cassandra, but honestly, the more I read, the more
questions I have.
>> 
>> Thanks in advance.
>> 
>> Best regards,
>> -- 
>> Marcelo Elias Del Valle
>> http://mvalle.com - @mvallebr
> 
> 
> 
> 
> -- 
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr

