cassandra-user mailing list archives

From Bhuvan Rawal <>
Subject Re: Requesting some details for my use case
Date Thu, 07 Jan 2016 08:24:09 GMT
Hi Jack,

We value reliability and consistency over performance right now. In the
e-commerce industry we can expect unexpected spikes at odd times.

I'll be grateful if you could tell me about reliability and failover
scenarios.

On Wed, Jan 6, 2016 at 2:59 AM, Jack Krupansky <> wrote:

> DataStax has documented quite a few customers/case studies:
> Materialized Views should be considered if you can go straight to 3.0, but
> you can always build the same synthesized views yourself in your app, which
> is the current standard best practice anyway. MV is just a way to automate
> that best practice.
> The key to performance is to characterize your load requirements and then
> make sure to provision your cluster with enough nodes to support that load.
> You'll have to do a proof of concept implementation to verify your own
> requirements. For example, start with a 6- or 8-node cluster for a subset
> of the data and add nodes as needed to accommodate load. The trick is to
> limit the amount of data on each node so that incoming requests can be
> processed as rapidly as possible to meet latency requirements, and then to
> scale up load capacity by adding nodes.
> -- Jack Krupansky
> On Tue, Jan 5, 2016 at 4:02 PM, Bhuvan Rawal <> wrote:
>> *Thanks Jack* *for the detailed advice*.
>> Yes, it is a Java application.
>> We already have a denormalized view of our data in place, which we store
>> in MongoDB as a cache; however, we will get our hands dirty before
>> implementation. We would like to have a single DB view and replace MongoDB
>> & MySQL with a single data store. In terms of numbers, we can expect 10
>> million create/update requests a day and ~500 million read requests a day.
>> The question here is not "should I or should I not", but "which one".
>> A lot of the features you have mentioned are supported but not advisable
>> *(automated Materialized View feature) (Triggers are supported, but not
>> advised) (Secondary indexes are supported, but not advised).* By when do
>> you believe these will be stable enough to use for enterprise
>> implementation?
>> We have made up our minds as far as the shift to NoSQL is concerned, as
>> MySQL is not able to serve our purpose and is currently a bottleneck in
>> the design.
>> From all the benchmarks we have analyzed for our use case, Cassandra
>> seems to be doing better as far as performance is concerned. Our only
>> concern is how Cassandra compares with HBase as a primary database. By
>> primary database I mean these attributes: data consistency, transaction
>> management and rollback, brisk failure recovery, cross-datacenter
>> replication, and partition-aware sharding.
>> The general opinion of Cassandra is that it's more of a cache, and as we
>> are going to be replacing our primary data store we need something fast,
>> but not at the expense of reliability. Can you guide me towards a case
>> study where someone has tuned it to perform reliably for most use cases?
>> Also, I'll be grateful if someone directs me to a repository where I can
>> find major customers of the DBs and their case studies.
>> Thanks & Regards,
>> Bhuvan
>> On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <>
>> wrote:
>>> Bear in mind that you won't be able to merely "tune" your schema - you
>>> will need to completely redesign your data model. Step one is to look at
>>> all of the queries you need to perform and get a handle on what flat,
>>> denormalized data model they will need to execute performantly in a NoSQL
>>> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
>>> not advised. The general model is that you have a "query table" for each
>>> form of query, with the primary key adapted to the needs of the query. That
>>> means a lot of denormalization and repetition of data. The new, automated
>>> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
>>> a new feature and not quite stable enough for production (no DataStax
>>> Enterprise (DSE) release with 3.0 yet.) Triggers are supported, but not
>>> advised - better to do that processing at the application level. DSE also
>>> supports Hadoop and Spark for batch/analytics and Solr for search and ad
>>> hoc queries (or use Stratio or Stargate for Lucene queries.)
>>> Best to start with a basic proof of concept implementation to get your
>>> feet wet and learn the ins and outs before making a full commitment.
>>> Is this a Java app? The Java Driver is where you need to get started in
>>> terms of ingesting and querying data. It's a bit more sophisticated than
>>> just a simple JDBC interface. Most of your queries will need to be
>>> rewritten anyway even though the CQL syntax does indeed look a lot like
>>> SQL, but much of that will be because your data model will need to be made
>>> NoSQL-compatible.
>>> That should get you started.
>>> -- Jack Krupansky
>>> On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <>
>>> wrote:
>>>> I understand, Ravi. We have our application layers well defined. The
>>>> major changes will be in the database access layers, and entities will
>>>> be changed. The schema will be modified to tune the efficiency of the
>>>> data store chosen.
>>>> We have been using Mongo as a cache for a long time now, but as it's a
>>>> document store and we have a crisp, well-defined schema, we chose to go
>>>> with a columnar database.
>>>> Our data size has been growing very rapidly. Currently it is 200 GB with
>>>> indexes; in a couple of years it will grow to approx. 5 TB. And we may
>>>> need to run procedures to aggregate data and update tables.
>>>> On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <>
>>>> wrote:
>>>>> You are moving from a SQL database to C*? I hope you are aware of
>>>>> the differences between a NoSQL database like C* and an RDBMS. To keep
>>>>> it short, the app has to change significantly.
>>>>> Please read the documentation on the differences between NoSQL and
>>>>> RDBMS.
>>>>> Thanks.
>>>>> On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <>
>>>>> wrote:
>>>>>> Hi All,
>>>>>> I'm planning to shift from a SQL database to a columnar NoSQL
>>>>>> database, and we have narrowed our choices to Cassandra and HBase. I
>>>>>> would really appreciate it if someone with decent experience of both
>>>>>> could give me an honest comparison on the parameters below (links to
>>>>>> neutral benchmarks/blogs appreciated):
>>>>>> 1. Data Consistency (Eventual consistency allowed but define
>>>>>> "eventual")
>>>>>> 2. Ease of Scaling Up
>>>>>> 3. Manageability
>>>>>> 4. Failure Recovery options
>>>>>> 5. Secondary Indexing
>>>>>> 6. Data Aggregation
>>>>>> 7. Query Language (3rd party wrapper solutions also allowed)
>>>>>> 8. Security
>>>>>> 9. *Commercial Support for quick solutions to issues*.
>>>>>> 10. Run batch jobs on data, like MapReduce or some common aggregation
>>>>>> functions using row scans. Are there any other packages for Cassandra
>>>>>> to achieve this?
>>>>>> 11. Trigger specific updates on tables used for secondary index.
>>>>>> 12. Please consider that our DB will be the source of truth, with a
>>>>>> specific requirement of immediate data consistency among nodes.
>>>>>> Regards,
>>>>>> Bhuvan Rawal
>>>>>> SDE
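
On item 12 above (immediate consistency among nodes) and the failover question at the top of the thread: Cassandra's consistency level is chosen per request, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas acknowledging the write (W) plus the number consulted on the read (R) exceeds the replication factor (RF). The following is only a sketch of that arithmetic in Python. The function names and the choice of RF=3 are illustrative assumptions; they are not part of any driver API and were not proposed by anyone in the thread.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that forms a majority (QUORUM) for a given RF."""
    return replication_factor // 2 + 1


def overlaps(write_replicas: int, read_replicas: int, replication_factor: int) -> bool:
    """True when every read set must intersect every acknowledged write set,
    i.e. when R + W > RF."""
    return write_replicas + read_replicas > replication_factor


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


rf = 3  # assumed replication factor, for illustration only
q = quorum(rf)
print(f"RF={rf}: QUORUM={q}, replicas that may be down={tolerated_failures(rf)}")
print("QUORUM write + QUORUM read overlap:", overlaps(q, q, rf))
print("ONE write + ONE read overlap:", overlaps(1, 1, rf))
```

Under these assumptions, QUORUM reads and writes at RF=3 overlap and keep working with one replica down, whereas ONE/ONE trades that guarantee for lower latency.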

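
As a rough aid to the "characterize your load requirements" advice in the thread, the daily figures quoted there (10 million create/update requests and ~500 million reads per day) can be turned into average per-second rates. This is a back-of-the-envelope sketch in Python. The peak multiplier is purely an assumption made here to illustrate the "unexpected spikes" concern; it is not a number from the thread, and real sizing would still need the proof of concept recommended above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(requests_per_day: float) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


writes_per_day = 10_000_000    # figure quoted in the thread
reads_per_day = 500_000_000    # approximate figure quoted in the thread
peak_factor = 5                # hypothetical spike multiplier (assumption)

for label, per_day in (("writes", writes_per_day), ("reads", reads_per_day)):
    avg = average_rate(per_day)
    print(f"{label}: ~{avg:,.0f}/s average, ~{avg * peak_factor:,.0f}/s at {peak_factor}x peak")
```

On average this works out to roughly 116 writes and 5,800 reads per second, so the cluster would be sized for the peak rather than the mean.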