Bear in mind that you won't be able to merely "tune" your schema - you will need to completely redesign your data model. Step one is to look at all of the queries you need to perform and work out what flat, denormalized data model they will need in order to execute performantly in a NoSQL database. No JOINs. No ad hoc queries. Secondary indexes are supported, but not advised.

The general model is that you have a "query table" for each form of query, with the primary key adapted to the needs of that query. That means a lot of denormalization and repetition of data. The new, automated Materialized View feature of Cassandra 3.0 can help with that a lot, but it is a new feature and not quite stable enough for production (there is no DataStax Enterprise (DSE) release with 3.0 yet). Triggers are supported, but not advised - better to do that processing at the application level.

DSE also supports Hadoop and Spark for batch/analytics and Solr for search and ad hoc queries (or use Stratio or Stargate for Lucene queries).
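
To make the "query table" idea concrete, here is a minimal Python sketch that uses in-memory dicts as stand-ins for two Cassandra tables. The orders domain, the table names (orders_by_customer, orders_by_product_day) and the helper functions are all made up for illustration; nothing here comes from your schema and no driver is involved. It only shows the shape of the model: every write goes to each query table, and every read hits exactly one partition of one table.

```python
# Hypothetical example: an "orders" domain that must answer two queries.
#   Q1: all orders for a customer, newest first
#   Q2: all orders for a product on a given day
# In CQL these would be two tables whose primary keys match the queries, e.g.
#   orders_by_customer:     PRIMARY KEY ((customer_id), order_ts)
#   orders_by_product_day:  PRIMARY KEY ((product_id, day), order_ts)

orders_by_customer = {}      # partition key: customer_id
orders_by_product_day = {}   # partition key: (product_id, day)


def record_order(customer_id, product_id, order_ts, amount):
    """Write the same order to every query table (application-level denormalization)."""
    day = order_ts[:10]  # "YYYY-MM-DD" prefix of an ISO-8601 timestamp string
    row = {
        "customer_id": customer_id,
        "product_id": product_id,
        "order_ts": order_ts,
        "amount": amount,
    }
    # Each table stores its own copy of the row: the data is repeated on purpose.
    orders_by_customer.setdefault(customer_id, []).append(dict(row))
    orders_by_product_day.setdefault((product_id, day), []).append(dict(row))


def orders_for_customer(customer_id):
    """Q1: read one partition of one table, newest first. No JOIN."""
    rows = orders_by_customer.get(customer_id, [])
    return sorted(rows, key=lambda r: r["order_ts"], reverse=True)


def orders_for_product_on(product_id, day):
    """Q2: read one partition of the other table, oldest first."""
    rows = orders_by_product_day.get((product_id, day), [])
    return sorted(rows, key=lambda r: r["order_ts"])


# Sample data (invented values).
record_order("c1", "p9", "2016-01-05T10:52:00", 20.0)
record_order("c1", "p7", "2016-01-06T09:00:00", 5.0)
record_order("c2", "p9", "2016-01-05T11:00:00", 12.5)

for r in orders_for_customer("c1"):
    print(r["order_ts"], r["product_id"], r["amount"])
```

A third query shape would mean a third table and a third write in record_order, which is the repetition of data mentioned above.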

Best to start with a basic proof of concept implementation to get your feet wet and learn the ins and outs before making a full commitment.

Is this a Java app? The Java Driver is where you need to get started in terms of ingesting and querying data. It's a bit more sophisticated than a simple JDBC interface. Although CQL syntax does look a lot like SQL, most of your queries will need to be rewritten anyway, largely because your data model will need to be made NoSQL-compatible.

That should get you started.


-- Jack Krupansky

On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:
I understand, Ravi. We have our application layers well defined. The major changes will be in the database access layers, and the entities will change. The schema will be modified to tune the efficiency of the chosen data store.

We have been using Mongo as a cache for a long time now, but as it's a document store and we have a crisp, well-defined schema, we chose to go with a columnar database.

Our data size has been growing very rapidly. Currently it is 200 GB with indexes; in a couple of years it will grow to approximately 5 TB. We may also need to run procedures to aggregate data and update tables.

On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrishna3@gmail.com> wrote:
You are moving from a SQL database to C*??? I hope you are aware of the differences between a NoSQL store like C* and an RDBMS. To keep it short, the app has to change significantly.

Please read the documentation on the differences between NoSQL and RDBMS.

thanks.

On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:
Hi All, 

I'm planning to shift from a SQL database to a columnar NoSQL database, and we have narrowed our choices to Cassandra and HBase. I would really appreciate it if someone with decent experience of both could give me an honest comparison on the parameters below (links to neutral benchmarks/blogs also appreciated):

1. Data Consistency (Eventual consistency allowed but define "eventual")
2. Ease of Scaling Up
3. Manageability
4. Failure Recovery options
5. Secondary Indexing
6. Data Aggregation
7. Query Language (3rd party wrapper solutions also allowed)
8. Security
9. Commercial Support for quick solutions to issues.
10. Running batch jobs on the data, such as MapReduce or common aggregation functions using row scans. Are there any other packages for Cassandra to achieve this?
11. Trigger specific updates on tables used for secondary index.
12. Please consider that our DB will be the source of truth, with no specific requirement of immediate data consistency amongst nodes.

Regards,
Bhuvan Rawal
SDE