cassandra-user mailing list archives

From Bhuvan Rawal <bhu1ra...@gmail.com>
Subject Re: Requesting some details for my use case
Date Tue, 05 Jan 2016 21:02:36 GMT
Thanks, Jack, for the detailed advice.

Yes, it is a Java application.

We already have a denormalized view of our data in place, which we store in
MongoDB as a cache; however, we will get our hands dirty before
implementation. We would like to have a single DB view and replace MongoDB
and MySQL with a single data store. If we talk numbers, we can expect 10
million create/update requests a day and ~500 million read requests.
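
To put those numbers in perspective, here is a rough sketch of the average load they imply. It assumes the read figure is also per day and that traffic is spread evenly over 24 hours, which real traffic will not be, so peak rates would be higher. The class and method names are just for illustration.

```java
public class LoadEstimate {
    // Seconds in one day: 24 h * 60 min * 60 s = 86,400.
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Average requests per second if the daily total were spread evenly over the day. */
    public static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long writesPerDay = 10_000_000L;   // create/update requests (from the figures above)
        long readsPerDay = 500_000_000L;   // read requests (assumed to be per day as well)

        System.out.printf("average writes/s: %.0f%n", averagePerSecond(writesPerDay));
        System.out.printf("average reads/s:  %.0f%n", averagePerSecond(readsPerDay));
        System.out.printf("read:write ratio: %d:1%n", readsPerDay / writesPerDay);
    }
}
```

That works out to roughly 116 writes and 5,787 reads per second on average, i.e. a read-heavy workload at about 50 reads per write.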

The question here is not "should I or should I not", but "which one".

A lot of the features you have mentioned are supported but not advisable
(the automated Materialized View feature; triggers are supported, but not
advised; secondary indexes are supported, but not advised). By when do you
believe these will be stable enough to use for an enterprise implementation?

We have made up our minds as far as the shift to NoSQL is concerned, as MySQL
is not able to serve our purpose and is currently a bottleneck in the design.

From all the benchmarks we have analyzed for our use case, Cassandra seems
to be doing better as far as performance is concerned. Our only concern is
to know how Cassandra compares with HBase as a primary database. By primary
database I mean these attributes: data consistency, transaction management
and rollback, brisk failure recovery, cross-datacenter replication, and
partition-aware sharding.

The general opinion of Cassandra is that it's more of a cache, and as we are
going to be replacing our primary data store, we need something fast but not
at the expense of reliability. Can you guide me towards a case study where
someone has tuned it to perform reliably for most use cases?

Also, I'll be grateful if someone directs me to a repository where I can find
major customers of these databases and their case studies.

Thanks & Regards,
Bhuvan

On Tue, Jan 5, 2016 at 9:56 PM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> Bear in mind that you won't be able to merely "tune" your schema - you
> will need to completely redesign your data model. Step one is to look at
> all of the queries you need to perform and get a handle on what flat,
> denormalized data model they will need to execute performantly in a NoSQL
> database. No JOINs. No ad hoc queries. Secondary indexes are supported, but
> not advised. The general model is that you have a "query table" for each
> form of query, with the primary key adapted to the needs of the query. That
> means a lot of denormalization and repetition of data. The new, automated
> Materialized View feature of Cassandra 3.0 can help with that a lot, but is
> a new feature and not quite stable enough for production (no DataStax
> Enterprise (DSE) release with 3.0 yet.) Triggers are supported, but not
> advised - better to do that processing at the application level. DSE also
> supports Hadoop and Spark for batch/analytics and Solr for search and ad
> hoc queries (or use Stratio or Stargate for Lucene queries.)
>
> Best to start with a basic proof of concept implementation to get your
> feet wet and learn the ins and outs before making a full commitment.
>
> Is this a Java app? The Java Driver is where you need to get started in
> terms of ingesting and querying data. It's a bit more sophisticated than
> just a simple JDBC interface. Most of your queries will need to be
> rewritten anyway even though the CQL syntax does indeed look a lot like
> SQL, but much of that will be because your data model will need to be made
> NoSQL-compatible.
>
> That should get you started.
>
>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 10:52 AM, Bhuvan Rawal <bhu1rawal@gmail.com> wrote:
>
>> I understand, Ravi. We have our application layers well defined. The
>> major changes will be in the database access layers, and entities will be
>> changed. The schema will be modified to tune the efficiency of the data
>> store chosen.
>>
>> We have been using MongoDB as a cache for a long time now, but as it's a
>> document store and we have a crisp, well-defined schema, we chose to go
>> with a columnar database.
>>
>> Our data size has been growing very rapidly. Currently it is 200 GB with
>> indexes; in a couple of years it will grow to approximately 5 TB. And we
>> may need to run procedures to aggregate data and update tables.
>>
>> On Tue, Jan 5, 2016 at 6:54 PM, Ravi Krishna <sravikrishna3@gmail.com>
>> wrote:
>>
>>> You are moving from a SQL database to C*? I hope you are aware of the
>>> differences between a NoSQL database like C* and an RDBMS. To keep it
>>> short, the app has to change significantly.
>>>
>>> Please read the documentation on the differences between NoSQL and RDBMS.
>>>
>>> thanks.
>>>
>>> On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal <bhu1rawal@gmail.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I'm planning to shift from a SQL database to a columnar NoSQL database,
>>>> and we have narrowed our choices to Cassandra and HBase. I would really
>>>> appreciate it if someone with decent experience with both could give me
>>>> an honest comparison on the parameters below (links to neutral
>>>> benchmarks/blogs also appreciated):
>>>>
>>>> 1. Data Consistency (Eventual consistency allowed but define "eventual")
>>>> 2. Ease of Scaling Up
>>>> 3. Manageability
>>>> 4. Failure Recovery options
>>>> 5. Secondary Indexing
>>>> 6. Data Aggregation
>>>> 7. Query Language (3rd party wrapper solutions also allowed)
>>>> 8. Security
>>>> 9. *Commercial Support for quick solutions to issues*.
>>>> 10. Running batch jobs on data, such as MapReduce or some common
>>>> aggregation functions using row scans. Any other packages for Cassandra
>>>> to achieve this?
>>>> 11. Trigger specific updates on tables used for secondary index.
>>>> 12. Please consider that our DB will be the source of truth, with no
>>>> specific requirement of immediate data consistency amongst nodes.
>>>>
>>>> Regards,
>>>> Bhuvan Rawal
>>>> SDE
>>>>
>>>
>>>
>>
>
