cassandra-user mailing list archives

From Chandra Sekar KR <>
Subject Compatibility, performance & portability of Cassandra data types (MAP, UDT & JSON) in DSE Search & Analytics
Date Fri, 19 Feb 2016 04:20:55 GMT

I'm looking for help weighing the pros & cons of the MAP, UDT & JSON (TEXT) data types
in Cassandra, and their ease of use and impact across other DSE products (Spark & Solr).
We are migrating an OLTP database from an RDBMS to Cassandra. It has 200+ columns and an
average volume of 25 million records/day. The access pattern is quite simple: in OLTP,
access is always by primary key. For OLAP, there are other access patterns involving
combinations of columns, for which we plan to use Spark & Solr for search & analytics
(in a separate DC).

The average size of each record is ~2KB and the application workload is INSERT-only
(no updates/deletes). We ran performance tests on two data models:

1) A table with 200+ columns similar to RDBMS

2) A table with 15 columns where only critical business fields are kept as individual
columns and the remaining fields are stored in a single TEXT column as a JSON object.
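
A minimal sketch of how model 2 splits a wide record, assuming hypothetical field names
(record_id, account_id, created_at and the payload column are illustrative only and are
not taken from the actual schema). It shows the critical fields staying as separate
columns while everything else is serialized into one JSON string for the TEXT column:

```python
import json

# Hypothetical names for the critical business fields; the real schema differs.
CRITICAL_FIELDS = {"record_id", "account_id", "created_at"}


def split_record(record):
    """Keep critical fields as columns; pack the rest into one JSON TEXT value."""
    row = {k: v for k, v in record.items() if k in CRITICAL_FIELDS}
    rest = {k: v for k, v in record.items() if k not in CRITICAL_FIELDS}
    # sort_keys makes the serialized payload deterministic for a given record.
    row["payload"] = json.dumps(rest, sort_keys=True)
    return row


def merge_record(row):
    """Rebuild the wide record from the critical columns plus the JSON payload."""
    record = {k: v for k, v in row.items() if k != "payload"}
    record.update(json.loads(row["payload"]))
    return record
```

Any consumer that needs a non-critical field (for example a Spark job) would have to
parse the payload, as merge_record does here, before it can filter or aggregate on it.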

In the results, we noticed a significant advantage for the JSON model, which performed
about 5x better than the columnar data model. We are also evaluating the performance of
other data types (MAP & UDT) as alternatives to storing the JSON object as TEXT.
Sample data model structures for the columnar, JSON, MAP & UDT types are given below:


I would like to know the performance, transformation, compatibility & portability impacts
and the ease of use of each of these data types from a Search & Analytics perspective
(Spark & Solr). I'm aware that we will have to use field transformers in Solr to index
JSON fields, but I'm not sure about MAP & UDT. Any help comparing these data types in
Spark & Solr is highly appreciated.

Regards, KR
