cassandra-user mailing list archives

From Jack Krupansky
Subject Re: Migrating to CQL and Non Compact Storage
Date Mon, 11 Apr 2016 20:19:37 GMT
Some of this may depend on exactly how you are using so-called COMPACT
STORAGE. I mean, if your tables really are modeled as all but exactly one
column in the primary key, then okay, COMPACT STORAGE may be a reasonable
model, but that seems to be a very special, narrow use case, so for all
other cases you really do need to re-model for CQL for Cassandra 4.0. I'm
not sure why anybody is thinking otherwise. Sure, maybe it will be a lot of
work, but that's life and people have been given plenty of notice. And if
it takes hours to do a data migration, I think that you can consider
yourself lucky relative to people who may require days.

Now, if there are particular Thrift use cases that don't have efficient
models in CQL, that can be discussed. Start by expressing the Thrift data
in a neutral, natural, logical, plain English data model, and then we can
see how that maps to CQL.

So, where are we? Is it just the complaint that migration is slow and
re-modeling is difficult, or are there specific questions about how to do
the re-modeling?

-- Jack Krupansky

On Mon, Apr 11, 2016 at 1:30 PM, Anuj Wadehra wrote:

> Thanks Jim. I think you understand the pain of migrating TBs of data to
> new tables. There is no command to change from compact to non compact
> storage and the fastest solution to migrate data using Spark is too slow
> for production systems.
> And the pain gets bigger when your performance dips after moving to a non
> compact storage table. That's because non compact storage is quite an
> inefficient storage format until 3.x and it incurs a heavy penalty on Row
> Scan performance in Analytics workloads.
> Please go through the link to understand how old Compact storage gives
> much better performance than non compact storage as far as Row Scans are
> concerned:
> The flexibility of CQL comes at a heavy cost until 3.x.
> Thanks
> Anuj
> On Mon, 11 Apr, 2016 at 10:35 PM, Jim Ancona wrote:
> Jack, the DataStax link he posted says that for column
> families with mixed dynamic and static columns: "The only solution to be
> able to access the column family fully is to remove the declared columns
> from the thrift schema altogether..." I think that page describes the
> problem and the potential solutions well. I haven't seen an answer to
> Anuj's question about why the native CQL solution using collections doesn't
> perform as well.
> Keep in mind that some of us understand CQL just fine but have working
> pre-CQL Thrift-based systems storing hundreds of terabytes of data and with
> requirements that mean that saying "bite the bullet and re-model your
> data" is not really helpful. Another quote from that DataStax link:
> "Thrift isn't going anywhere." Granted, that link is three-plus years
> old, but Thrift *is* now going away, so it's not unexpected that people
> will be trying to figure out how to deal with that. It's bad enough that we
> need to rewrite our clients to use CQL instead of Thrift. It's not helpful
> to say that we should also re-model and migrate all our data.
> Jim
> On Mon, Apr 11, 2016 at 11:29 AM, Jack Krupansky wrote:
>> Sorry, but your message is too confusing - you say "reading dynamic
>> columns in CQL" and "make the table schema less", but neither has any
>> relevance to CQL! 1. CQL tables always have schemas. 2. All columns in CQL
>> are statically declared (even maps/collections are statically declared
>> columns.) Granted, it is a challenge for Thrift users to get used to the
>> terminology of CQL, but it is required. If necessary, review some of the
>> free online training videos for data modeling.
>> Unless your data model is very simple and does directly translate into
>> CQL, you probably do need to bite the bullet and re-model your data to
>> exploit the features of CQL rather than fight CQL trying to mimic Thrift
>> per se.
>> In any case, take another shot at framing the problem and then maybe
>> people here can help you out.
>> -- Jack Krupansky
>> On Mon, Apr 11, 2016 at 10:39 AM, Anuj Wadehra wrote:
>>> Any comments or suggestions on this one?
>>> Thanks
>>> Anuj
>>> On Sun, 10 Apr, 2016 at 11:39 PM, Anuj Wadehra wrote:
>>> Hi
>>> We are on 2.0.14 and Thrift. We are planning to migrate to CQL soon but
>>> are facing some challenges.
>>> We have a cf with a mix of statically defined columns and dynamic
>>> columns (created at run time). For reading dynamic columns in CQL,
>>> we have two options:
>>> 1. Drop all columns and make the table schema less. This way, we will
>>> get a CQL row for each column defined for a row key, as mentioned here:
>>> 2. Migrate the entire data to a new non compact storage table and create
>>> collections for dynamic columns in the new table.
>>> In our case, we have observed that approach 2 makes the Range scan
>>> queries used by Spark 3 times slower. This is not acceptable.
>>> Cassandra 3 has an optimized storage engine but we are not comfortable
>>> moving to 3.x in production.
>>> Moreover, data migration to a new table using Spark takes hours.
>>> Any suggestions for the two issues?
>>> Thanks
>>> Anuj
>>> Sent from Yahoo Mail on Android
>>> <>
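
For readers following the thread, the two options in the original question can be sketched in plain Python, with no Cassandra involved. This is only an illustration under stated assumptions: the row key, the column names ("name", "email", "evt:..."), and the helper functions are all hypothetical, and the triple layout (key, column name, value) is meant to mirror how a compact storage table with no declared columns is commonly described as appearing in CQL, not to reproduce the storage engine.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical Thrift-style data: row key -> {column name: value}.
# "name" and "email" stand in for statically declared columns; the
# "evt:*" columns stand in for dynamic columns created at run time.
STATIC_COLUMNS: Set[str] = {"name", "email"}

thrift_rows: Dict[str, Dict[str, str]] = {
    "user1": {
        "name": "Ann",
        "email": "a@example.com",
        "evt:2016-04-01": "login",
        "evt:2016-04-02": "logout",
    },
}


def as_schemaless_rows(rows: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Option 1 (assumed shape): one logical CQL row per Thrift column."""
    return [
        (key, column, value)
        for key, columns in rows.items()
        for column, value in sorted(columns.items())
    ]


def as_collection_rows(
    rows: Dict[str, Dict[str, str]], static_columns: Set[str]
) -> List[dict]:
    """Option 2 (assumed shape): one row per key, dynamic columns in a map."""
    result = []
    for key, columns in rows.items():
        row: dict = {"key": key, "dynamic": {}}
        for column, value in columns.items():
            if column in static_columns:
                row[column] = value             # declared column
            else:
                row["dynamic"][column] = value  # entry in a map collection
        result.append(row)
    return result


if __name__ == "__main__":
    for cql_row in as_schemaless_rows(thrift_rows):
        print(cql_row)
    print(as_collection_rows(thrift_rows, STATIC_COLUMNS))
```

Option 1 leaves the data where it is and changes only how it is read, which is why it avoids the migration. Option 2 changes the table layout, which is where the migration cost and the reported scan slowdown come from.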
