cassandra-user mailing list archives

From Jim Ancona <...@anconafamily.com>
Subject Re: Migrating to CQL and Non Compact Storage
Date Mon, 11 Apr 2016 22:15:16 GMT
On Mon, Apr 11, 2016 at 4:19 PM, Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> Some of this may depend on exactly how you are using so-called COMPACT
> STORAGE. I mean, if your tables really are modeled with all but exactly one
> column in the primary key, then okay, COMPACT STORAGE may be a reasonable
> model, but that seems to be a very special, narrow use case, so for all
> other cases you really do need to re-model for CQL for Cassandra 4.0.
>
There was no such restriction when modeling with Thrift. It's an artifact
of how CQL chose to expose the Thrift data model.
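
To make the mismatch concrete, here is a sketch (keyspace, table, and column
names are made up) of the only table shape CQL allows under COMPACT STORAGE,
which is how it exposes a Thrift dynamic column family:

```sql
-- Hypothetical example: a Thrift dynamic column family as CQL sees it.
-- COMPACT STORAGE permits exactly one non-primary-key column ("value"):
CREATE TABLE ks.events (
    key     text,       -- the Thrift row key
    column1 text,       -- the Thrift column name
    value   blob,       -- the Thrift column value
    PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE;
-- Thrift itself never imposed this shape; arbitrary (name, value) pairs
-- per row were the norm, and the restriction comes from the CQL mapping.
```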

> I'm not sure why anybody is thinking otherwise. Sure, maybe it will be a lot
> of work, but that's life and people have been given plenty of notice.
>
"That's life" minimizes the difficulty of doing this sort of migration for
large, mission-critical systems. It would require large amounts of time, as
well as temporarily doubling hardware resources, which can mean dozens to
hundreds of additional nodes.

And if it takes hours to do a data migration, I think that you can consider
> yourself lucky relative to people who may require days.
>
Or more.

Now, if there are particular Thrift use cases that don't have efficient
> models in CQL, that can be discussed. Start by expressing the Thrift data
> in a neutral, natural, logical, plain English data model, and then we can
> see how that maps to CQL.
>
> So, where are we? Is it just the complaint that migration is slow and
> re-modeling is difficult, or are there specific questions about how to do
> the re-modeling?
>
My purpose is not to complain, but to educate :-). Telling someone "just
remodel your data" is not helpful, especially after he's told you that he
tried that and ran into performance issues. (Note that the link he posted
shows an order-of-magnitude decrease in throughput when moving from COMPACT
STORAGE to CQL3 native tables for analytics workloads, so it's not just his
use case.) Do you have any suggestions of ways he might mitigate those
issues? Is there information you need to make such a recommendation?
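
For readers following along, the "remove the declared columns" approach from
the Datastax post transposes the data rather than reshaping it. A sketch
(table, key, and column names here are hypothetical):

```sql
-- After dropping the declared columns from the Thrift schema, CQL exposes
-- the generic compact layout (key, column1, value), and one Thrift row
-- with N columns reads back as N CQL rows:
SELECT key, column1, value FROM ks.mixed_cf WHERE key = 'user42';
--  key    | column1     | value
-- --------+-------------+--------
--  user42 | 'email'     | 0x...   <- formerly a declared (static) column
--  user42 | 'pref:lang' | 0x...   <- dynamic column
--  user42 | 'pref:tz'   | 0x...   <- dynamic column
```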

Jim


>
>
> -- Jack Krupansky
>
> On Mon, Apr 11, 2016 at 1:30 PM, Anuj Wadehra <anujw_2003@yahoo.co.in>
> wrote:
>
>> Thanks Jim. I think you understand the pain of migrating TBs of data to
>> new tables. There is no command to change from compact to non-compact
>> storage, and even the fastest way to migrate the data, using Spark, is too
>> slow for production systems.
>>
>> And the pain gets bigger when your performance dips after moving to a
>> non-compact-storage table. That's because non-compact storage is quite an
>> inefficient storage format until 3.x, and it incurs a heavy penalty on
>> row-scan performance in analytics workloads.
>> Please go through this link to understand how the old compact storage
>> format gives much better performance than non-compact storage as far as
>> row scans are concerned:
>> https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis
>>
>> The flexibility of CQL comes at a heavy cost until 3.x.
>>
>>
>>
>> Thanks
>> Anuj
>> Sent from Yahoo Mail on Android
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>>
>> On Mon, 11 Apr, 2016 at 10:35 PM, Jim Ancona
>> <jim@anconafamily.com> wrote:
>> Jack, the Datastax link he posted (
>> http://www.datastax.com/dev/blog/thrift-to-cql3) says that for column
>> families with mixed dynamic and static columns: "The only solution to be
>> able to access the column family fully is to remove the declared columns
>> from the thrift schema altogether..." I think that page describes the
>> problem and the potential solutions well. I haven't seen an answer to
>> Anuj's question about why the native CQL solution using collections doesn't
>> perform as well.
>>
>> Keep in mind that some of us understand CQL just fine but have working
>> pre-CQL Thrift-based systems storing hundreds of terabytes of data and with
>> requirements that mean that saying "bite the bullet and re-model your
>> data" is not really helpful. Another quote from that Datastax link:
>> "Thrift isn't going anywhere." Granted, that link is three-plus years
>> old, but Thrift *is* now going away, so it's not unexpected that people
>> will be trying to figure out how to deal with that. It's bad enough that we
>> need to rewrite our clients to use CQL instead of Thrift. It's not helpful
>> to say that we should also re-model and migrate all our data.
>>
>> Jim
>>
>> On Mon, Apr 11, 2016 at 11:29 AM, Jack Krupansky <
>> jack.krupansky@gmail.com> wrote:
>>
>>> Sorry, but your message is too confusing - you say "reading dynamic
>>> columns in CQL" and "make the table schema-less", but neither has any
>>> relevance to CQL! 1. CQL tables always have schemas. 2. All columns in CQL
>>> are statically declared (even maps/collections are statically declared
>>> columns.) Granted, it is a challenge for Thrift users to get used to the
>>> terminology of CQL, but it is required. If necessary, review some of the
>>> free online training videos for data modeling.
>>>
>>> Unless your data model is very simple and directly translates into
>>> CQL, you probably do need to bite the bullet and re-model your data to
>>> exploit the features of CQL rather than fight CQL by trying to mimic
>>> Thrift per se.
>>>
>>> In any case, take another shot at framing the problem and then maybe
>>> people here can help you out.
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Apr 11, 2016 at 10:39 AM, Anuj Wadehra <anujw_2003@yahoo.co.in>
>>> wrote:
>>>
>>>> Any comments or suggestions on this one?
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>>
>>>> On Sun, 10 Apr, 2016 at 11:39 PM, Anuj Wadehra
>>>> <anujw_2003@yahoo.co.in> wrote:
>>>> Hi
>>>>
>>>> We are on 2.0.14 and Thrift. We are planning to migrate to CQL soon but
>>>> are facing some challenges.
>>>>
>>>> We have a cf with a mix of statically defined columns and dynamic
>>>> columns (created at run time). For reading dynamic columns in CQL,
>>>> we have two options:
>>>>
>>>> 1. Drop all columns and make the table schema-less. This way, we will
>>>> get a CQL row for each column defined for a row key, as mentioned here:
>>>> http://www.datastax.com/dev/blog/thrift-to-cql3
>>>>
>>>> 2. Migrate the entire data set to a new non-compact-storage table and
>>>> use collections for the dynamic columns in the new table.
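
(A sketch of what option 2 might look like; the table and column names are
assumptions, not the original schema:)

```sql
-- Hypothetical non-compact target table for option 2: statically declared
-- columns keep dedicated CQL columns, and the run-time-created Thrift
-- columns move into a map collection.
CREATE TABLE ks.events_v2 (
    key          text PRIMARY KEY,
    email        text,             -- formerly a statically declared column
    dynamic_cols map<text, blob>   -- formerly dynamic Thrift columns
);
-- Pre-3.x, each map entry carries per-cell overhead on disk, which is one
-- plausible contributor to the range-scan slowdown described in this thread.
```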
>>>>
>>>> In our case, we have observed that approach 2 gives roughly 3x slower
>>>> performance on the range-scan queries used by Spark. This is not
>>>> acceptable. Cassandra 3 has an optimized storage engine, but we are not
>>>> comfortable moving to 3.x in production.
>>>>
>>>> Moreover, data migration to new table using Spark takes hours.
>>>>
>>>> Any suggestions for the two issues?
>>>>
>>>>
>>>> Thanks
>>>> Anuj
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
