cassandra-user mailing list archives

From Kai Wang <dep...@gmail.com>
Subject Re: CQL3 vs Thrift
Date Wed, 24 Dec 2014 15:38:10 GMT
Ryan,

Can you elaborate a little on "Thrift over CQL is modeling clustering
columns in different nesting between rows is trivial in Thrift and not
really doable in CQL"?
On Dec 24, 2014 8:30 AM, "Ryan Svihla" <rsvihla@datastax.com> wrote:

> I'm not sure why you can't model that to solve your use case. Wouldn't you
> be filtering the events as well, and therefore be able to get all of that
> in one query?
>
> What you describe has a number of avenues for getting your question
> answered in one query: collections, heavier use of statics in a different
> order than you specified, an object dump of events in a single column, or
> switching up the clustering columns. At the end of the day, CQL resolves to
> a given SSTable format, and you can still open up cassandra-cli and view
> what a given model looks like. Once you've grokked this adequately, you can
> basically bend CQL to fit your logical Thrift modeling. At some point, as
> with learning any new language, you'll learn to speak in both (something I
> have to do nearly daily).
>
> FWIW, the primary valid complaint remaining for Thrift over CQL is that
> modeling clustering columns in different nesting between rows is trivial
> in Thrift and not really doable in CQL (clustering columns enforce a
> nesting order by logical construct). Other than that, I have yet to be
> unable to swap a client from Thrift to CQL, and it has always ended up
> faster (so far).
>
> The main reason is that performance on modern Cassandra with the native
> protocol is substantially better than pure Thrift for many query types (see
> http://www.datastax.com/dev/blog/cassandra-2-1-now-over-50-faster). Your
> mileage may vary, but I'd test it out before proclaiming that Thrift is
> faster for your use case. Make liberal use of CQL features alongside
> cassandra-cli to make sure you know what's going on internally; remember,
> it's all just SSTables underneath.
>
>
>
>
> On Tue, Dec 23, 2014 at 12:00 PM, David Broyles <sj.climber@gmail.com>
> wrote:
>
>> Thanks, Ryan.  I wasn't aware of static column support, and indeed they
>> get me most of what I need.  I think the only potential inefficiency is
>> still at query time.  Using Thrift, I could design the column family to get
>> all the static and dynamic content in a single query.
>> If event_source and total_events are instead implemented as CQL3 statics,
>> I probably need to do two queries to get data for a given event_type.
>>
>> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
>> SELECT event_source, total_events FROM timeseries
>> WHERE event_type = 'some-type';
>>
>> To get the events:
>> SELECT insertion_time, event FROM timeseries
>> WHERE event_type = 'some-type';
>>
>> With a combined query, my concern is the overhead of repeating
>> event_type/source/total_events with every row (and potentially many other
>> pieces of static information).
>>
>> More generally, do you find that tuned applications tend to use Thrift, a
>> combination of Thrift and CQL3, or is CQL3 really expected to replace
>> Thrift?
>>
>> Thanks again!
>>
>> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvihla@datastax.com>
>> wrote:
>>
>>> Don't static columns get you what you want?
>>>
>>>
>>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>>>  On Dec 22, 2014 10:50 PM, "David Broyles" <sj.climber@gmail.com> wrote:
>>>
>>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>>> new projects should use CQL3.
>>>>
>>>> I'm wondering, however, if there are certain use cases not well covered
>>>> by CQL3.  Consider the standard timeseries example:
>>>>
>>>> CREATE TABLE timeseries (
>>>>    event_type text,
>>>>    insertion_time timestamp,
>>>>    event blob,
>>>>    PRIMARY KEY (event_type, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> What happens if I want to store additional information that is shared
>>>> by all events in the given series (but that I don't want to include in the
>>>> row ID): e.g. the event source, a cached count of the number of events
>>>> logged to date, etc.?  I might try updating the definition as follows:
>>>>
>>>> CREATE TABLE timeseries (
>>>>    event_type text,
>>>>    event_source text,
>>>>    total_events int,
>>>>    insertion_time timestamp,
>>>>    event blob,
>>>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>>
>>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>>> times?  Please let me know if I'm misunderstanding something, or if I
>>>> should be sticking to Thrift for situations like this involving mixed
>>>> static/dynamic data.
>>>>
>>>> Thanks!
>>>>
>>>
>>
>
>
> --
>
> Ryan Svihla
>
> Solution Architect

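For reference, the static-column approach discussed in this thread can be
sketched roughly as follows. This is only a minimal sketch, assuming
Cassandra 2.0.6 or later (where static columns are available). The table
and column names come from the thread; the sample values ('sensor-a', the
timestamp, the blob) are made up for illustration.

```cql
-- event_source and total_events are declared STATIC, so they are stored
-- once per partition (per event_type) rather than once per event.
CREATE TABLE timeseries (
   event_type text,
   insertion_time timestamp,
   event_source text STATIC,
   total_events int STATIC,
   event blob,
   PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Set the shared (static) values for the series; no clustering key needed.
-- 'sensor-a' is a hypothetical sample value.
INSERT INTO timeseries (event_type, event_source, total_events)
VALUES ('some-type', 'sensor-a', 1);

-- Insert one event; the static values are not repeated in the write.
INSERT INTO timeseries (event_type, insertion_time, event)
VALUES ('some-type', '2014-12-24 15:38:10+0000', 0xcafe);

-- Metadata only: LIMIT 1 keeps the result to a single row.
SELECT event_source, total_events FROM timeseries
WHERE event_type = 'some-type' LIMIT 1;

-- Combined query: statics and events in one round trip.
SELECT event_source, total_events, insertion_time, event FROM timeseries
WHERE event_type = 'some-type';
```

In the combined query the static values are stored only once per partition,
but they appear in every returned row, which is the result-set overhead
raised earlier in the thread.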