cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: CQL3 vs Thrift
Date Tue, 23 Dec 2014 18:26:57 GMT
I'm biased in favor of using both Thrift and CQL3, though many people on the
list probably think I'm crazy.

CQL3 is good if what you need fits nicely in static columns, but it is a
poor fit if you want to use dynamic columns and/or mix and match both in the
same column family. For a lot of what I use Cassandra for, CQL3 currently
doesn't provide all the functionality I need. It would be possible to extend
CQL3 further to make it handle 100% of the use cases that Thrift supports
today.

Whether that will happen is anyone's guess. SQL-like syntax is popular
and many people understand it, but it doesn't necessarily line up perfectly
with NoSQL column databases.
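
For reference, a minimal and untested sketch of the static-column version of
the timeseries table from the quoted thread below. It assumes a Cassandra
version with static column support, that event_source and total_events are
declared STATIC, and that 'some-type' is only the placeholder value from
David's queries.

```sql
-- Static columns are stored once per partition (here, once per event_type),
-- not once per clustered row.
CREATE TABLE timeseries (
   event_type text,
   event_source text STATIC,
   total_events int STATIC,
   insertion_time timestamp,
   event blob,
   PRIMARY KEY (event_type, insertion_time)
) WITH CLUSTERING ORDER BY (insertion_time DESC);

-- Metadata only: LIMIT 1 keeps the result to a single row.
SELECT event_source, total_events FROM timeseries
 WHERE event_type = 'some-type' LIMIT 1;

-- Events only, newest first.
SELECT insertion_time, event FROM timeseries
 WHERE event_type = 'some-type';

-- Combined: one round trip, but the static values appear in every
-- returned row.
SELECT event_source, total_events, insertion_time, event FROM timeseries
 WHERE event_type = 'some-type';
```

The combined form saves a round trip at the cost of carrying the static
values in each result row, which is the trade-off David describes.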


On Tue, Dec 23, 2014 at 1:00 PM, David Broyles <sj.climber@gmail.com> wrote:

> Thanks, Ryan. I wasn't aware of static column support, and indeed static
> columns get me most of what I need. I think the only potential inefficiency
> is still at query time. Using Thrift, I could design the column family to
> get all the static and dynamic content in a single query.
> If event_source and total_events are instead implemented as CQL3 statics,
> I probably need to do two queries to get data for a given event_type.
>
> To get event metadata (is the LIMIT 1 needed to reduce to 1 record?):
> SELECT event_source, total_events FROM timeseries WHERE event_type =
> 'some-type'
>
> To get the events:
> SELECT insertion_time, event FROM timeseries
>
> With a combined query, my concern is the overhead of repeating
> event_type/source/total_events (along with potentially many other pieces
> of static information) in every row.
>
> More generally, do you find that tuned applications tend to use Thrift, a
> combination of Thrift and CQL3, or is CQL3 really expected to replace
> Thrift?
>
> Thanks again!
>
> On Mon, Dec 22, 2014 at 9:50 PM, Ryan Svihla <rsvihla@datastax.com> wrote:
>
>> Don't static columns get you what you want?
>>
>>
>> http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refStaticCol.html
>> On Dec 22, 2014 10:50 PM, "David Broyles" <sj.climber@gmail.com> wrote:
>>
>>> Although I used Cassandra 1.0.X extensively, I'm new to CQL3.  Pages
>>> such as http://wiki.apache.org/cassandra/ClientOptionsThrift suggest
>>> new projects should use CQL3.
>>>
>>> I'm wondering, however, if there are certain use cases not well covered
>>> by CQL3.  Consider the standard timeseries example:
>>>
>>> CREATE TABLE timeseries (
>>>    event_type text,
>>>    insertion_time timestamp,
>>>    event blob,
>>>    PRIMARY KEY (event_type, insertion_time)
>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>
>>> What happens if I want to store additional information that is shared by
>>> all events in the given series (but that I don't want to include in the row
>>> ID): e.g. the event source, a cached count of the number of events logged
>>> to date, etc.?  I might try updating the definition as follows:
>>>
>>> CREATE TABLE timeseries (
>>>    event_type text,
>>>    event_source text,
>>>    total_events int,
>>>    insertion_time timestamp,
>>>    event blob,
>>>    PRIMARY KEY (event_type, event_source, total_events, insertion_time)
>>> ) WITH CLUSTERING ORDER BY (insertion_time DESC);
>>>
>>> Is this not inefficient?  When inserting or querying via CQL3, say in
>>> batches of up to 1000 events, won't the type/source/count be repeated 1000
>>> times?  Please let me know if I'm misunderstanding something, or if I
>>> should be sticking to Thrift for situations like this involving mixed
>>> static/dynamic data.
>>>
>>> Thanks!
>>>
>>
>
