cassandra-user mailing list archives

From "Hartzman, Leslie"
Subject RE: Ad-hoc queries question
Date Fri, 20 Sep 2013 23:57:34 GMT
Cool! Thanks for the suggestions.

From: Peter Lin
Sent: Friday, September 20, 2013 4:52 PM
Subject: Re: Ad-hoc queries question

There are several ways of handling these types of use cases. Some people take a soft real-time
approach by calculating aggregates in memory and saving them to tables periodically. One example
of this is Twitter and Storm. Other techniques include using batch processes to extract summaries
and storing them in an OLAP cube for reporting purposes.
If your application doesn't need ad-hoc query results immediately, MapReduce is usually
sufficient. Many people use Pig and Hive for this type of operation.
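
The soft real-time approach described above can be sketched roughly as follows. This is a minimal illustration, not how Storm itself works: the class name `PeriodicAggregator`, the `flush_every` threshold, and the plain dict standing in for a persistent table are all assumptions made for the example.

```python
from collections import Counter


class PeriodicAggregator:
    """Counts events in memory and folds them into a table every few events."""

    def __init__(self, table, flush_every):
        self.table = table              # dict standing in for a persistent table (assumption)
        self.flush_every = flush_every  # hypothetical threshold; a timer would also work
        self.pending = Counter()        # aggregates not yet saved
        self.seen = 0                   # events recorded since the last flush

    def record(self, key, amount=1):
        self.pending[key] += amount
        self.seen += 1
        if self.seen >= self.flush_every:
            self.flush()

    def flush(self):
        # Add the in-memory counts to the stored totals, then start over.
        for key, amount in self.pending.items():
            self.table[key] = self.table.get(key, 0) + amount
        self.pending.clear()
        self.seen = 0


page_views = {}
aggregator = PeriodicAggregator(page_views, flush_every=3)
for page in ["home", "home", "about", "home"]:
    aggregator.record(page)
# The first three events have been saved; the fourth is still only in memory.
```

Readers of the table see totals that lag by at most one flush interval, which is the trade-off the "soft" in soft real-time refers to.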

On Fri, Sep 20, 2013 at 7:41 PM, Hartzman, Leslie wrote:
By ad-hoc queries I mean exactly what you've described. The need to access data from multiple
column families, typically addressed in RDBs with JOINs.

I haven't really become familiar enough with MapReduce yet, so I'll have to delve deeper into
that. I'm hoping that the de-normalized nature of things will obviate the need for complex
subquery-type operations.

From: Peter Lin
Sent: Friday, September 20, 2013 4:30 PM
Subject: Re: Ad-hoc queries question

What do you mean by ad-hoc queries?
Most NoSQL databases do not support cross-table joins, due to the distributed nature of NoSQL
databases. If we compare this to partitioned databases in the RDB world, cross-partition joins
are also more expensive than joins in non-partitioned databases.
You can do ad-hoc queries on a single table as long as the columns have secondary indexes
defined. You can do multi-table joins using MapReduce, or use CQL and handle that logic in your
application. In some cases, you can use the concept of summary tables to speed up complex
multi-table ad-hoc queries that have nasty joins. One thing that is very hard to do with all
NoSQL databases is complex correlated subqueries. For those kinds of use cases, MapReduce
is the "preferred" technique.
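
As a rough illustration of the summary-table idea, the sketch below updates a precomputed per-customer aggregate on every write, so that a read becomes a single key lookup instead of a join. The `OrderStore` class, its field names, and the in-memory dicts standing in for tables are hypothetical and chosen only for the example.

```python
class OrderStore:
    """Keeps a base table and a summary table in step at write time."""

    def __init__(self):
        self.orders = {}            # base table: order_id -> row
        self.customer_summary = {}  # summary table: customer_id -> aggregates

    def insert_order(self, order_id, customer_id, amount_cents):
        self.orders[order_id] = {
            "customer_id": customer_id,
            "amount_cents": amount_cents,
        }
        # Update the summary row in the same write path as the base row.
        summary = self.customer_summary.setdefault(
            customer_id, {"order_count": 0, "total_cents": 0}
        )
        summary["order_count"] += 1
        summary["total_cents"] += amount_cents

    def summary_for(self, customer_id):
        # One key lookup; no scan of the orders table is needed.
        return self.customer_summary.get(
            customer_id, {"order_count": 0, "total_cents": 0}
        )


store = OrderStore()
store.insert_order("o1", "alice", 1250)
store.insert_order("o2", "alice", 800)
store.insert_order("o3", "bob", 500)
alice_summary = store.summary_for("alice")
```

The cost moves from read time to write time, and the sketch assumes each order is inserted exactly once; a retried write would be counted twice.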

For comparison, databases like Oracle RAC distribute table indexes and perform index joins
to speed up complex multi-table joins. The downside is that a full Oracle RAC deployment is very
expensive and has a high up-front cost.

On Fri, Sep 20, 2013 at 7:20 PM, Hartzman, Leslie wrote:
Thanks Rob. I thought that might have been the situation but wasn't sure. So does this negate
the use of cqlsh to do this then? I'd hate to have to provide custom code to support ad-hoc


From: Robert Coli
Sent: Friday, September 20, 2013 4:06 PM
Subject: Re: Ad-hoc queries question

On Fri, Sep 20, 2013 at 3:25 PM, Hartzman, Leslie wrote:
So are ad-hoc queries more awkward or not feasible?


To expand slightly, you will probably end up querying multiple columnfamilies and doing the
ad-hoc JOIN-esque aspect in application code.
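
A minimal sketch of what the JOIN-esque step in application code might look like. It assumes the rows from each column family have already been fetched by separate queries and are available as lists of dicts; the function name `join_rows` and the sample data are hypothetical.

```python
from collections import defaultdict


def join_rows(left_rows, right_rows, key):
    """Inner-join two lists of row dicts on `key` using a hash index."""
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    joined = []
    for left in left_rows:
        # Rows with no match on the other side are dropped (inner join).
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Hypothetical rows standing in for the results of two separate queries.
users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "lamp"},
]
joined = join_rows(users, orders, "user_id")
```

Both result sets have to fit in application memory for this to work, which is one reason the thread points to MapReduce for larger joins.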

