cassandra-dev mailing list archives

From Ian Holsman
Subject Re: Thoughts on a possible query language
Date Mon, 22 Jun 2009 22:13:19 GMT
Any chance of using Hypertable's or HBase's query language as a base?

Both of these are column-oriented DBs, which would have similar
semantics to ours.

I'd like to avoid yet another tool-specific query language cropping
up, if possible.

Having said that, I don't have the time to code it, so take it as a wish;
I will be happy with anything that makes Cassandra easier to use.

On 23/06/2009, at 4:42 AM, Sandeep Tata wrote:

> There is some (unfinished) code in the current repo on CQL, a SQL-like
> Cassandra Query Language that is super simple and (AFAIK) limited to
> single-node queries.
> I suspect there are bigger questions to tackle before we get to query
> languages in the sense we're talking about:
> 1. Data model: Cassandra's values are byte arrays. Any proposal for a
>    language needs to figure out precisely what data model you're
>    planning to support. (Your examples include numbers, dates, strings.)
> 2. Secondary indexes
> 3. Query runtime (queries that run on a single node, multiple nodes,
>    query optimizer?)
> I've never understood the value of a parallel-programming abstraction
> (map-reduce) for a single-node database (CouchDB) ... and I certainly
> don't think we're ready to build a map-reduce view engine *in* Cassandra
> right now.
> IMHO, there are a bunch of interesting issues we will need to solve
> before we can seriously talk about a query language.
> On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo wrote:
>> Has anyone given thought to how an SQL-like query language could be
>> integrated into Cassandra?
>> I'm thinking of something which would let you evaluate a limited set
>> of relational select operators. For example:
>> * first_name = 'Bob'
>> * age > 32
>> * created_at between '2009-08' and '2009-09'
>> * employer_id in (34543, 13177, 9338)
>> First, is such functionality desired within the framework of
>> Cassandra, or do people prefer to keep this functionality in a
>> completely separate server component? There are pros and cons to keeping
>> queries inside Cassandra. I could enumerate them, but I would like to
>> hear other people's thoughts first.
>> An alternative to a text-based query syntax would be to borrow
>> CouchDB's concept of views [1]. In CouchDB, views are pre-defined
>> indexes which are populated by filtering data through a pair of
>> map/reduce functions, which are usually written in JavaScript. Views
>> are somewhat limited in expressiveness and flexibility, and do not
>> address all possible use cases, but they are very efficient to compute
>> and store, and are a fairly elegant system.
>> Some challenges come to mind:
>> Cassandra's distributed nature means that a node's queryable indexes
>> can/should only reference data in that same node's partition, and that
>> a query might have to be executed on multiple nodes. For performance,
>> the query processing needs to be parallelized and pipelined.
>> Could a query planner/optimizer reduce the number of nodes
>> required to satisfy a query by looking at the distribution of column
>> values across nodes? For example, if the column "first_name" value
>> "Foo" only occurs on node A, there's no need to involve node B. But
>> such knowledge requires the maintenance of statistics on each node
>> that cover all known peers, and the statistics must be kept up to date
>> to avoid glaring consistency issues.
>> Given the nature of Cassandra's column families it's not immediately
>> obvious to me how to best address columns in such a language.
>> [1]
>> A.

Ian Holsman
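
To make the data-model point in the quoted discussion concrete, here is a minimal sketch of how the four example predicates might be evaluated when every stored value is a byte array. Nothing here is Cassandra's API: the `DECODERS` table, the chosen encodings (UTF-8 strings, big-endian integers, ISO-style date strings), and all function names are assumptions made purely for illustration.

```python
import struct

# Hypothetical per-column decoders. Values are stored as raw bytes, so a
# query layer has to be told how to interpret each column before comparing.
DECODERS = {
    "first_name": lambda b: b.decode("utf-8"),
    "age": lambda b: struct.unpack(">i", b)[0],          # 32-bit big-endian int
    "created_at": lambda b: b.decode("ascii"),           # 'YYYY-MM-DD' sorts lexicographically
    "employer_id": lambda b: struct.unpack(">q", b)[0],  # 64-bit big-endian int
}


def decode(row, column):
    """Decode one column of a row, or return None if the column is absent."""
    raw = row.get(column)
    return None if raw is None else DECODERS[column](raw)


def eq(column, value):
    return lambda row: decode(row, column) == value


def gt(column, value):
    def pred(row):
        v = decode(row, column)
        return v is not None and v > value
    return pred


def between(column, low, high):
    def pred(row):
        v = decode(row, column)
        return v is not None and low <= v <= high
    return pred


def in_(column, values):
    allowed = frozenset(values)
    return lambda row: decode(row, column) in allowed


def select(rows, *predicates):
    """Return the keys of rows that satisfy every predicate (full scan, no index)."""
    return [key for key, row in rows.items() if all(p(row) for p in predicates)]


# Made-up sample rows: row key -> {column name -> raw bytes}.
ROWS = {
    "u1": {"first_name": b"Bob", "age": struct.pack(">i", 40),
           "created_at": b"2009-08-15", "employer_id": struct.pack(">q", 34543)},
    "u2": {"first_name": b"Bob", "age": struct.pack(">i", 25),
           "created_at": b"2009-07-01", "employer_id": struct.pack(">q", 1)},
    "u3": {"first_name": b"Alice", "age": struct.pack(">i", 50),
           "created_at": b"2009-08-20", "employer_id": struct.pack(">q", 9338)},
}

bobs_over_32 = select(ROWS, eq("first_name", "Bob"), gt("age", 32))
august_hires = select(ROWS, between("created_at", "2009-08", "2009-09"),
                      in_("employer_id", (34543, 13177, 9338)))
```

Without the decoder table the comparisons would operate on raw bytes, which is why the choice of data model has to come before the choice of syntax. The sketch also scans every row, which is the cost that secondary indexes would be meant to avoid.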

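
The node-pruning question raised in the quoted message can be sketched in a few lines. This is only an illustration under stated assumptions: `NodeStats`, its `fresh` flag, and `nodes_to_query` are hypothetical names, and the sketch handles equality predicates only. A node whose statistics may be stale is always contacted, which is one conservative way to avoid the consistency problem the message mentions.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    """Hypothetical summary of one peer: which (column, value) pairs it holds."""
    values: set = field(default_factory=set)
    fresh: bool = True  # False when the summary may be out of date


def nodes_to_query(stats, column, value):
    """Return the nodes that must be contacted for `column = value`.

    A node is skipped only when its statistics are fresh and show that the
    value does not occur there; stale nodes are always included.
    """
    return sorted(
        node
        for node, s in stats.items()
        if not s.fresh or (column, value) in s.values
    )


# Made-up cluster state for illustration.
CLUSTER = {
    "A": NodeStats({("first_name", "Foo")}),
    "B": NodeStats({("first_name", "Bar")}),
    "C": NodeStats(set(), fresh=False),
}

targets = nodes_to_query(CLUSTER, "first_name", "Foo")
```

Here node B is pruned because its fresh statistics rule out "Foo", while node C is still queried because its summary cannot be trusted. Tracking exact value sets per peer would not scale to large column domains; a real design would presumably need a compact summary, which is outside the scope of this sketch.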