From: Ian Holsman
To: cassandra-dev@incubator.apache.org
Subject: Re: Thoughts on a possible query language
Date: Tue, 23 Jun 2009 08:13:19 +1000

Hey,

Any chance of using Hypertable's or HBase's query language as a base?

http://code.google.com/p/hypertable/wiki/HQLTutorial
http://wiki.apache.org/hadoop/Hbase/HbaseShell

Both of these are column-oriented DBs, which would have similar semantics to ours. I want to avoid yet another tool-specific query language creeping up, if possible.

Having said that, I don't have the time to code it, so take it as a wish; I will be happy with anything that makes Cassandra easier to use.

On 23/06/2009, at 4:42 AM, Sandeep Tata wrote:

> There is some (unfinished) code in the current repo on CQL, a SQL-like
> Cassandra Query Language that is super simple and (AFAIK) limited to
> single-node queries.
>
> I suspect there are bigger questions to tackle before we get to query
> languages in the sense we're talking about --
> 1. Data model -- Cassandra's values are byte arrays. Any proposal for a
>    language needs to figure out precisely what data model you're planning
>    to support. (Your examples include numbers, dates, strings.)
> 2. Secondary indexes
> 3. Query runtime (queries that run on a single node, multiple nodes,
>    query optimizer?)
>
> I've never understood the value of a parallel-programming abstraction
> (map-reduce) for a single-node database (CouchDB) ...
> and I certainly don't think we're ready to build a map-reduce view
> engine *in* Cassandra right now.
>
> IMHO, there are a bunch of interesting issues we will need to solve
> before we can seriously talk about a query language.
>
>
> On Mon, Jun 22, 2009 at 11:12 AM, Alexander Staubo wrote:
>
>> Has anyone given thought to how an SQL-like query language could be
>> integrated into Cassandra?
>>
>> I'm thinking of something which would let you evaluate a limited set
>> of relational select operators. For example:
>>
>> * first_name = 'Bob'
>> * age > 32
>> * created_at between '2009-08' and '2009-09'
>> * employer_id in (34543, 13177, 9338)
>>
>> First, is such functionality desired within the framework of
>> Cassandra, or do people prefer to keep this functionality in a
>> completely separate server component? There are pros and cons to
>> keeping queries inside Cassandra. I could enumerate them, but I would
>> like to hear other people's thoughts first.
>>
>> An alternative to a text-based query syntax would be to borrow
>> CouchDB's concept of views [1]. In CouchDB, views are pre-defined
>> indexes which are populated by filtering data through a pair of
>> map/reduce functions, which are usually written in JavaScript. Views
>> are somewhat limited in expressiveness and flexibility, and do not
>> address all possible use cases, but they are very efficient to
>> compute and store, and are a fairly elegant system.
>>
>> Some challenges come to mind:
>>
>> Cassandra's distributed nature means that a node's queryable indexes
>> can/should only reference data in that same node's partition, and
>> that a query might have to be executed on multiple nodes. For
>> performance, the query processing needs to be parallelized and
>> pipelined.
>>
>> Could a query planner/optimizer be able to reduce the number of nodes
>> required to satisfy a query by looking at the distribution of node
>> values across nodes?
>> For example, if the column "first_name" value
>> "Foo" only occurs on node A, there's no need to involve node B. But
>> such knowledge requires the maintenance of statistics on each node
>> that cover all known peers, and the statistics must be kept up to
>> date to avoid glaring consistency issues.
>>
>> Given the nature of Cassandra's column families it's not immediately
>> obvious to me how to best address columns in such a language.
>>
>> [1] http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views
>>
>> A.
>>

--
Ian Holsman
Ian@Holsman.net
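
For illustration only, here is a rough sketch of the node-pruning idea from the quoted message: each node keeps a summary of the values it holds per column, and a planner consults those summaries to decide which nodes an equality query such as first_name = 'Foo' needs to reach. Everything in it (NodeStats, record, may_contain, nodes_for_equality, the in-memory sets) is hypothetical and not part of Cassandra; values are modelled as byte arrays because that is what Cassandra stores.

```python
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    """Hypothetical per-node summary: distinct values seen for each column."""
    node: str
    values_by_column: dict[str, set[bytes]] = field(default_factory=dict)

    def record(self, column: str, value: bytes) -> None:
        # Called whenever the node stores a value for the given column.
        self.values_by_column.setdefault(column, set()).add(value)

    def may_contain(self, column: str, value: bytes) -> bool:
        # True if this node is known to hold the value for the column.
        return value in self.values_by_column.get(column, set())


def nodes_for_equality(stats: list[NodeStats], column: str, value: bytes) -> list[str]:
    """Return the names of the nodes that must be queried for `column = value`."""
    return [s.node for s in stats if s.may_contain(column, value)]


# Assumed toy cluster: "Foo" lives only on node A, "Bob" on both nodes.
node_a = NodeStats("A")
node_b = NodeStats("B")
node_a.record("first_name", b"Foo")
node_a.record("first_name", b"Bob")
node_b.record("first_name", b"Bob")

targets_foo = nodes_for_equality([node_a, node_b], "first_name", b"Foo")
targets_bob = nodes_for_equality([node_a, node_b], "first_name", b"Bob")
```

In this toy setup the query for "Foo" is routed to node A alone, while the query for "Bob" goes to both nodes. It also shows the staleness problem raised in the thread: if node B stored "Foo" before its summary reached the planner, the planner would skip B and return an incomplete result. An exact set per column would also be far too large in practice, so a real design would presumably need a compact, approximate summary.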