Return-Path:
X-Original-To: apmail-cassandra-commits-archive@www.apache.org
Delivered-To: apmail-cassandra-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3785B18ACC for ; Thu, 16 Jul 2015 17:59:05 +0000 (UTC)
Received: (qmail 86577 invoked by uid 500); 16 Jul 2015 17:59:05 -0000
Delivered-To: apmail-cassandra-commits-archive@cassandra.apache.org
Received: (qmail 86543 invoked by uid 500); 16 Jul 2015 17:59:05 -0000
Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@cassandra.apache.org
Delivered-To: mailing list commits@cassandra.apache.org
Received: (qmail 86530 invoked by uid 99); 16 Jul 2015 17:59:05 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 16 Jul 2015 17:59:05 +0000
Date: Thu, 16 Jul 2015 17:59:04 +0000 (UTC)
From: "Robert Stupp (JIRA)"
To: commits@cassandra.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (CASSANDRA-9767) Allow the selection of columns together with aggregates
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/CASSANDRA-9767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630091#comment-14630091 ]

Robert Stupp commented on CASSANDRA-9767:
-----------------------------------------

One nit: please remove the 3.0 chapter from NEWS.txt for 2.2 (you can do that on commit).

Can you clean up the unused imports in WritetimeOrTTLSelector and AbstractFunctionSelector on commit?

One questionable change: you've introduced new {{isSet}} fields in {{SimpleSelector}} and {{WritetimeOrTTLSelector}}. I think the {{if (!isSet)}} test can safely be changed to {{if (current==null)}}.
As far as I understood, we are returning a value from a "random" row anyway, so it wouldn't change that "contract", but we'd save one field. If you think this is OK, feel free to change it on commit.

Otherwise LGTM

> Allow the selection of columns together with aggregates
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9767
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9767
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>         Environment: Cassandra 2.0.16
>                      Ubuntu 15.04
>            Reporter: Ajay
>            Assignee: Benjamin Lerer
>            Priority: Minor
>
> Let's assume we have a column family as below:
> create table sample ( track_id int, user_id int, country varchar, primary key ((track_id), user_id));
> where track_id is the partition key.
> Now, to aggregate the number of rows for a single track_id, we can query using CQL as below:
> select count(*) from sample where track_id = 1 and user_id = 1;
> But that will return only the count. If we need the other columns along with the count, we cannot query as below, as it throws an error:
> select count(*), country from sample where track_id = 1 and user_id = 1;
> Bad Request: line 1:15 mismatched input ',' expecting K_FROM.
> In this case, all rows for a given track_id and user_id will have the same value for country, so we should be able to query as above. Also, in SQL it is possible to select columns along with aggregate functions.
> Though I know that Cassandra is not for analytics (unlike Hadoop and Spark), we need some basic aggregate functions like min, max, avg, etc. Though it might not be efficient performance-wise, it is better done on the Cassandra side (as it uses the native protocol) than fetching all rows to the client and doing the basic aggregation there. It cannot be used just as a data store (as garbage-in garbage-out). In that context, CQL is currently pretty limited. Just for getting data out of Cassandra, we will have to use Spark, though we will not be doing much analytics on it.
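To make the {{isSet}} versus {{current==null}} point from the comment concrete, here is a minimal sketch in Java. It is not the code from the patch: the classes FlagSelector and NullCheckSelector, their String-based addInput/getOutput methods, and the sample values are all hypothetical simplifications chosen for illustration. The only difference it is meant to show is what happens when the first row of the group holds a null value for the selected column.

```java
import java.util.Arrays;
import java.util.List;

public class SelectorSketch {

    /** Hypothetical variant A: an explicit flag records that a row was already seen. */
    static final class FlagSelector {
        private String current;
        private boolean isSet;

        void addInput(String value) {
            if (!isSet) {          // only the first row is ever stored
                isSet = true;
                current = value;   // may legitimately be null
            }
        }

        String getOutput() {
            return current;
        }
    }

    /** Hypothetical variant B: null doubles as the "nothing stored yet" marker. */
    static final class NullCheckSelector {
        private String current;

        void addInput(String value) {
            if (current == null) { // stores the first non-null value seen
                current = value;
            }
        }

        String getOutput() {
            return current;
        }
    }

    /** Feeds every row value to variant A and returns the selected value. */
    static String withFlag(List<String> values) {
        FlagSelector selector = new FlagSelector();
        for (String v : values) {
            selector.addInput(v);
        }
        return selector.getOutput();
    }

    /** Feeds every row value to variant B and returns the selected value. */
    static String withNullCheck(List<String> values) {
        NullCheckSelector selector = new NullCheckSelector();
        for (String v : values) {
            selector.addInput(v);
        }
        return selector.getOutput();
    }

    public static void main(String[] args) {
        // All rows carry the same country: both variants agree.
        List<String> allSet = Arrays.asList("DE", "DE", "DE");
        System.out.println(withFlag(allSet) + " " + withNullCheck(allSet));       // prints "DE DE"

        // First row has no country: variant A keeps null, variant B picks a later row.
        List<String> firstNull = Arrays.asList(null, "DE");
        System.out.println(withFlag(firstNull) + " " + withNullCheck(firstNull)); // prints "null DE"
    }
}
```

In this sketch the two variants differ only when the first row's value is null. If the returned value is understood to come from an arbitrary row of the group, as the comment suggests, either result satisfies that contract, and the null check saves the extra field.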
-- This message was sent by Atlassian JIRA (v6.3.4#6332)